Article

Ensemble Self-Paced Learning Based on Adaptive Mixture Weighting

1 Xi’an Precision Machinery Research Institute, Xi’an 710077, China
2 School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an 710072, China
* Author to whom correspondence should be addressed.
Electronics 2022, 11(19), 3154; https://doi.org/10.3390/electronics11193154
Submission received: 8 September 2022 / Revised: 20 September 2022 / Accepted: 23 September 2022 / Published: 1 October 2022
(This article belongs to the Section Computer Science & Engineering)

Abstract

Self-paced learning (SPL) is a learning mechanism, inspired by the learning processes of humans and animals, that assigns variable weights to samples and gradually introduces samples from simple to more complicated into the learning set as the “age” parameter increases. To regulate the learning process, a self-paced weighting regularization term with an “age” parameter is added to the learning objective. Several self-paced weighting methods have been proposed, and different regularization terms can lead to different learning performance. However, on the one hand, it is difficult to select a suitable weighting method for SPL. On the other hand, it is challenging to determine the “age” parameter, and SPL can easily obtain poor results as the “age” of the model increases. To address these difficulties, an ensemble SPL approach with an adaptive mixture weighting mechanism is proposed in this study. First, as the “age” parameter increases, a set of base classifiers is collected to produce a new data set, which is used to learn a second-level classifier. The ensemble model is then used to generate the final output, so that an optimal “age” parameter does not need to be selected. In addition, an adaptive mixture weighting method is designed to reduce the dependence of the parameters on human experience: whereas previous methods find it difficult to determine the “age” or self-paced parameters, in this paper these parameters are adjusted adaptively during the learning process. In comparison with previous SPL techniques, the proposed method with adaptive parameters achieves the best results on 27 of the 32 data sets in the experiments. Statistical tests are carried out to show that the proposed method is superior to other state-of-the-art algorithms.

1. Introduction

Self-paced learning (SPL) is a machine learning mechanism that is based on human and animal learning processes [1]. SPL can be traced back to curriculum learning [2,3,4]. Curriculum learning and SPL both gradually introduce examples, from simple to more complex, into the training set for learning. In curriculum learning, the curriculum knowledge must be specified in advance as a priori information, and this information then remains fixed. In contrast, the SPL mechanism can dynamically generate curriculum information based on what the learner has already learned [5,6]. Self-paced learning has been widely employed in pattern recognition and computer vision since its inception, in applications such as multimedia event detection [7,8], matrix factorization [9,10], long-term tracking [11,12], image segmentation and classification [13,14,15], co-saliency detection [16], and so on. Many studies have found that SPL can help models avoid local minima and improve their generalization ability [17,18]. SPL can also lessen the impact of noise and anomalous data in the data set, resulting in more reliable learning.
In recent years, self-paced learning has been widely developed [5,8,19,20,21,22]. Kumar et al. [1] gave the basic form of SPL and solved it in an iterative manner. SPL was then used for ranking problems in the field of zero-example multimedia search [7]. In [19], a self-paced learning method with diversity (SPLD) was proposed to consider sample diversity. This method improves the diversity of the current training set by selecting easy samples from different categories, thereby avoiding the imbalance of sample categories caused by different attributes. Li et al. [8] proposed a multi-objective self-paced learning method (MOSPL), which introduces multi-objective optimization technology to optimize the loss function and the self-paced regularization term simultaneously. This method obtains the complete solution path of the self-paced learning problem when no prior knowledge of the self-paced parameters is available and they are difficult to determine. In [9], matrix factorization problems were solved by self-paced learning, which achieves good results in related applications such as background subtraction and structure-from-motion problems. To address the need for prior knowledge, Jiang et al. [5] combined curriculum learning and self-paced learning and proposed self-paced curriculum learning (SPCL), which considers both the prior knowledge available before training and the curriculum information generated during the learning process. Lin et al. [20] proposed active self-paced learning (ASPL), which combines active learning and self-paced learning so that the classifier achieves better classification when little labeled sample data is available. Following [20], a deep self-paced active learning (DSAL) method was proposed in [6] by combining ASPL with deep learning techniques. ASPL realizes incremental learning under weak manual labeling on face recognition problems. In [22], to deal with spatial prior information and make the suggested technique noise-resistant, the spatial information of the samples is taken into account in the self-paced learning process. Furthermore, some intelligent optimization algorithms have been used to solve the self-paced problem and find the optimal solution path [10].
The design of SPL regularization terms has become a key research direction in recent years [1,23,24]. SPL reflects the difficulty of each sample by assigning it a weight, and the SPL regularization term determines the selection of samples and the calculation of weights. The curriculum information and the model are then jointly learned by iteratively updating the weight variables and the model parameters. Kumar et al. [1] control the optimization process of SPL by introducing the $\ell_1$-norm of the weight variable into the objective function as the regularization term. The weight variable takes only the two values 0 or 1 to indicate whether a sample is selected, which is the hard weighting form of SPL. In order to measure the importance of samples beyond mere selection, Jiang et al. [19] proposed several SPL regularization terms, such as linear soft weighting, logarithmic soft weighting, and mixture weighting, which are used to solve learning problems in different data and problem contexts. When the sample loss value is small, the mixture weighting method assigns a larger weight to the sample than the other weighting forms. Therefore, the mixture weighting method favors samples with small loss values to a certain extent, and it is widely used in real-world applications [19,22].
As indicated in [23,25], current SPL methods increase the “age” parameter to gradually involve more samples in training. However, it is difficult to select a suitable “age” parameter to achieve the best results. This paper aims to learn an ensemble classifier to avoid the selection of the “age” parameter; ensemble learning is able to improve classification performance by combining multiple base classifiers [26,27,28]. Current ensemble learning methods can be roughly divided into three categories: bagging, boosting, and stacking. Bagging employs multiple versions of a training set, each generated by randomly drawing, with replacement, n samples, where n is the size of the original training set [29,30]. A new model is trained on each of these data sets, and voting is used to aggregate the model outputs into a single output. Boosting is an ensemble approach for turning a collection of weak classifiers into a strong classifier, in which the predictors are learned sequentially [31,32]. Stacking is divided into two levels: base models (first level) and a stacking model (second level) [33,34]. At the first level, a variety of base models learn from the current data set, and their outputs are collected to produce a new data set. The stacking model then uses that data set to generate the final output.
In the current SPL regimes, on the one hand, it is a difficult task to determine the optimal “age” parameter. On the other hand, the form of the soft part of the mixture weighting is too simple, and only one form of soft weighting is considered. It is therefore an urgent task to extend the mixture weighting method so that it penalizes the objective function adaptively. In this paper, in order to address the above issues, we propose an ensemble self-paced learning method based on an adaptive mixture weighting strategy (ESPL). As the “age” parameter increases, a set of weak classifiers is constructed to formulate a new training set. Then, these classifiers are stacked to acquire the second-level classifier. Next, an adaptive mixture weighting regularization term is proposed to penalize the SPL objective function. Experiments on classification problems demonstrate that the proposed ESPL is robust to the selection of the “age” parameter and achieves the best results in terms of statistical tests.
This paper proposes an ensemble self-paced learning (ESPL) method based on an adaptive mixture weighting scheme. On the one hand, this paper aims to avoid the selection of “age” parameters by stacking multiple base self-paced learning models with different “age” parameters. On the other hand, this paper aims to propose a new adaptive mixture weighting strategy to penalize the SPL objective function adaptively. The remainder of this paper is structured as follows: The background knowledge of self-paced learning is described in Section 2. Section 3 introduces the proposed ensemble SPL method in detail. Experimental studies are shown in Section 4. Finally, concluding remarks are given in Section 5.

2. Background

In this section, the background of self-paced learning is described, and some well-known self-paced weighting regularizers are presented. Finally, the background of ensemble stacking learning is stated.

2.1. Self-Paced Learning

In this paper, $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ denotes the training data, where $x_i \in \mathbb{R}^m$ represents the $i$th observed sample and $y_i$ is its corresponding label. To indicate the ease of a sample, SPL uses the weight variable $\mathbf{s}$. The latent weight variable $\mathbf{s}$ and the model parameters $\mathbf{w}$ are jointly learned by minimizing:
$$\min_{\mathbf{w}, \mathbf{s}} E(\mathbf{w}, \mathbf{s}; \phi) = \sum_{i=1}^{n} s_i L(y_i, g(x_i, \mathbf{w})) + f(\mathbf{s}; \phi),$$
where $\mathbf{s} \in [0, 1]^n$ and $f(\mathbf{s}; \phi)$ is the SPL regularization term. $\phi$ is called the “age” parameter; in general, an increasing pace sequence is given in advance to define it. The cost between the ground-truth label $y_i$ and the estimated label $g(x_i, \mathbf{w})$ is calculated using the loss function $L(y_i, g(x_i, \mathbf{w}))$. The algorithm of SPL is shown in Algorithm 1. Alternative convex search is commonly used to solve the above problem: in this biconvex optimization approach, the variables are divided into two disjoint blocks, and each iteration optimizes one block of variables while keeping the other fixed.
When $\mathbf{w}$ is fixed, as shown in Algorithm 1, we can derive closed-form optimal solutions under various SPL regularization terms. In the original SPL implementation [1], the weighted training loss and the negative $\ell_1$-norm regularizer $-\|\mathbf{s}\|_1 = -\sum_{i=1}^{n} s_i$ are minimized with an increasing pace parameter, where $s_i$ is a binary variable. Several more efficient SPL regularization terms have since been proposed, as follows [7]:
Linear soft weighting regularizer: The losses are penalized linearly in this method, which can be expressed as the following function:
$$f(\mathbf{s}; \phi) = \phi \left( \frac{1}{2} \|\mathbf{s}\|_2^2 - \sum_{i=1}^{n} s_i \right),$$
where $\phi > 0$. The SPL model’s closed-form solutions under the linear soft weighting scheme are as follows:
$$s_i^* = \begin{cases} 1 - \dfrac{L_i}{\phi}, & L_i < \phi \\ 0, & L_i \ge \phi. \end{cases}$$
Logarithmic soft weighting regularizer: This approach can be represented as the following function to logarithmically discriminate samples with respect to losses:
$$f(\mathbf{s}; \phi) = \sum_{i=1}^{n} \left( \zeta s_i - \frac{\zeta^{s_i}}{\log \zeta} \right),$$
where $\zeta = 1 - \phi$ and $0 < \phi < 1$. Then, using the logarithmic soft weighting method, the closed-form optimal solution is expressed as:
$$s_i^* = \begin{cases} \dfrac{1}{\log \zeta} \log (L_i + \zeta), & L_i < \phi \\ 0, & L_i \ge \phi. \end{cases}$$
Mixture weighting regularizer: The mixture approach is a combination of the “soft” and the “hard” schemes, as represented by the function:
$$f(\mathbf{s}; \phi) = -\zeta \sum_{i=1}^{n} \log \left( s_i + \frac{1}{\phi_1} \zeta \right),$$
where $\zeta = \dfrac{\phi_1 \phi_2}{\phi_1 - \phi_2}$ and $\phi_1 > \phi_2 > 0$. The following is the closed-form optimal solution:
$$s_i^* = \begin{cases} 1, & L_i \le \phi_2 \\ 0, & L_i \ge \phi_1 \\ \dfrac{\zeta}{L_i} - \dfrac{\zeta}{\phi_1}, & \text{otherwise}. \end{cases}$$
When $\mathbf{s}$ is fixed, available off-the-shelf learning methods can be used to find the best $\mathbf{w}^*$. The parameter $\phi$ is the “age” of the SPL model and controls the learning pace. During the evolution of SPL, we gradually raise the “age” parameter to learn new samples.
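For concreteness, the three closed-form weight updates above translate directly into code. The following is a minimal sketch, assuming NumPy and a vector of per-sample losses; the function names and the small epsilon guard are illustrative choices, not part of the original formulation.

```python
import numpy as np

def linear_soft_weights(loss, phi):
    """Closed-form weights under the linear soft weighting regularizer (phi > 0)."""
    return np.where(loss < phi, 1.0 - loss / phi, 0.0)

def log_soft_weights(loss, phi):
    """Closed-form weights under the logarithmic soft weighting regularizer (0 < phi < 1)."""
    zeta = 1.0 - phi
    return np.where(loss < phi, np.log(loss + zeta) / np.log(zeta), 0.0)

def mixture_weights(loss, phi1, phi2):
    """Closed-form weights under the mixture regularizer (phi1 > phi2 > 0)."""
    zeta = phi1 * phi2 / (phi1 - phi2)
    soft = zeta / np.maximum(loss, 1e-12) - zeta / phi1   # soft part for intermediate losses
    s = np.where(loss <= phi2, 1.0, np.where(loss >= phi1, 0.0, soft))
    return np.clip(s, 0.0, 1.0)
```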
Algorithm 1: Algorithm of SPL
Input: The training dataset $D$.
Output: Model parameter $\mathbf{w}$.
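A minimal sketch of the alternating scheme that Algorithm 1 follows is given below, using the original hard weighting for the weight update and a linear SVM as the underlying learner; labels are assumed to be encoded as -1/+1, and the initial pace and growth factor are illustrative values.

```python
import numpy as np
from sklearn.svm import LinearSVC

def spl_train(X, y, phi=0.1, step=1.3, n_rounds=10):
    """Alternate between fitting the model on selected samples and updating the weights."""
    s = np.ones(len(y))                       # initially treat every sample as selected
    clf = LinearSVC()
    for _ in range(n_rounds):
        clf.fit(X[s > 0], y[s > 0])           # fix s, optimize w on the current training set
        loss = np.maximum(0.0, 1.0 - y * clf.decision_function(X))   # hinge loss per sample
        s = (loss < phi).astype(float)        # fix w, closed-form hard weights
        phi *= step                           # raise the "age" to admit harder samples
    return clf
```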

2.2. Ensemble Stacking Learning

Several classifiers are used in ensemble approaches. Each of these classifiers creates a model based on the data. Finally, learning, fusion, or voting among the various classifiers is performed for classification, and the class with the most votes is the final class. Stacking is a very powerful ensemble learning strategy, in addition to the bagging and boosting methods discussed above. The basic idea behind stacking is to improve the generalization performance by enhancing the heterogeneity of the base classifiers and combining the outputs predicted by the base classifiers using a second-level classifier.
Figure 1 illustrates the general stacking method, which consists of three major steps. First, the first-level classifiers are created using the original training data set. There are several strategies for learning base classifiers: for example, we can use the bootstrap sampling technique to learn independent classifiers, tune the parameters of a learning algorithm to build various base classifiers, or use a variety of classification and/or sampling approaches to generate base classifiers. Second, a new data set is established from the outputs of the base classifiers. The labels predicted by the first-level classifiers are considered new features in this case, while the original class labels are retained as labels in the newly generated data set. Finally, using the newly produced data set, we create a second-level classifier, which can be learned with any learning approach.
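As a small, generic illustration of these three steps (independent of the SPL-specific ensemble introduced in Section 3), scikit-learn's StackingClassifier builds the first-level classifiers, collects their predictions, and fits a second-level model on them; the particular base learners below are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# First level: heterogeneous base classifiers trained on the original data set.
base_learners = [
    ("svm", SVC(probability=True, random_state=0)),
    ("tree", DecisionTreeClassifier(random_state=0)),
]

# Second level: a classifier trained on the base classifiers' outputs.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())
stack.fit(X, y)
print(stack.score(X, y))
```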

3. Methodology

In this section, we describe the proposed ensemble self-paced learning (ESPL) method in detail. First, the framework of ESPL is introduced. Next, a new adaptive mixture weighting method is designed. Finally, the Majorization Minimization algorithm is used to optimize the objective function of the proposed method.

3.1. Ensemble Self-Paced Learning

Ensemble learning brings together several learners to address the same problem. Traditional machine learning algorithms learn a decision function from training data and use it to generate predictions on the test set, whereas ensemble learning methods attempt to construct a set of decision functions and combine them. As described above, during the evolution of SPL, a set of classifiers is generated as the “age” parameter increases. Previous studies have indicated that it is difficult to determine this “age” parameter, since SPL can easily obtain bad results as the “age” parameter increases. In order to avoid this issue, this paper proposes an ensemble self-paced learning method that combines the multiple SPL models generated during the evolution of SPL.
To train independent base learners, bagging uses bootstrap sampling and takes the majority vote as the resulting prediction [28]. In every round, boosting adjusts the weight distribution, learns the appropriate base classifiers, and then merges them based on their accuracy. Stacking, unlike the above two methods, trains a high-level classifier on top of the underlying classifiers. In this paper, stacking is used to learn a second-level classifier, which combines the multiple SPL models generated during the evolution of SPL. As shown in Figure 2, as the “age” parameter increases, more samples are included in the training process. At a given “age” parameter, the samples with weight $s_i > 0$ constitute the current training set used to establish a first-level classifier. Then, a new training set is constructed based on the outputs of the first-level classifiers. Finally, a second-level classifier is created using the newly formed data set. The algorithm of the proposed ESPL method is shown in Algorithm 2.
Algorithm 2: Algorithm of Ensemble SPL
Input: The training dataset $D = \{(x_i, y_i)\}_{i=1}^{n}$.
Output: An ensemble classifier $H$.
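Since Algorithm 2 combines the SPL loop with the stacking step, a compact sketch of the whole procedure is given below. It is a simplified reading of the method: hard weighting stands in for the adaptive mixture weighting of Section 3.2, labels are assumed to be encoded as -1/+1, and the choice of logistic regression as the second-level classifier is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

def espl_train(X, y, phi=0.1, step=1.3, n_rounds=10):
    """Stack the first-level SPL models produced at each value of the 'age' parameter."""
    s = np.ones(len(y))
    first_level, meta_features = [], []
    for _ in range(n_rounds):
        clf = LinearSVC().fit(X[s > 0], y[s > 0])        # first-level classifier at this "age"
        first_level.append(clf)
        scores = clf.decision_function(X)
        meta_features.append(scores)                      # its outputs become new features
        loss = np.maximum(0.0, 1.0 - y * scores)
        s = (loss < phi).astype(float)                    # reselect samples for the next round
        phi *= step                                       # raise the "age"
    Z = np.column_stack(meta_features)                    # new data set built from base outputs
    second_level = LogisticRegression().fit(Z, y)         # second-level classifier
    return first_level, second_level

def espl_predict(first_level, second_level, X_new):
    Z_new = np.column_stack([m.decision_function(X_new) for m in first_level])
    return second_level.predict(Z_new)
```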

3.2. Adaptive Mixture Weighting

The previous mixture weighting method used a combination of hard weights and logarithmic soft weights, and its form is relatively simple. This paper aims to extend this hybrid weighting to a more general form in which the combination of hard and soft weights can be selected adaptively. To this end, this paper proposes an adaptive mixture weighting method, as shown in the following formula:
$$f(\mathbf{s}; \{\alpha_1, \alpha_2, \beta\}) = \sum_{i=1}^{n} \left( \frac{\alpha_1}{\beta} s_i^{\beta} - (\alpha_1 + \alpha_2) s_i \right),$$
where $\alpha_1 > 0$, $\alpha_2 > 0$, and $\beta > 1$.
According to the above formula, the objective function of SPL based on the adaptive mixture weighting method can be obtained as:
$$\min_{\mathbf{w}, \mathbf{s}} E(\mathbf{w}, \mathbf{s}) = \sum_{i=1}^{n} s_i L(y_i, g(x_i, \mathbf{w})) + \sum_{i=1}^{n} \left( \frac{\alpha_1}{\beta} s_i^{\beta} - (\alpha_1 + \alpha_2) s_i \right).$$
The Hessian matrix $H$ of $f(\mathbf{s}; \{\alpha_1, \alpha_2, \beta\})$ with respect to $\mathbf{s}$ can be represented as:
$$H = \begin{pmatrix} \alpha_1 (\beta - 1) s_1^{\beta - 2} & 0 & \cdots & 0 \\ 0 & \alpha_1 (\beta - 1) s_2^{\beta - 2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \alpha_1 (\beta - 1) s_n^{\beta - 2} \end{pmatrix},$$
where $\alpha_1 > 0$ and $\beta > 1$.
Here, $s_i \in (0, 1]$ is the weight of the $i$th sample among the samples involved in training. It can be observed that $\alpha_1 (\beta - 1) s_i^{\beta - 2}$ is greater than zero for every $i = 1, 2, \ldots, n$. Therefore, $H$ is a symmetric diagonal matrix, and the diagonal elements of a diagonal matrix are its eigenvalues. Since the main diagonal entries are all greater than zero, the Hessian of $f$ is positive definite, and the proposed regularizer is therefore convex.

3.3. Optimization and Analysis

The Majorization Minimization (MM) algorithm can be used to optimize the objective function of the proposed method. By iterating the two phases of constructing a surrogate function and minimizing it, the technique reduces a complex optimization problem to a sequence of simpler ones. Assume that $s^*(\alpha; \ell)$ is the optimal weight and that its integral function is
$$F_{\alpha}(\ell) = \int_{0}^{\ell} s^*(\alpha; \tau) \, d\tau.$$
$Q_{\alpha}(\mathbf{w} \mid \mathbf{w}^*)$ denotes a tractable surrogate for $F_{\alpha}(L(\mathbf{w}))$. In [17], Meng et al. have indicated that
$$Q_{\alpha}^{(i)}(\mathbf{w} \mid \mathbf{w}^*) = F_{\alpha}(L_i(\mathbf{w}^*)) + s_i^*(\alpha; L_i(\mathbf{w}^*)) \left( L_i(\mathbf{w}) - L_i(\mathbf{w}^*) \right),$$
and then we have:
$$\sum_{i=1}^{n} F_{\alpha}(L_i(\mathbf{w})) \le \sum_{i=1}^{n} Q_{\alpha}^{(i)}(\mathbf{w} \mid \mathbf{w}^*).$$
In this paper, $\mathbf{w}^t$ is used to represent the model parameters in the $t$th iteration of the MM method.

3.3.1. Majorization Step

To acquire each $Q_{\phi}^{(i)}(\mathbf{w} \mid \mathbf{w}^t)$, we can compute $s_i^*(\phi; L_i(\mathbf{w}^t))$ by solving the following problem:
$$s_i^*(\phi; L_i(\mathbf{w}^t)) = \arg\min_{s_i \in [0, 1]} \; s_i L_i(\mathbf{w}^t) + f(s_i; \phi),$$
where the form of $f(s_i; \phi)$ has been given in Equation (8). As described above, the proposed $f(s_i; \phi)$ in (8) is convex. Then, we have
$$\frac{\partial E}{\partial s_i} = L_i + \alpha_1 s_i^{\beta - 1} - (\alpha_1 + \alpha_2) = 0,$$
and
$$s_i^{\beta - 1} = \frac{\alpha_1 + \alpha_2 - L_i}{\alpha_1}.$$
Finally, the closed-form optimal solution is
$$s_i^* = \begin{cases} 1, & L_i \le \alpha_2 \\ 0, & L_i \ge \alpha_1 + \alpha_2 \\ \left( \dfrac{\alpha_1 + \alpha_2 - L_i}{\alpha_1} \right)^{\frac{1}{\beta - 1}}, & \text{otherwise}. \end{cases}$$
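A minimal sketch of this closed-form update, assuming a NumPy vector of losses; the clipping of the base before taking the fractional power is only a numerical guard and not part of the derivation.

```python
import numpy as np

def adaptive_mixture_weights(loss, alpha1, alpha2, beta):
    """Closed-form weights for the adaptive mixture regularizer (alpha1, alpha2 > 0, beta > 1)."""
    base = np.clip((alpha1 + alpha2 - loss) / alpha1, 0.0, None)
    s = base ** (1.0 / (beta - 1.0))                 # soft part for intermediate losses
    s = np.where(loss <= alpha2, 1.0, s)             # easy samples receive full weight
    s = np.where(loss >= alpha1 + alpha2, 0.0, s)    # hard samples are excluded
    return s
```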

3.3.2. Minimization Step

In this minimization step, we have
$$\mathbf{w}^{t+1} = \arg\min_{\mathbf{w}} \sum_{i=1}^{n} s_i^* L_i(\mathbf{w}).$$
Here, the support vector machine (SVM) is used as the learning model for self-paced learning. The proposed method combined with the support vector machine can be expressed in the following form:
$$\begin{aligned} \min_{\mathbf{w}, \mathbf{s}} \quad & \frac{1}{2} \|\mathbf{w}\|_2^2 + C \sum_{i=1}^{n} s_i L_i + f(\mathbf{s}; \lambda) \\ \text{s.t.} \quad & \forall i, \; y_i \left( \mathbf{w}^{T} \phi(x_i) + b \right) \ge 1 - L_i, \quad L_i \ge 0, \\ & \mathbf{y} \in \{-1, +1\}^n, \quad \mathbf{s} \in [0, 1]^n, \end{aligned}$$
where L i is the standard hinge loss:
$$L_i = \max\left( 0, 1 - y_i \left( \mathbf{w}^{T} \phi(x_i) + b \right) \right).$$
Here, $\phi(x_i)$ is a feature mapping function, and the regularization parameter $C$ ($C > 0$) is a trade-off between the hinge loss and the margin.
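In practice, the weighted hinge loss above can be handled by an off-the-shelf SVM that accepts per-sample weights. The sketch below uses scikit-learn's SVC, whose fit method takes a sample_weight argument, as a stand-in for this formulation; simply dropping zero-weight samples is a simplification.

```python
from sklearn.svm import SVC

def fit_weighted_svm(X, y, s, C=1.0):
    """Fit an SVM in which each sample's hinge loss is scaled by its self-paced weight s_i."""
    mask = s > 0                          # zero-weight samples do not contribute to the loss
    clf = SVC(kernel="rbf", C=C)
    clf.fit(X[mask], y[mask], sample_weight=s[mask])
    return clf
```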

3.3.3. Parameter Analysis

The parameters in self-paced learning are usually set based on human experience. The above-mentioned mixture weighting method includes three parameters: $\alpha_1$, $\alpha_2$, and $\beta$. As shown in Equation (17), the parameter $\alpha_2$ controls the number of samples with weight equal to 1. Therefore, we let $\alpha_2 = 0.2 \, \alpha_1$ to reduce the number of parameters. Let $L_s$ represent the losses $L$ sorted in ascending order and $N_t$ be the number of samples involved in training. Then, we have
$$\alpha_1 + \alpha_2 = 1.2 \, \alpha_1 = L_s(N_t),$$
so that the selection threshold $\alpha_1 + \alpha_2$ equals the $N_t$th smallest loss.
For the polynomial parameter $\beta$, researchers can assign specific values to obtain an SPL regularization term suitable for the specific problem. This paper instead expects $\beta$ to adjust dynamically according to the sample losses in each iteration. Therefore, we have
$$\beta = \tan\left( \left( 1 - \frac{N_t}{2n + 1} \right) \cdot \frac{\pi}{2} \right).$$
In the early stage of SPL, simple samples are selected into the data set and are given larger weights; therefore, $\beta$ should take large values at this stage. As the iterations proceed, more complex samples are selected into the training set, and $\beta$ should take smaller values.
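Under this reading of the two formulas above, the self-paced parameters at each iteration can be computed from the current losses alone. The sketch below is one possible implementation, where interpreting $L_s(N_t)$ as the $N_t$th smallest loss is an assumption.

```python
import numpy as np

def adaptive_parameters(loss, n_selected):
    """Derive (alpha1, alpha2, beta) from the current losses, following Section 3.3.3."""
    n = len(loss)
    sorted_loss = np.sort(loss)                       # L_s: losses in ascending order
    threshold = sorted_loss[min(n_selected, n) - 1]   # alpha1 + alpha2 = N_t-th smallest loss
    alpha1 = threshold / 1.2                          # because alpha2 = 0.2 * alpha1
    alpha2 = 0.2 * alpha1
    beta = np.tan((1.0 - n_selected / (2 * n + 1)) * np.pi / 2.0)   # large early, smaller later
    return alpha1, alpha2, beta
```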

3.4. Analysis

As shown in Figure 3, the flowchart of the proposed method is given. The models from each iteration constitute a set of first-level classifiers. The predictions of the first-level classifiers on the training set and the test set form a training matrix and a test matrix, and the matrices from all iterations are combined to learn a second-level classifier. In previous studies on SPL weighting methods, several soft weighting methods have been proposed to penalize the loss, and the mixture weighting method has been shown to achieve better generalization ability than the hard and soft weighting methods. However, the form of the soft part of the mixture weighting is too simple, and only one form of soft weighting is considered. In formula (8), in comparison with the original mixture weighting method, an additional parameter $\beta$ is involved in the objective function. As shown in formula (17), the parameter $\beta$ controls the form of the soft part of the proposed mixture weighting method. Researchers can assign specific values to obtain an SPL regularization term suitable for the specific problem. Furthermore, this parameter can be adjusted dynamically according to the sample losses in each iteration, as shown in formula (22).

4. Experimental Studies

In the experiments, a set of data sets from the KEEL repository is employed [35]; these data sets are widely used for comparative studies in the literature [36,37]. Using five-fold cross-validation, the data sets (accessed on 8 September 2022 at https://sci2s.ugr.es/keel/imbalanced.php) were divided into five training/test subsets. In each fold, one subset is used as the test set and the remainder as the training set, so the cross-validation procedure is carried out five times, and the five-fold results are averaged to provide a single estimate. The data sets used in this study are listed in Table 1, which reports the number of instances and attributes of each data set.
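A minimal sketch of this evaluation protocol, using a stratified five-fold split and accuracy as the averaged score; the classifier placeholder and the shuffling seed are illustrative, and KEEL's own pre-built partitions could be substituted.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

def five_fold_accuracy(model, X, y):
    """Average test accuracy over a stratified five-fold cross-validation."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    return float(np.mean(scores))
```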
For comparing several algorithms across multiple data sets, we use the Friedman and Nemenyi tests [38]. The Friedman statistic value is computed as follows
$$F_F = \frac{(J - 1) \, \chi_F^2}{J(K - 1) - \chi_F^2},$$
where
$$\chi_F^2 = \frac{12 J}{K(K + 1)} \left[ \sum_{j} R_j^2 - \frac{K(K + 1)^2}{4} \right].$$
$K$ is the number of algorithms being compared, $J$ is the number of data sets, and $R_j$ is the average rank of the $j$th algorithm. The null hypothesis is that there is no difference in the average ranks. If the Friedman statistic is larger than the critical value of the $F$ distribution with $K - 1$ and $(K - 1) \times (J - 1)$ degrees of freedom at the 0.05 significance level, we can reject the null hypothesis and perform Nemenyi tests to further analyze the average ranks. The Nemenyi statistic is obtained as follows
$$CD = q_{0.05} \sqrt{\frac{K(K + 1)}{6 J}},$$
where $q_{0.05}$ is the Studentized range statistic divided by $\sqrt{2}$. Let $d_{ij}$ denote the difference between the average ranks of the $i$th algorithm and the $j$th algorithm. If $d_{ij} > CD$, the difference between the two algorithms is significant.
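These statistics can be computed directly from a matrix of results; the sketch below follows the formulas above, using scipy only for ranking and for the F-distribution critical value, and takes the Nemenyi constant $q_{0.05}$ as an input since it depends on the number of algorithms.

```python
import numpy as np
from scipy import stats

def friedman_nemenyi(scores, q_05):
    """scores: (J data sets x K algorithms) result matrix; q_05: Studentized range value / sqrt(2)."""
    J, K = scores.shape
    # Rank the algorithms on each data set (rank 1 = best, i.e., highest score).
    ranks = np.apply_along_axis(lambda row: stats.rankdata(-row), 1, scores)
    R = ranks.mean(axis=0)                                   # average rank of each algorithm
    chi2_F = 12.0 * J / (K * (K + 1)) * (np.sum(R ** 2) - K * (K + 1) ** 2 / 4.0)
    F_F = (J - 1) * chi2_F / (J * (K - 1) - chi2_F)          # Friedman statistic
    critical = stats.f.ppf(0.95, K - 1, (K - 1) * (J - 1))   # F critical value at the 0.05 level
    CD = q_05 * np.sqrt(K * (K + 1) / (6.0 * J))             # Nemenyi critical difference
    return F_F, critical, R, CD
```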

4.1. Test of the Proposed Ensemble Strategy

As the “age” parameter grows, the present SPL approaches may generate unsatisfactory results, as discussed in the preceding sections. In most cases, the results obtained by self-paced learning first improve and then deteriorate: as the “age” parameter increases, hard samples involved in the training cause the model performance to degrade. However, due to a lack of prior information, determining the “age” parameter is problematic. The proposed ESPL method considers the models generated in the SPL implementation as first-level classifiers to avoid the selection of the “age” parameter.
Figure 4 shows the convergence curves obtained by SPL and the proposed ESPL methods. It can be observed that the index values obtained by SPL and ESPL gradually increase in the first few iterations. After several iterations, the results obtained by the original SPL will gradually deteriorate. As shown in Figure 4, the blue curves obtained by SPL first rise and then fall. However, the proposed ESPL method ensembles the previous models and is robust to the “age” parameter.

4.2. Test of the Proposed Adaptive Mixture Weighting Method

As described in Section 2, the hard weighting scheme, the linear soft weighting scheme, the logarithmic soft weighting scheme, and the mixture weighting scheme have been widely used in the current SPL-based applications. The previous mixture weighting method used a simple form, which is a combination of hard weights and logarithmic soft weights. The proposed adaptive mixture weighting method broadens the scope of this hybrid weight and enables the adaptive selection of hard and soft weight combinations.
The values in terms of Accuracy, F-measure, and Gmean obtained by the different weighting schemes are reported in Figure 5, Figure 6 and Figure 7. Note that all these algorithms with different weighting schemes utilize the proposed ensemble strategy, and only ESPL employs the proposed adaptive mixture weighting method. It is obvious that the proposed ESPL with adaptive mixture weighting achieves the best results on these data sets.
We conduct a statistical test to validate the statistical differences between the results generated by the different weighting methods within the proposed ESPL framework in order to further evaluate the performance of the proposed adaptive mixture weighting method. To examine the results in terms of Accuracy, F-measure, and Gmean, we apply the Friedman and Nemenyi tests [38]. The comparison involves five ($K = 5$) algorithms and 32 ($J = 32$) data sets. The p values of the Friedman statistic in terms of Accuracy, F-measure, and Gmean are $5.611 \times 10^{-10}$, $1.43 \times 10^{-15}$, and $1.005 \times 10^{-27}$, respectively. As a result, with a 95% confidence level, we find that the evaluated algorithms are significantly different on the three criteria. The results of the Nemenyi test are shown in Figure 8. In terms of Accuracy and Gmean, it can be seen that the proposed adaptive mixture weighting scheme is significantly different from the other weighting algorithms. Although there is no significant difference in terms of F-measure between the adaptive mixture weighting scheme and the hard weighting scheme, the proposed adaptive mixture weighting scheme has the lowest average rank and is shown to be superior in the experiments.

4.3. Experimental Results

We compare ESPL to the Decision Tree classifier [39], linear discriminant analysis (LDA) [40], and AdaBoost [41] to validate its performance. Decision tree learning is one of the most extensively used and practical approaches for inductive inference; the C4.5 decision tree was therefore selected as a competing algorithm in the experiments. It is a method for approximating discrete-valued functions that can learn disjunctive expressions and is robust to noisy data [39]. LDA finds the subspace in which the between-class scatter of the data is largest relative to the within-class scatter, which makes LDA suitable for classification problems. Furthermore, AdaBoost is also chosen as a competing algorithm in the following experiments. AdaBoost is an ensemble method that has been successfully applied to a variety of applications while avoiding the concerns of overfitting. Patterns that are incorrectly classified are given a higher weight in the next iteration. Patterns around the decision boundary are typically more difficult to categorize and, as a result, receive high weights after several iterations.
The results on the KEEL data sets obtained by the four methods with respect to Accuracy are shown in Table 2. It is obvious that the proposed ESPL algorithm achieves the best results on 23 of the 32 data sets. The decision tree algorithm achieves the best results on certain data sets such as glass0, ecoli2, glass-0-4_vs_5, vowel0, and page_blocks_1_3_vs_4. The LDA method achieves the best results on eight of the 32 data sets. On some data sets, such as new_thyroid1, ecoli-0-3-4-6_vs_5, ecoli-0-2-6-7_vs_3-5, ecoli-0-1-4-7_vs_5-6, shuttle_c2_vs_c4, and dermatology-6-5-5tst, the AdaBoost algorithm produces the best results. The results with respect to F-measure are shown in Table 3. The decision tree algorithm achieves the best results on some data sets such as glass0, ecoli2, yeast-0-3-5-9_vs_7-8, glass-0-4_vs_5, vowel0, and page_blocks_1_3_vs_4. The LDA algorithm achieves the best results on yeast_2_vs_4, ecoli-0-1_vs_5, and led7digit-0-2-4-5-6-7-8-9_vs_1. For new_thyroid1, ecoli-0-3-4-6_vs_5, ecoli-0-2-6-7_vs_3-5, shuttle_c2_vs_c4, and dermatology-6-5-5tst, AdaBoost obtains the best results. The results with respect to Gmean are shown in Table 4, where the proposed ESPL algorithm achieves the best results on 27 of the 32 data sets. From Table 2, Table 3 and Table 4, it can be observed that the proposed ESPL algorithm achieves satisfactory results in terms of Accuracy, F-measure, and Gmean.
We conduct a statistical test to validate the statistical differences between the results provided by Decision Tree, LDA, AdaBoost, and ESPL in order to further evaluate the performance of the proposed ESPL technique. The Friedman and Nemenyi tests are used to examine the results in terms of Accuracy, F-measure, and Gmean. The comparison involves four ($K = 4$) algorithms and 32 ($J = 32$) data sets. The p values of the Friedman statistic in terms of Accuracy, F-measure, and Gmean are $3.616 \times 10^{-8}$, $1.633 \times 10^{-10}$, and $7.082 \times 10^{-14}$, respectively. As a result, with a 95% confidence level, we find that the evaluated algorithms are significantly different on the three indexes. The results of the Nemenyi test are shown in Figure 9. It can be observed that, from a statistical point of view, the performance of our method is superior to that of the comparison algorithms.

5. Concluding Remarks

The mixture weighting schemes have the characteristics of both hard and soft weights, so they are widely used in many self-paced learning problems. This paper has proposed an ensemble self-paced learning method and designed an adaptive mixture weighting strategy. As the “age” parameter increases, a group of first-level classifiers is generated for each “age” parameter, which is then utilized to establish the new data set and train the second-level classifier. To avoid selecting the best “age” parameter, the ensemble model is utilized to generate the final output. Furthermore, an adaptive mixture weighting approach has been proposed, which is incorporated into the learning objective of the SVM algorithm. Then, the proposed method is solved by the MM algorithm, and the closed-form solutions of the proposed optimization problem are derived. Experimental results on classification challenges illustrate the effectiveness of the proposed ensemble SPL method. Statistical tests have been conducted to demonstrate that the suggested method outperforms other state-of-the-art algorithms. In the future, we hope to investigate the deep learning-based algorithms under the proposed ensemble self-paced learning framework and explore more case studies to discuss the proposed method.

Author Contributions

Conceptualization, L.L. and Z.W.; methodology, L.L.; software, L.L.; validation, J.B., X.Y. and Y.Y.; formal analysis, J.Z.; investigation, J.Z.; resources, J.B.; data curation, Z.W.; writing—original draft preparation, L.L.; writing—review and editing, Z.W.; visualization, J.B.; supervision, J.Z.; project administration, Y.Y.; funding acquisition, L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Guangdong Province grant number [2214050001305], the Science Foundation of Zhanjiang City grant number [211207157080994], the National Natural Science Foundation of China grant number [11904290] and the sixth Youth Talent Promotion Project of China Association for Science and Technology grant number [2020QNRC002].

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kumar, M.P.; Packer, B.; Koller, D. Self-paced learning for latent variable models. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 6–9 December 2010; pp. 1189–1197. [Google Scholar]
  2. Bengio, Y.; Louradour, J.; Collobert, R.; Weston, J. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009; pp. 41–48. [Google Scholar]
  3. Bae, J.; Kim, T.; Lee, W.; Shim, I. Curriculum learning for vehicle lateral stability estimations. IEEE Access 2021, 9, 89249–89262. [Google Scholar] [CrossRef]
  4. Wang, X.; Chen, Y.; Zhu, W. A survey on curriculum learning. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4555–4576. [Google Scholar] [CrossRef] [PubMed]
  5. Jiang, L.; Meng, D.; Zhao, Q.; Shan, S.; Hauptmann, A.G. Self-paced curriculum learning. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 2694–2700. [Google Scholar]
  6. Wang, W.; Feng, R.; Chen, J.; Lu, Y.; Chen, T.; Yu, H.; Chen, D.Z.; Wu, J. Nodule-plus R-CNN and deep self-paced active learning for 3D instance segmentation of pulmonary nodules. IEEE Access 2019, 7, 128796–128805. [Google Scholar] [CrossRef]
  7. Jiang, L.; Meng, D.; Mitamura, T.; Hauptmann, A.G. Easy samples first: Self-paced reranking for zero-example multimedia search. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; pp. 547–556. [Google Scholar]
  8. Li, H.; Gong, M.; Meng, D.; Miao, Q. Multi-objective self-paced learning. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 1802–1808. [Google Scholar]
  9. Zhao, Q.; Meng, D.; Jiang, L.; Xie, Q.; Xu, Z.; Hauptmann, A.G. Self-paced learning for matrix factorization. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 3196–3202. [Google Scholar]
  10. Li, H.; Gong, M.; Wang, C.; Miao, Q. Pareto self-paced learning based on differential evolution. IEEE Trans. Cybern. 2021, 51, 4187–4200. [Google Scholar] [CrossRef]
  11. Supancic, J.S.; Ramanan, D. Self-paced learning for long-term tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2379–2386. [Google Scholar]
  12. Ge, D.; Song, J.; Qi, Y.; Wang, C.; Miao, Q. Self-paced dense connectivity learning for visual tracking. IEEE Access 2019, 7, 37181–37191. [Google Scholar] [CrossRef]
  13. Peng, J.; Wang, P.; Desrosiers, C.; Pedersoli, M. Self-paced contrastive learning for semi-supervised medical image segmentation with meta-labels. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 7–10 December 2021; Volume 34. [Google Scholar]
  14. Li, H.; Li, J.; Zhao, Y.; Gong, M.; Zhang, Y.; Liu, T. Cost-sensitive self-paced learning with adaptive regularization for classification of image time series. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2021, 14, 11713–11727. [Google Scholar] [CrossRef]
  15. Pan, X.; Wei, D.; Zhao, Y.; Ma, M.; Wang, H. Self-paced learning with diversity for medical image segmentation by using the query-by-committee and dynamic clustering techniques. IEEE Access 2020, 9, 9834–9844. [Google Scholar] [CrossRef]
  16. Zhang, D.; Meng, D.; Li, C.; Jiang, L.; Zhao, Q.; Han, J. A self-paced multiple-instance learning framework for co-saliency detection. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 594–602. [Google Scholar]
  17. Meng, D.; Zhao, Q.; Jiang, L. A theoretical understanding of self-paced learning. Inform. Sci. 2017, 414, 319–328. [Google Scholar] [CrossRef]
  18. Klink, P.; Abdulsamad, H.; Belousov, B.; D’Eramo, C.; Peters, J.; Pajarinen, J. A probabilistic interpretation of self-paced learning with applications to reinforcement learning. J. Mach. Learn. Res. 2021, 22, 1–52. [Google Scholar]
  19. Jiang, L.; Meng, D.; Yu, S.-I.; Lan, Z.; Shan, S.; Hauptmann, A. Self-paced learning with diversity. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 27, pp. 2078–2086. [Google Scholar]
  20. Lin, L.; Wang, K.; Meng, D.; Zuo, W.; Zhang, L. Active self-paced learning for cost-effective and progressive face identification. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 7–19. [Google Scholar] [CrossRef] [Green Version]
  21. Sun, L.; Zhou, Y. FSPMTL: Flexible self-paced multi-task learning. IEEE Access 2020, 8, 132012–132020. [Google Scholar] [CrossRef]
  22. Li, H.; Gong, M.; Zhang, M.; Wu, Y. Spatially self-paced convolutional networks for change detection in heterogeneous images. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2021, 14, 4966–4979. [Google Scholar] [CrossRef]
  23. Gong, M.; Li, H.; Meng, D.; Miao, Q.; Liu, J. Decomposition-based evolutionary multi-objective optimization to self-paced learning. IEEE Trans. Evol. Comput. 2019, 23, 288–302. [Google Scholar] [CrossRef]
  24. Huang, Z.; Ren, Y.; Liu, W.; Pu, X. Self-paced multi-view clustering via a novel soft weighted regularizer. IEEE Access 2019, 7, 168629–168636. [Google Scholar] [CrossRef]
  25. Gu, B.; Zhai, Z.; Li, X.; Huang, H. Finding age path of self-paced learning. In Proceedings of the 2021 IEEE International Conference on Data Mining, Auckland, New Zealand, 7–10 December 2021; pp. 151–160. [Google Scholar]
  26. Dong, X.; Yu, Z.; Cao, W.; Shi, Y.; Ma, Q. A survey on ensemble learning. Front. Comput. Sci. 2020, 14, 241–258. [Google Scholar] [CrossRef]
  27. Galar, M.; Fernandez, A.; Barrenechea, E.; Bustince, H.; Herrera, F. A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst. Man Cybern. C Appl. Rev. 2011, 42, 463–484. [Google Scholar] [CrossRef]
  28. Khoshgoftaar, T.M.; Van Hulse, J.; Napolitano, A. Comparing boosting and bagging techniques with noisy and imbalanced data. IEEE Trans. Syst. Man Cybern. A Syst. Humans 2010, 41, 552–568. [Google Scholar] [CrossRef]
  29. Martínez-Muñoz, G.; Suárez, A. Out-of-bag estimation of the optimal sample size in bagging. Pattern Recogn. 2010, 43, 143–152. [Google Scholar] [CrossRef] [Green Version]
  30. Mordelet, F.; Vert, J.-P. A bagging SVM to learn from positive and unlabeled examples. Pattern Recogn. Lett. 2014, 37, 201–209. [Google Scholar] [CrossRef] [Green Version]
  31. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 2–18 December 2018; Volume 31, pp. 6638–6648. [Google Scholar]
  32. Shen, C.; Li, H. On the dual formulation of boosting algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 2216–2231. [Google Scholar] [CrossRef] [Green Version]
  33. Akyol, K. Stacking ensemble based deep neural networks modeling for effective epileptic seizure detection. Expert Syst. Appl. 2020, 148, 113239. [Google Scholar] [CrossRef]
  34. Jiang, W.; Chen, Z.; Xiang, Y.; Shao, D.; Ma, L.; Zhang, J. SSEM: A novel self-adaptive stacking ensemble model for classification. IEEE Access 2019, 7, 120337–120349. [Google Scholar] [CrossRef]
  35. Alcalá-Fdez, J.; Fernández, A.; Luengo, J.; Derrac, J.; García, S.; Sánchez, L.; Herrera, F. KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. J. Mult. Valued Log. Soft Comput. 2011, 17, 255–287. [Google Scholar]
  36. Ramírez-Gallego, S.; García, S.; Benítez, J.M.; Herrera, F. Multivariate discretization based on evolutionary cut points selection for classification. IEEE Trans. Cybern. 2016, 46, 595–608. [Google Scholar] [CrossRef]
  37. Rosales-Pérez, A.; García, S.; Gonzalez, J.A.; Coello, C.A.C.; Herrera, F. An evolutionary multiobjective model and instance selection for support vector machines with Pareto-based ensembles. IEEE Trans. Evol. Comput. 2017, 21, 863–877. [Google Scholar] [CrossRef]
  38. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
  39. Polat, K.; Güneş, S. A novel hybrid intelligent method based on C4. 5 decision tree classifier and one-against-all approach for multi-class classification problems. Expert Syst. Appl. 2009, 36, 1587–1592. [Google Scholar] [CrossRef]
  40. Ioffe, S. Probabilistic linear discriminant analysis. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 531–542. [Google Scholar]
  41. Rätsch, G.; Onoda, T.; Müller, K.-R. Soft margins for adaboost. Mach. Learn. 2001, 42, 287–320. [Google Scholar] [CrossRef]
Figure 1. Ensemble learning.
Figure 2. Illustration of ensemble self-paced learning.
Figure 3. Flowchart of the proposed ESPL method.
Figure 4. The convergence curves obtained by SPL and the proposed ESPL methods. The three rows show the results on the glass0, vehicle3, and glass-0-4_vs_5 data sets, respectively.
Figure 5. The results in terms of Accuracy obtained by different weighting schemes.
Figure 6. The results in terms of F-measure obtained by different weighting schemes.
Figure 7. The results in terms of Gmean obtained by different weighting schemes.
Figure 8. The graphs of the results of Nemenyi test obtained by different weighting schemes in terms of Accuracy, F-measure, and Gmean.
Figure 9. The graphs of the results of Nemenyi test obtained by Decision Tree, LDA, AdaBoost, and ESPL in terms of Accuracy, F-measure, and Gmean.
Table 1. Description of the data sets used in this paper. For each data set, we present the number of attributes and the number of instances.
Dataset  Instances  Attributes
wisconsin  683  10
pima  768  9
glass0  214  10
haberman  306  4
vehicle2  846  19
vehicle1  846  19
vehicle3  846  19
vehicle0  846  19
ecoli1  336  8
new_thyroid1  215  6
new_thyroid2  215  6
ecoli2  336  8
glass6  214  10
ecoli-0-3-4_vs_5  200  8
ecoli-0-2-3-4_vs_5  202  8
yeast-0-3-5-9_vs_7-8  506  9
ecoli-0-4-6_vs_5  203  7
ecoli-0-3-4-6_vs_5  205  8
ecoli-0-3-4-7_vs_5-6  257  8
ecoli-0-1_vs_2-3-5  244  8
yeast_2_vs_4  514  9
glass-0-4_vs_5  92  10
ecoli-0-2-6-7_vs_3-5  224  8
vowel0  988  14
ecoli-0-6-7_vs_5  220  7
ecoli-0-1-4-7_vs_2-3-5-6  336  8
ecoli-0-1_vs_5  240  7
led7digit-0-2-4-5-6-7-8-9_vs_1  443  8
ecoli-0-1-4-7_vs_5-6  332  7
page_blocks_1_3_vs_4  472  11
shuttle_c2_vs_c4  129  10
dermatology-6-5-5tst  358  35
Table 2. Results on KEEL datasets obtained by four algorithms in terms of Accuracy. The best results are marked in bold.
Dataset  Decision Tree  LDA  AdaBoost  ESPL
wisconsin  0.9459  0.9590  0.9678  0.9693
pima  0.7227  0.7682  0.7643  0.7786
glass0  0.8175  0.7617  0.7666  0.7897
haberman  0.6601  0.7418  0.7383  0.7548
vehicle2  0.9527  0.9681  0.9634  0.9693
vehicle1  0.7459  0.8002  0.8061  0.8144
vehicle3  0.7364  0.7955  0.7908  0.8014
vehicle0  0.9456  0.9515  0.9669  0.9728
ecoli1  0.8930  0.8841  0.8749  0.8958
new_thyroid1  0.9442  0.9488  0.9907  0.9907
new_thyroid2  0.9628  0.9395  0.9907  0.9953
ecoli2  0.9167  0.9137  0.9018  0.9167
glass6  0.9301  0.9628  0.9579  0.9627
ecoli-0-3-4_vs_5  0.9300  0.9400  0.9500  0.9550
ecoli-0-2-3-4_vs_5  0.9256  0.9554  0.9604  0.9654
yeast-0-3-5-9_vs_7-8  0.8893  0.9170  0.7546  0.9170
ecoli-0-4-6_vs_5  0.9460  0.9657  0.9656  0.9656
ecoli-0-3-4-6_vs_5  0.9512  0.9512  0.9659  0.9659
ecoli-0-3-4-7_vs_5-6  0.9378  0.9456  0.9689  0.9728
ecoli-0-1_vs_2-3-5  0.9425  0.9589  0.9548  0.9589
yeast_2_vs_4  0.9436  0.9572  0.9417  0.9533
glass-0-4_vs_5  0.9889  0.9561  0.9673  0.9784
ecoli-0-2-6-7_vs_3-5  0.9376  0.9598  0.9687  0.9687
vowel0  0.9858  0.9494  0.9727  0.9727
ecoli-0-6-7_vs_5  0.9500  0.9591  0.9591  0.9682
ecoli-0-1-4-7_vs_2-3-5-6  0.9494  0.9583  0.9643  0.9673
ecoli-0-1_vs_5  0.9583  0.9667  0.9500  0.9542
led7digit-0-2-4-5-6-7-8-9_vs_1  0.9594  0.9617  0.9594  0.9616
ecoli-0-1-4-7_vs_5-6  0.9488  0.9610  0.9669  0.9699
page_blocks_1_3_vs_4  0.9915  0.9640  0.9534  0.9746
shuttle_c2_vs_c4  0.9923  0.9766  1.0000  1.0000
dermatology-6-5-5tst  0.9944  0.9972  0.9972  0.9972
Table 3. Results on KEEL datasets obtained by four algorithms in terms of F-measure. The best results are marked in bold.
Dataset  Decision Tree  LDA  AdaBoost  ESPL
wisconsin  0.9227  0.9399  0.9544  0.9565
pima  0.5852  0.6277  0.6399  0.6712
glass0  0.7037  0.5802  0.6291  0.6758
haberman  0.2873  0.2303  0.4836  0.5319
vehicle2  0.9090  0.9379  0.9297  0.9407
vehicle1  0.4920  0.5721  0.5905  0.6567
vehicle3  0.4728  0.5475  0.5724  0.6318
vehicle0  0.8866  0.8971  0.9321  0.9436
ecoli1  0.7681  0.7594  0.7430  0.7694
new_thyroid1  0.8364  0.8030  0.9713  0.9713
new_thyroid2  0.8857  0.7547  0.9713  0.9867
ecoli2  0.7000  0.7139  0.6837  0.7200
glass6  0.7345  0.8576  0.8508  0.8718
ecoli-0-3-4_vs_5  0.6254  0.7021  0.7524  0.7835
ecoli-0-2-3-4_vs_5  0.6176  0.7754  0.7857  0.8143
yeast-0-3-5-9_vs_7-8  0.4298  0.3360  0.2866  0.3612
ecoli-0-4-6_vs_5  0.6881  0.8214  0.8063  0.8254
ecoli-0-3-4-6_vs_5  0.7381  0.7529  0.8040  0.8040
ecoli-0-3-4-7_vs_5-6  0.6264  0.6864  0.8311  0.8529
ecoli-0-1_vs_2-3-5  0.6633  0.7396  0.7064  0.7521
yeast_2_vs_4  0.7129  0.7518  0.6342  0.7292
glass-0-4_vs_5  0.9333  0.7667  0.8076  0.8933
ecoli-0-2-6-7_vs_3-5  0.6494  0.7600  0.7944  0.7944
vowel0  0.9178  0.7265  0.8383  0.8425
ecoli-0-6-7_vs_5  0.7543  0.7611  0.7348  0.8262
ecoli-0-1-4-7_vs_2-3-5-6  0.6798  0.7188  0.7800  0.7921
ecoli-0-1_vs_5  0.7244  0.7944  0.7024  0.7310
led7digit-0-2-4-5-6-7-8-9_vs_1  0.7614  0.7889  0.7614  0.7766
ecoli-0-1-4-7_vs_5-6  0.6340  0.6810  0.7721  0.7988
page_blocks_1_3_vs_4  0.9381  0.5958  0.6683  0.8426
shuttle_c2_vs_c4  0.9333  0.8000  1.0000  1.0000
dermatology-6-5-5tst  0.9492  0.9714  0.9778  0.9778
Table 4. Results on KEEL datasets obtained by four algorithms in terms of Gmean. The best results are marked in bold.
Dataset  Decision Tree  LDA  AdaBoost  ESPL
wisconsin  0.9416  0.9486  0.9665  0.9686
pima  0.6741  0.7021  0.7137  0.7452
glass0  0.7731  0.6681  0.7129  0.7523
haberman  0.4539  0.3673  0.6184  0.6718
vehicle2  0.9405  0.9572  0.9580  0.9626
vehicle1  0.6334  0.6816  0.7202  0.7931
vehicle3  0.6220  0.6649  0.6997  0.7662
vehicle0  0.9304  0.9325  0.9590  0.9663
ecoli1  0.8381  0.8428  0.8263  0.8885
new_thyroid1  0.9042  0.8226  0.9824  0.9824
new_thyroid2  0.9303  0.7834  0.9824  0.9972
ecoli2  0.7882  0.8120  0.8522  0.8834
glass6  0.8342  0.9022  0.9176  0.9495
ecoli-0-3-4_vs_5  0.7652  0.8392  0.8557  0.9143
ecoli-0-2-3-4_vs_5  0.7550  0.8758  0.8543  0.8811
yeast-0-3-5-9_vs_7-8  0.6258  0.4595  0.4743  0.6092
ecoli-0-4-6_vs_5  0.7843  0.8816  0.8565  0.9221
ecoli-0-3-4-6_vs_5  0.8206  0.8493  0.8560  0.9047
ecoli-0-3-4-7_vs_5-6  0.7331  0.7849  0.8860  0.9069
ecoli-0-1_vs_2-3-5  0.7774  0.8252  0.8122  0.8470
yeast_2_vs_4  0.8332  0.8098  0.7947  0.8801
glass-0-4_vs_5  0.9940  0.8724  0.9410  0.9743
ecoli-0-2-6-7_vs_3-5  0.7675  0.8284  0.8385  0.8947
vowel0  0.9393  0.8477  0.9216  0.9599
ecoli-0-6-7_vs_5  0.6442  0.8520  0.8131  0.8863
ecoli-0-1-4-7_vs_2-3-5-6  0.7958  0.7844  0.8457  0.8803
ecoli-0-1_vs_5  0.8362  0.8796  0.8356  0.8785
led7digit-0-2-4-5-6-7-8-9_vs_1  0.8704  0.8993  0.8704  0.8917
ecoli-0-1-4-7_vs_5-6  0.7547  0.7553  0.8607  0.9021
page_blocks_1_3_vs_4  0.9955  0.6859  0.9235  0.9863
shuttle_c2_vs_c4  0.9414  0.9332  1.0000  1.0000
dermatology-6-5-5tst  0.9717  0.9732  0.9985  0.9985
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
