Article

Parallel Selector for Feature Reduction

1 School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China
2 School of Science, Jiangsu University of Science and Technology, Zhenjiang 212100, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(9), 2084; https://doi.org/10.3390/math11092084
Submission received: 29 March 2023 / Revised: 22 April 2023 / Accepted: 24 April 2023 / Published: 27 April 2023
(This article belongs to the Special Issue Data Mining: Analysis and Applications)

Abstract

In the field of rough set theory, feature reduction is a hot topic. To date, various devices for feature reduction have been developed to better guide exploration of this topic. Nevertheless, several challenges regarding these devices should not be ignored: (1) the viewpoint provided by a fixed measure is insufficient; (2) a final reduct based on a single constraint is sometimes not robust to data perturbation; (3) the efficiency of deriving the final reduct is unsatisfactory. In this study, to improve the effectiveness and efficiency of feature reduction algorithms, a novel framework named parallel selector for feature reduction is reported. Firstly, the granularity of the raw features is quantitatively characterized. Secondly, based on these granularity values, the raw features are sorted. Thirdly, the reordered features are evaluated again. Finally, following these two evaluations, the reordered features are divided into groups, and the features satisfying the given constraints are selected in parallel. Our framework not only guides a relatively stable feature sequencing if data perturbation occurs but also reduces the time consumption of feature reduction. Experimental results over 25 UCI data sets with four different ratios of noisy labels demonstrate the superiority of our framework in comparison with eight state-of-the-art algorithms.

1. Introduction

In the real world, data record abundant information about objects. An open issue, therefore, is how to effectively extract valuable information from high-dimensional data.
To date, various popular feature selection devices based on different deep learning models have been proposed for this issue. For instance, Gui et al. [1] proposed a neural network-based feature selection architecture employing attention and learning modules, which aimed to reduce computational complexity and improve stability on noisy data. Li et al. [2] proposed a two-step nonparametric approach combining the strengths of neural networks and feature screening, which aimed to overcome the challenges that arise when feature selection is performed on high-dimension, low-sample-size data. Chen et al. [3] proposed a deep learning-based method for selecting important features from high-dimensional, low-sample-size data. Xiao et al. [4] reported a federated learning system with enhanced feature selection, which aimed to produce high recognition accuracy for wearable sensor-based human activity recognition. In addition, based on probability theory, some classical mathematical models have also been used to obtain a qualified feature subset. For example, the Bayesian model was employed in mixture model training and feature selection [5], and the trace of the conditional covariance operator was used to perform feature selection [6].
Currently, in rough set theory [7,8,9,10], feature reduction [11,12,13,14,15,16,17] is drawing considerable attention by virtue of its efficiency in alleviating overfitting [18,19], reducing the complexity of learners [20,21,22], and so on. It has been widely employed in general data preprocessing [23] because of its typical advantage that redundant features can be removed from the data without altering the structure of the raw features. A key point of traditional feature reduction devices is the search strategy. An extensive review shows that most of the accepted search strategies employed in previous devices fall into the following three fundamental categories.
  • Forward searching. The core of this phase is to identify appropriate features in each iteration and add them to a feature subset named the reduct pool. The specific process of one popular forward search strategy, forward greedy searching (FGS) [24,25,26,27,28,29,30], is as follows (a minimal code sketch is given after this list): (1) given a predefined constraint, each feature is evaluated by a measure [31,32,33,34], and the most qualified feature is selected; (2) the selected feature is added to the reduct pool; (3) if the constraint is satisfied, the search process is terminated. Obviously, the features judged most effective in each iteration constitute the final feature subset.
  • Backward searching. The core of this phase is to identify features of inferior quality and remove them from the raw features. The specific process of one popular backward search strategy, backward greedy searching (BGS) [35,36], is as follows: (1) given a predefined constraint, each feature in the raw feature set is evaluated by a measure, and the unqualified features are selected; (2) the selected features are removed from the raw feature set; (3) if the constraint is satisfied by the remaining features, the search process is terminated.
  • Random searching. The core of this phase is to randomly select qualified features from the candidate features and add them to the reduct pool. The specific process of one classic random search strategy, simulated annealing (SA) [37,38], is as follows: (1) given a predefined constraint, a randomly generated binary sequence is used to encode the features (“1” indicates that the corresponding feature is selected, “0” indicates that it is not, and the number of binary digits equals the number of raw features); (2) multiple random changes are exerted upon the sequence and the corresponding fitness values are recorded, i.e., the selected features are evaluated; (3) the sequence moves to the new state with the highest fitness value; (4) steps (2) and (3) are executed iteratively until the given constraint is satisfied.
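To make the forward greedy phase concrete, the following minimal Python sketch illustrates the FGS loop described above. It is an illustration only, not the implementation evaluated later in this paper; the callables `evaluate` and `meets_constraint` are hypothetical placeholders for whichever measure and constraint are chosen.

```python
def forward_greedy_search(features, evaluate, meets_constraint):
    """Minimal sketch of forward greedy searching (FGS).

    evaluate(subset)         -> quality score of a candidate feature subset (higher is better);
    meets_constraint(subset) -> True once the predefined constraint is satisfied.
    Both callables are placeholders for a chosen measure and its constraint.
    """
    reduct = []                       # the reduct pool
    candidates = list(features)
    while candidates and not meets_constraint(reduct):
        # evaluate every remaining feature and keep the most qualified one
        best = max(candidates, key=lambda f: evaluate(reduct + [f]))
        reduct.append(best)
        candidates.remove(best)
    return reduct
```

Backward and random searching differ only in how candidates are removed or sampled, but they share the same evaluate-then-test loop, which is exactly the per-iteration cost that the parallel selector introduced later avoids.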
Obviously, previous feature reduction algorithms mainly suffer from three limitations. (1) Lack of diverse evaluations. Evaluations originating from different measures may disagree; that is, the features selected by a single measure are likely to be ineffective when evaluated by other measures with different semantic explanations. (2) Lack of stable selection. The features previously selected under a single constraint may seriously mislead the subsequent selection if data perturbation occurs. (3) Lack of efficient selection. In each iteration, all candidate features must be evaluated; thus, the time consumption tends to be unsatisfactory as the feature dimensionality increases.
To address the three limitations discussed above, a novel framework named parallel selector for feature reduction is reported in this paper. Compared with previous research, our framework differs in three main respects: (1) more viewpoints for evaluating features are introduced; that is, different measures are employed to acquire more qualified features; (2) data perturbation has no obvious effect on our framework, because constraints related to different measures are employed and a stable feature sequence sorted by granularity values is used; (3) the iterative selection process is abandoned and replaced by a parallel selection mechanism, through which the efficiency of deriving a final reduct is improved.
The detailed calculation process of our framework is as follows. Firstly, to reveal the ability of different features to distinguish samples, the granularity of each feature is incorporated into our framework; that is, the granularity values of all features are calculated. Secondly, based on the obtained granularity values, all features are sorted. Note that a smaller granularity value means that the corresponding feature makes samples more distinguishable. Thirdly, another measure is used to evaluate the importance of the reordered features. Fourthly, the features are divided into groups by considering their comprehensive performance. Finally, the qualified features are selected in parallel from these groups according to the required constraints. Several distinct advantages emerge from such a framework.
  • Providing diverse viewpoints for feature evaluation. Most existing search strategies struggle to take the richness of measures into account; that is, the importance of candidate features generated from a single measure is usually deemed sufficient, e.g., the final reduct of the greedy-based forward searching algorithm proposed by Hu et al. [39] is derived from a single measure named dependency and a corresponding constraint. From this point of view, the selected feature subset may be unqualified if another independent measure is used to evaluate the importance of features. In our framework, by contrast, different measures are employed for evaluating features, so more comprehensive evaluations of features can be obtained. In view of this, our framework is more effective than previous feature reduction strategies.
  • Improving data stability for feature reduction. In previous studies, the reduct pool is composed of the qualified features selected in each iteration, which indicates that the reduct pool is iteratively updated. It should be pointed out that, in each iteration, all features that have already been added to the reduct pool are involved in the next evaluation, e.g., the construction of the final feature subset in the feature reduction strategy proposed by Yang et al. [40] is affected by the previously selected features. From this point of view, if data perturbation occurs, the selected features will mislead subsequent selection. In our framework, by contrast, each feature is weighted by its granularity value and a feature sequence is then obtained, which is relatively stable in the face of data perturbation.
  • Accelerating the search process for feature reduction. In most search strategies for selecting features, e.g., heuristic algorithms and backward greedy algorithms, all candidate features must be evaluated in every iteration to characterize their importance, so redundant evaluations are inevitable, which adds time consumption when selection is performed on higher-dimensional data. However, in our framework, the feature evaluation under each measure is carried out only once. Moreover, the introduction of a grouping mechanism makes it possible to select qualified features in parallel.
In summary, the main contributions of our framework are as follows: (1) a diverse evaluation mechanism is designed, which can produce different viewpoints for evaluating features; (2) granularity is used not only for evaluating features but also for stabilizing the selection results if data perturbation occurs; (3) an efficient parallel selection mechanism is developed to accelerate the process of deriving a final reduct; (4) a novel feature reduction framework is reported, which can be combined with various existing feature reduction strategies to improve the quality of their final reducts.
The remainder of this paper is organized as follows. Section 2 reviews some basic concepts concerning feature reduction. Section 3 details the basic contents of our framework and elaborates its application to feature reduction. The results of comparative experiments and the corresponding analysis are reported in Section 4. Finally, conclusions and future prospects are outlined in Section 5.

2. Preliminaries

2.1. Neighborhood Rough Set

In the rough set field, a decision system can be represented by a pair $\mathbf{DS} = \langle U, AT \cup \{d\} \rangle$, where $U = \{x_i \mid 1 \le i \le n\}$ is a nonempty set of samples, $AT = \{a_k \mid 1 \le k \le m\}$ is a nonempty set of conditional features, and $d$ is the decision feature that records the labels of the samples. In particular, the set of all distinct labels in $\mathbf{DS}$ is $L = \{l_p \mid 1 \le p \le q\}$ ($q \ge 2$). $\forall x_i \in U$, $d(x_i) \in L$ represents the label of sample $x_i$.
Given a decision system $\mathbf{DS}$, if a classification task is considered, an equivalence relation over $U$ can be established with $d$ such that $\mathrm{IND}(d) = \{(x_i, x_j) \in U^2 \mid d(x_i) = d(x_j)\}$. Immediately, $U$ is separated into a set of disjoint blocks such that $U/\mathrm{IND}(d) = \{X_1, X_2, \ldots, X_q\}$. $\forall X_p \in U/\mathrm{IND}(d)$, $X_p$ is the $p$-th decision class, which contains all samples with label $l_p$. This process is referred to as information granulation in the field of granular computing [41,42,43,44].
Nevertheless, an equivalence relation may be powerless to perform information granulation when conditional features are involved, mainly because such features frequently record continuous rather than categorical values. In view of this, various substitutes have been proposed. For instance, the fuzzy relation [45,46] induced by a kernel function and the neighborhood relation [47,48] based on a distance function are two widely accepted devices. Both of them offer the advantage of performing information granulation at different scales, and the parameter used in these two binary relations is the key to offering multiple scales. Given a decision system $\mathbf{DS}$, a radius $\delta \ge 0$, and $\forall A \subseteq AT$, the neighborhood relation over $A$ is
$$\delta_A = \{(x_i, x_j) \in U^2 \mid \Delta_A(x_i, x_j) \le \delta\}, \quad (1)$$
in which $\Delta_A(x_i, x_j)$ is the distance between $x_i$ and $x_j$ over $A$.
A higher value of $\delta$ produces a larger neighborhood, whereas a smaller value of $\delta$ generates a smaller neighborhood. The neighborhood itself is then given by $\delta_A(x_i) = \{x_j \in U \mid (x_i, x_j) \in \delta_A\}$.
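As an illustration of how the neighborhood relation can be materialized, the sketch below computes $\delta_A(x_i)$ for every sample. The Euclidean distance and the assumption that feature values are scaled to $[0, 1]$ are choices made for this sketch, not prescriptions from the paper.

```python
import numpy as np

def neighborhoods(X, delta):
    """delta-neighborhoods of all samples over a feature subset A.

    X     : (n, |A|) array of sample values over A (assumed scaled to [0, 1]);
    delta : neighborhood radius.
    Returns a list of boolean masks; mask i marks the samples in delta_A(x_i).
    """
    # pairwise distances Delta_A(x_i, x_j)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    return [dist[i] <= delta for i in range(X.shape[0])]
```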
In the field of rough sets, one of the important tasks is to approximate an objective by the result of information granulation. Generally speaking, the objectives to be approximated are the decision classes in $U/\mathrm{IND}(d)$. The neighborhood-based lower and upper approximations are defined as follows. Given a decision system $\mathbf{DS}$, a radius $\delta \ge 0$, $\forall A \subseteq AT$ and $\forall l_p \in L$, let $X_p$ be the $p$-th decision class related to label $l_p$; the neighborhood lower and upper approximations of $X_p$ are
$$\underline{\delta}_A(X_p) = \{x_i \in U \mid \delta_A(x_i) \subseteq X_p\}, \quad (2)$$
$$\overline{\delta}_A(X_p) = \{x_i \in U \mid \delta_A(x_i) \cap X_p \neq \emptyset\}. \quad (3)$$
Following the above definition, it is not difficult to present the corresponding approximations related to the decision feature $d$. Given a decision system $\mathbf{DS}$, a radius $\delta \ge 0$ and $\forall A \subseteq AT$, the neighborhood lower and upper approximations of $d$ are
$$\underline{\delta}_A(d) = \bigcup_{p=1}^{q} \underline{\delta}_A(X_p), \quad (4)$$
$$\overline{\delta}_A(d) = \bigcup_{p=1}^{q} \overline{\delta}_A(X_p). \quad (5)$$
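Continuing the illustrative sketch above, the lower and upper approximations of a decision class follow directly from the neighborhood masks; the function below assumes the masks produced by the earlier `neighborhoods` sketch.

```python
import numpy as np

def neighborhood_approximations(masks, y, label):
    """Neighborhood lower/upper approximations of the decision class with the given label.

    masks : boolean neighborhood masks over a feature subset A;
    y     : (n,) array of sample labels;
    label : the label l_p whose decision class X_p is approximated.
    """
    X_p = (np.asarray(y) == label)
    lower = [i for i, m in enumerate(masks) if np.all(X_p[m])]   # delta_A(x_i) contained in X_p
    upper = [i for i, m in enumerate(masks) if np.any(X_p[m])]   # delta_A(x_i) meets X_p
    return lower, upper
```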

2.2. Neighborhood-Based Measures

2.2.1. Granularity

Information granules with adjustable granularity are becoming one of the central goals of data transformation for two fundamental reasons: (1) granular computing with a fitted granularity leads to less time-demanding processing when dealing with detailed numeric problems; (2) information granules with fitted granularity have emerged as a sound conceptual and algorithmic vehicle because they offer a more overall view of the data, supporting a level of abstraction aligned with the nature of specific problems. Thus, granularity has become a significant concept, and various models regarding it can be developed and utilized.
Given a pair $S = \langle U, R \rangle$ in which $U$ is a finite nonempty set of samples and $R$ is a binary relation over $U$, $\forall x_i \in U$, the $R$-related set [34] of $x_i$ is
$$R(x_i) = \{x_j \in U : (x_i, x_j) \in R\}. \quad (6)$$
Given a pair $S = \langle U, R \rangle$, the granularity [49] related to $R$ can be defined as
$$G_R(U) = \sum_{x_i \in U} \frac{|R(x_i)|}{|U|^2}, \quad (7)$$
in which $|X|$ is the cardinality of the set $X$.
Following Equation (7), it is not difficult to see that $0 \le G_R(U) \le 1$. Without loss of generality, the binary relation $R$ can be regarded as one of the most intuitive representations of information granulation over $U$. The granularity corresponding to $R$ then plainly reveals the discriminability of the information granulation results (all $R$-related sets). A smaller value of $G_R(U)$ corresponds to fewer ordered pairs in $R$, which means $R$ is more discriminative; that is, most samples in $U$ can be distinguished from each other.
Note that $\delta_A$ mentioned in Equation (1) is also a binary relation, which makes it possible to propose the following concept of granularity based on the neighborhood relation. Given a decision system $\mathbf{DS}$, a radius $\delta \ge 0$ and $\forall A \subseteq AT$, the $\delta_A$-based granularity is defined as follows:
$$G_\delta(A, U) = \sum_{x_i \in U} \frac{|\delta_A(x_i)|}{|U|^2}. \quad (8)$$
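Equation (8) translates directly into code; the sketch below reuses the boolean neighborhood masks from the earlier illustration.

```python
def granularity(masks):
    """Neighborhood-based granularity G_delta(A, U) of Equation (8)."""
    n = len(masks)                                    # |U|
    return sum(int(m.sum()) for m in masks) / (n * n)
```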
Granularity characterizes the inherent performance of information granulation from the perspective of the distinguishability of features. However, it should be emphasized that the labels of samples do not participate in the process of feature evaluation, which may bring some potential limitations to subsequent learning tasks. In view of this, a classical measure called conditional entropy can be considered.

2.2.2. Conditional Entropy

The conditional entropy is another important measure in the neighborhood rough set, which characterizes the discriminating performance of $A \subseteq AT$ with respect to $d$. Thus far, various forms of conditional entropy [50,51,52,53] have been proposed for different requirements. A widely used form is shown below.
Given a decision system $\mathbf{DS}$, $\forall A \subseteq AT$ and a radius $\delta \ge 0$, the conditional entropy [54] of $d$ with respect to $A$ is defined as follows:
$$CE_\delta(A, d) = -\frac{1}{|U|} \sum_{x_i \in U} |\delta_A(x_i) \cap [x_i]_d| \cdot \log \frac{|\delta_A(x_i) \cap [x_i]_d|}{|\delta_A(x_i)|}, \quad (9)$$
in which $[x_i]_d$ is the decision class of $x_i$.
Obviously, $0 \le CE_\delta(A, d) \le |U|/e$ holds. A lower value of conditional entropy represents a better discrimination performance of $A$. Moreover, $\forall A, B \subseteq AT$, supposing $A \subseteq B$, we have $CE_\delta(A, d) \ge CE_\delta(B, d)$; that is, the conditional entropy monotonically decreases with the increasing scale of $A$.
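Equation (9) can likewise be transcribed directly, again over the boolean neighborhood masks; the intersection count is never zero because every sample lies in its own neighborhood and its own decision class, so the logarithm is always defined.

```python
import numpy as np

def conditional_entropy(masks, y):
    """Neighborhood conditional entropy CE_delta(A, d) of Equation (9).

    masks : boolean neighborhood masks over a feature subset A;
    y     : (n,) array of sample labels defining the decision classes [x_i]_d.
    """
    y = np.asarray(y)
    ce = 0.0
    for i, m in enumerate(masks):
        inter = int(np.logical_and(m, y == y[i]).sum())   # |delta_A(x_i) ∩ [x_i]_d|
        neigh = int(m.sum())                              # |delta_A(x_i)|
        ce -= inter * np.log(inter / neigh)
    return ce / len(y)
```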

2.3. Feature Reduction

In the field of rough set, one of the most significant tasks is to abandon redundant or irrelevant conditional features, which can be considered to be feature reduction. Various measures have been utilized to construct corresponding constraints with respect to different requirements [55,56], with various feature reduction approaches subsequently being explored. A general form of feature reduction presented by Yao et al. [57] is introduced as follows.
Given a decision system $\mathbf{DS}$, $A \subseteq AT$, a radius $\delta \ge 0$, and a $C_\rho$-constraint (a constraint based on a measure related to the radius $\delta$), $A$ is referred to as a $C_\rho$-based qualified feature subset ($C_\rho$-reduct) if and only if the following conditions are satisfied:
  • $A$ meets the $C_\rho$-constraint;
  • $\forall B \subset A$, $B$ does not meet the $C_\rho$-constraint.
It is not difficult to observe that $A$ is actually an optimal and minimal subset of $AT$ satisfying the $C_\rho$-constraint. To achieve such a subset, various search strategies have been proposed. For example, an efficient search strategy named forward greedy searching is widely accepted; its core process is to evaluate all candidate features and select qualified features according to some measure and the corresponding constraint. Based on such a strategy, it is possible to determine which feature should be added to or removed from $A$. For more details about the forward greedy searching strategy, readers can refer to [58].

3. The Construction of a Parallel Selector for Feature Reduction

3.1. Isotonic Regression

In the field of statistical analysis, isotonic regression [59,60] has become a typical topic of statistical inference. For instance, in a medical clinical trial it can be assumed that as the dose of a drug increases, so too do its efficacy and toxicity. However, the estimated proportion of patients showing toxicity at each dose level may be inaccurate; that is, the estimated probability of toxicity may not be a nondecreasing function of the dose level, which prevents statistical observation of the average reaction of patients as the drug dosage increases. In view of this, isotonic regression can be employed to reveal the variation pattern of the clinical data. Generally, given a nonempty and finite set $\theta = \{\theta_1, \theta_2, \ldots, \theta_m\}$, an ordering relation “⪯” over $\theta$ can be defined as follows.
The ordering relation “⪯” is considered a total order over $\theta$ if and only if the following entries are satisfied:
  • Reflexivity: $\theta_i \preceq \theta_i$ $(1 \le i \le m)$.
  • Transitivity: if $\theta_i \preceq \theta_j$ and $\theta_j \preceq \theta_k$, then $\theta_i \preceq \theta_k$ $(1 \le i, j, k \le m)$.
  • Antisymmetry: $\forall \theta_i, \theta_j \in \theta$, if $\theta_i \preceq \theta_j$ and $\theta_j \preceq \theta_i$, then $\theta_i = \theta_j$.
  • Comparability: $\forall \theta_i, \theta_j \in \theta$, we always have $\theta_i \preceq \theta_j$ or $\theta_j \preceq \theta_i$.
Without loss of generality, the ordering relation “⪰” can be defined in a similar way. Specifically, if the ordering relation “⪯” or “⪰” over $\theta$ satisfies only reflexivity, transitivity, and antisymmetry, it is considered a partial order. We now take “⪯” into discussion. Suppose that $\Theta = \{\theta = (\theta_1, \theta_2, \ldots, \theta_m)^T \in \mathbb{R}^m \mid \theta_1 \preceq \theta_2 \preceq \cdots \preceq \theta_m\}$; the definition of an isotonic function can then be obtained as follows.
Given a function $Y = (Y_1, Y_2, \ldots, Y_m)^T$ with $Y_k = Y(\theta_k)$ defined on $\Theta = \{\theta = (\theta_1, \theta_2, \ldots, \theta_m)^T \in \mathbb{R}^m \mid \theta_1 \preceq \theta_2 \preceq \cdots \preceq \theta_m\}$, if $Y_1 \le Y_2 \le \cdots \le Y_m$, then $Y$ is called an isotonic function according to the ordering relation “⪯” over $\Theta$.
Let $\Theta_{all}$ represent all isotonic functions over $\Theta$ such that $\Theta_{all} = \{X \in \mathbb{R}^m \mid X_1 \le X_2 \le \cdots \le X_m\}$; we can then obtain the following definition of isotonic regression.
Given a function $Y = (Y_1, Y_2, \ldots, Y_m)^T$, $X^* = (X_1^*, X_2^*, \ldots, X_m^*)^T \in \Theta_{all}$ is the isotonic regression of $Y$ if it satisfies
$$\sum_{k=1}^{m} w_k (Y_k - X_k^*)^2 = \min_{X \in \Theta_{all}} \sum_{k=1}^{m} w_k (Y_k - X_k)^2, \quad (10)$$
in which $w = (w_1, w_2, \ldots, w_m)^T$ is the weight vector and $0 \le w_k \le 1$.
Following Equation (10), we can observe that $X^*$, i.e., the solution of the isotonic regression, can be viewed as the projection of $Y$ onto $\Theta_{all}$ under the inner product $\langle X, Y \rangle_w = \sum_{k=1}^{m} w_k X_k Y_k$. An open problem of how to find such a projection is then intuitively revealed. Thus far, various algorithms [61,62] have been proposed to address this issue; the pool adjacent violators algorithm (PAVA) proposed by Ayer et al. [62] is considered the most widely utilized version in the total-order situation. Algorithm 1 gives the detailed process of PAVA for obtaining the $X^*$ shown in Equation (10).
Algorithm 1: Pool adjacent violators algorithm (PAVA).
(Pseudocode presented as a figure in the published version.)
For the process of updating values in Algorithm 1, it is not difficult to observe that each Y j should be considered for value correction. In the worst case, if all elements of Y need to be corrected, it follows that the time complexity of Algorithm 1 is O ( m ) . To further facilitate the understanding of the above process, an example will be presented.
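Since the pseudocode of Algorithm 1 appears only as a figure in the published version, the following is a standard sketch of PAVA for the weighted, nondecreasing case of Equation (10); it is a generic rendering of the method rather than a reproduction of the authors' exact pseudocode.

```python
def pava(y, w=None):
    """Pool adjacent violators algorithm for weighted isotonic regression.

    y : observed values Y_1, ..., Y_m listed along the total order on theta;
    w : nonnegative weights (defaults to all ones).
    Returns the nondecreasing fit X* minimizing sum_k w_k (Y_k - X_k)^2.
    """
    if w is None:
        w = [1.0] * len(y)
    # each block stores [fitted value, total weight, number of pooled elements]
    blocks = []
    for yk, wk in zip(y, w):
        blocks.append([yk, wk, 1])
        # pool adjacent blocks while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            blocks.append([(w1 * v1 + w2 * v2) / (w1 + w2), w1 + w2, n1 + n2])
    # expand the pooled blocks back to one fitted value per position
    fit = []
    for v, _, n in blocks:
        fit.extend([v] * n)
    return fit
```

For instance, pava([1, 3, 2]) pools the last two values and returns [1, 2.5, 2.5], matching the weighted-average correction used in Example 1 below.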
Example 1. 
Let us introduce the statistical model through a medical example.
1.
Suppose the dosage given to a kind of animal is gradually increased such that
$$\theta_1 \preceq \theta_2 \preceq \cdots \preceq \theta_r. \quad (11)$$
$N$ animals are tested at each dosage $\theta_k$ $(1 \le k \le r)$, and $\dot{X}_{kj}$ denotes the reaction of the $j$-th animal to dosage $\theta_k$ such that
$$\dot{X}_{kj} = \begin{cases} 1, & \text{active}, \\ 0, & \text{inactive}. \end{cases} \quad (12)$$
$\hat{P}_k$ denotes the active proportion at dosage $\theta_k$, which is usually estimated by the sample proportion such that
$$\hat{P}_k = \frac{1}{N}\sum_{j=1}^{N} \dot{X}_{kj}. \quad (13)$$
2.
Following Equation (11), suppose that $P = (P_1, P_2, \ldots, P_r)^T$ has the same order such that
$$P_1 \le P_2 \le \cdots \le P_r. \quad (14)$$
$N\hat{P}_k$ follows a binomial distribution, and the likelihood function of $P$ is
$$\prod_{k=1}^{r} P_k^{N\hat{P}_k}(1 - P_k)^{N(1-\hat{P}_k)}. \quad (15)$$
Note that when Equation (15) is maximized under the constraint of Equation (14), the maximum likelihood estimate (MLE) of $P$ is obtained. In view of this, $\hat{P}_k$ should have the same order as Equation (14). If the $\hat{P}_k$ do not satisfy Equation (14), adjacent $\hat{P}_k$ and $\hat{P}_{k+1}$ are merged as
$$\hat{P}_k = \hat{P}_{k+1} = \frac{N_k \hat{P}_k + N_{k+1}\hat{P}_{k+1}}{N_k + N_{k+1}}. \quad (16)$$
3.
To give a further explanation, suppose r = 5 ; the specific calculation process is shown in Table 1.
According to Table 1, the $\hat{P}_k$ do not satisfy Equation (14), and so we have
(a) 
$0.436 = (25 \times 0.4 + 14 \times 0.5)/(25 + 14)$;
(b) 
$0.442 = (30 \times 0.4 + 22 \times 0.5)/(30 + 22)$.
Since $0.436 \le 0.442$, we have $\hat{P}_1^{(1)} = \hat{P}_2^{(1)} = 0.436$ and $\hat{P}_3^{(2)} = \hat{P}_4^{(2)} = \hat{P}_5^{(2)} = 0.442$; that is, $P_1 = P_2 = 0.436$ and $P_3 = P_4 = P_5 = 0.442$, so Equation (14) holds, which facilitates the general statistical analysis of the medicine’s effects.

3.2. Isotonic Regression-Based Numerical Correction

It should be emphasized that isotonic regression can be understood as a kind of general framework, which has been demonstrated to be valuable not only in providing inexpensive technical supports for data analysis in the medical field but also in bringing new motivation to other research in the academic community. Correspondingly, by reviewing the relevant contents of two feature measures mentioned in Section 2.2, we find the following: although two measures have been specifically introduced, the statistical correlation between them still lacks explanation. Therefore, an interesting idea is then naturally guided: Can we explore and analyze the statistical laws between these two measures by means of isotonic regression? Moreover, it is not difficult to realize this kind of analysis.
Given a decision system $\mathbf{DS}$ with $AT = \{a_k \mid 1 \le k \le m\}$ and a radius $\delta \ge 0$, $\forall a_k \in AT$, we have the $\delta$-based granularity $G_\delta(a_k, U)$ and the conditional entropy $CE_\delta(a_k, d)$. Now, we sort the conditional features in ascending order of their granularity values such that $Gra = \{G = (G_1, G_2, \ldots, G_m)^T \in \mathbb{R}^m \mid G_1 \le G_2 \le \cdots \le G_m\}$. In particular, a conditional entropy-based function $Gce = (CE_1, CE_2, \ldots, CE_m)^T$ following the same feature order as $Gra$ can be obtained.
Definition 1. 
Given a function $Gce = (CE_1, CE_2, \ldots, CE_m)^T$ with $CE_k = Gce(G_k)$ defined on $Gra = \{G = (G_1, G_2, \ldots, G_m)^T \in \mathbb{R}^m \mid G_1 \le G_2 \le \cdots \le G_m\}$, if $CE_1 \le CE_2 \le \cdots \le CE_m$ or $CE_1 \ge CE_2 \ge \cdots \ge CE_m$, then $Gce$ is called an isotonic function according to the ordering relation “⪯” over $Gra$.
$Gra_{all}$ is employed to represent all isotonic functions over $Gra$ such that $Gra_{all} = \{X \in \mathbb{R}^m \mid X_1 \le X_2 \le \cdots \le X_m \text{ or } X_1 \ge X_2 \ge \cdots \ge X_m\}$. Likewise, we can obtain the following definition.
Definition 2. 
Given a function $Gce = (CE_1, CE_2, \ldots, CE_m)^T$, $Gce^* = (Gce_1^*, Gce_2^*, \ldots, Gce_m^*)^T \in Gra_{all}$ is the isotonic regression of $(Gce, w)$ if it satisfies
$$\sum_{k=1}^{m} w_k (CE_k - Gce_k^*)^2 = \min_{X \in Gra_{all}} \sum_{k=1}^{m} w_k (CE_k - X_k)^2, \quad (17)$$
where $w = (w_1, w_2, \ldots, w_m)^T$ is the weight vector and $0 \le w_k \le 1$.
Similarly, we can still apply the PAVA shown in Section 3.1 to calculate G c e * , which can be denoted by Algorithm 2.
Algorithm 2: Pool Adjacent Violators Algorithm for Feature Measure (PAVA_FM).
(Pseudocode presented as a figure in the published version.)
Following the process of Algorithm 2, the time complexity of Algorithm 2 is similar to that of Algorithm 1, i.e., O ( m ) . Specifically, the time complexity of Algorithm 2 can also be written as O ( | A T | ) because m represents the number of raw features.

3.3. Isotonic Regression-Based Parallel Selection

Reviewing what has been discussed about traditional feature reduction algorithms, we can observe that all candidate features must be evaluated in the process of selecting qualified features, which results in a redundant evaluation process. Additionally, although various measures have been explored and corresponding constraints can be constructed, it should not be forgotten that a single viewpoint of evaluation is insufficient and that reducts derived from a single constraint are relatively unstable. Exploring a resolution to the above issues thus becomes significantly urgent. In view of this, motivated by Section 3.2, we introduce the following framework for feature reduction.
  • Calculate the granularity of each conditional feature in turn, sort these features in ascending order of granularity value, and record the original location indices of the sorted features.
  • Based on Step 1, calculate the conditional entropy of each sorted feature according to the recorded location indices.
  • Based on Step 2, obtain the isotonic regression of the conditional entropy according to Definitions 1 and 2. Inspired by Example 1, we group features through the updated conditional entropy; that is, features with the same corrected value of conditional entropy are placed into one group. Assume that the number of groups is $N_g \in [1, m]$, where $m$ is the number of raw features.
  • Based on Step 3, when $N_g$ becomes too large, i.e., $N_g$ approaches $m$, the grouping mechanism is obviously meaningless. To prevent this, we propose a mechanism to reduce the number of groups. Beginning with $Group_1$, calculate the D-value $DV_i$ $(1 \le i \le N_g - 1)$ between $Group_i$ and $Group_{i+1}$ (the value of a group is the corrected conditional entropy of the features in the group obtained via isotonic regression), obtain the sum of all D-values $DV_{sum} = DV_1 + \cdots + DV_i + \cdots + DV_{N_g-1}$, and calculate the mean D-value $MeanD$ from $DV_{sum}$. Beginning with $Group_1$ again, if $DV_i < MeanD$, merge $Group_i$ with $Group_{i+1}$.
The main contributions of the above framework are as follows: (1) different measures can be combined in the form of grouping and (2) a parallel selection mechanism for selecting features is provided. Furthermore, the related reduction strategy is called isotonic regression-based fast feature reduction (IRFFR), and the process of IRFFR is as follows: (1) from $Group_1$ to $Group_{N_g/2}$, select the feature with the minimum granularity in each group and put it into the reduct pool; (2) from $Group_{N_g/2+1}$ to $Group_{N_g}$, select the feature with the minimum original conditional entropy in each group and put it into the reduct pool.
Based on the above discussion, further details of IRFFR are shown in Algorithm 3.
Algorithm 3: Isotonic Regression-based Fast Feature Reduction (IRFFR).
(Pseudocode presented as a figure in the published version.)
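Because Algorithm 3 is likewise shown only as a figure in the published version, the following Python sketch strings together the steps of Section 3.3 under several stated assumptions: scikit-learn's IsotonicRegression (nonincreasing, as in Example 2) stands in for PAVA_FM, the group-merging rule follows one reading of the D-value step, and the selection rule follows the textual description of IRFFR above, so the output is illustrative and may differ in detail from the published algorithm.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def irffr(granularity_values, entropy_values):
    """Illustrative sketch of IRFFR over per-feature measure values.

    granularity_values : per-feature granularity G_delta(a_k, U);
    entropy_values     : per-feature conditional entropy CE_delta(a_k, d).
    Returns the indices of the selected features (the final reduct).
    """
    g = np.asarray(granularity_values, dtype=float)
    ce = np.asarray(entropy_values, dtype=float)

    order = np.argsort(g)                                    # step 1: sort by granularity
    fitted = IsotonicRegression(increasing=False).fit_transform(
        np.arange(len(order)), ce[order])                    # steps 2-3: corrected entropy

    # step 3: features sharing a corrected entropy value form one group
    groups, values, start = [], [], 0
    for k in range(1, len(fitted) + 1):
        if k == len(fitted) or not np.isclose(fitted[k], fitted[start]):
            groups.append(list(order[start:k]))
            values.append(fitted[start])
            start = k

    # step 4: merge adjacent groups whose D-value falls below the mean D-value
    if len(groups) > 1:
        dv = [abs(values[i] - values[i + 1]) for i in range(len(values) - 1)]
        mean_d = float(np.mean(dv))
        merged, i = [], 0
        while i < len(groups):
            if i + 1 < len(groups) and dv[i] < mean_d:
                merged.append(groups[i] + groups[i + 1])     # the merged neighbour is consumed
                i += 2
            else:
                merged.append(groups[i])
                i += 1
        groups = merged

    # parallel selection: first half of the groups by minimum granularity,
    # second half by minimum original conditional entropy
    half = int(np.ceil(len(groups) / 2))
    reduct = [min(grp, key=lambda f: g[f]) for grp in groups[:half]]
    reduct += [min(grp, key=lambda f: ce[f]) for grp in groups[half:]]
    return sorted(int(f) for f in reduct)
```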
Obviously, different from what occurs in the greedy-based forward searching strategy, the raw features are added in groups. Notably, IRFFR offers a pattern of parallel selection which can reduce corresponding time consumption greatly.
The time complexity of IRFFR mainly comprises three components. (1) Obtaining the isotonic regression feature sequence and dividing features into groups (Steps 2 to 6): in the worst case, all features in $AT$ must be queried, so the number of scans for feature sorting is $|AT| + (|AT|-1) + (|AT|-2) + \cdots + 1$, and the time complexity of this phase is $O(|AT|^2)$, where $AT$ is the set of raw conditional features over $\mathbf{DS}$. (2) Updating the number of groups (Steps 7 to 21): in the worst case, since $N_g < |AT|$ holds, the time complexity of this phase can be ignored. (3) Selecting features from the groups (Steps 22 to 27): this requires $N_g^{new}$ iterations ($N_g^{new} < |AT|$), so the time complexity of this phase can also be ignored. Therefore, in general, the time complexity of IRFFR is $O(m^2)$. It is worth noting that the time complexity of the forward greedy searching strategy is $O(n^2 \cdot m^2)$ [16]. From this point of view, the efficiency of feature reduction is improved.
Example 2. 
The following example of data which contains 12 samples and 11 features is given to further explain Algorithm 3; all samples are classified into four categories by d (see Table 2).
1.
For each feature, we have G δ ( f 1 , U ) = 0.0667 , G δ ( f 2 , U ) = 0.6583 , G δ ( f 3 , U ) = 0.3655 , G δ ( f 4 , U ) = 0.0679 , G δ ( f 5 , U ) = 0.0417 , G δ ( f 6 , U ) = 0.1917 , G δ ( f 7 , U ) = 0.3155 , G δ ( f 8 , U ) = 0.2155 , G δ ( f 9 , U ) = 0.1750 , G δ ( f 10 , U ) = 0.5083 , G δ ( f 11 , U ) = 0.4917 .
2.
Sort features in ascending order such that G 1 = G δ ( f 5 , U ) , G 2 = G δ ( f 1 , U ) , G 3 = G δ ( f 4 , U ) , G 4 = G δ ( f 9 , U ) , G 5 = G δ ( f 6 , U ) , G 6 = G δ ( f 8 , U ) , G 7 = G δ ( f 7 , U ) , G 8 = G δ ( f 3 , U ) , G 9 = G δ ( f 11 , U ) , G 10 = G δ ( f 10 , U ) , G 11 = G δ ( f 2 , U ) .
3.
Calculate the corresponding conditional entropy such that C E 1 = 2.8666 , C E 2 = 2.1805 , C E 3 = 3.1327 , C E 4 = 2.4333 , C E 5 = 2.3917 , C E 6 = 2.0387 , C E 7 = 1.3463 , C E 8 = 1.2591 , C E 9 = 0.9972 , C E 10 = 1.7841 , C E 11 = 1.1557 .
4.
The isotonic regression of G c e is G c e * = ( 2.8666 , 2.6566 , 2.6566 , 2.4125 , 2.4125 , 1.6925 , 1.6925 , 1.3468 , 1.3468 , 1.3468 , 1.1557 ) T , then G r o u p 1 = { f 5 } , G r o u p 2 = { f 1 , f 4 } , G r o u p 3 = { f 9 , f 6 } , G r o u p 4 = { f 8 , f 7 , f 10 } , G r o u p 5 = { f 3 , f 11 } , G r o u p 6 = { f 2 } .
5.
D V 1 = 0.21 , D V 2 = 0.2441 , D V 3 = 0.72 , D V 4 = 0.3457 , D V 5 = 0.1911 , M e a n D = 0.3422 , we then have G r o u p n e w 1 = { f 5 , f 1 , f 4 } , G r o u p n e w 2 = { f 9 , f 6 } , G r o u p n e w 3 = { f 8 , f 7 , f 10 } , G r o u p n e w 4 = { f 3 , f 11 , f 2 } .
6.
For G r o u p n e w 1 G r o u p n e w 2 , we put f 5 and f 9 into a reduct pool; for G r o u p n e w 3 G r o u p n e w 4 , we put f 10 and f 2 into a reduct pool. That is, we have the final reduct { f 2 , f 5 , f 9 , f 10 } .

4. Experiments

4.1. Datasets

To demonstrate the effectiveness of our proposed framework for feature reduction, 25 UCI data sets were used to conduct the experiments. Table 3 shows the details of these data sets.
During the experiments, each dataset participates in the calculation in the form of a two-dimensional table. Specifically, the “rows” of these tables represent “samples”, and the number of rows reveals how many samples participate in the calculation; the “columns” of these tables represent different features of samples, and the number of columns reveals how many features a sample has.
It is worth noting that in practical applications, data perturbation is sometimes inevitable. Therefore, it is necessary to investigate the immunity of the proposed algorithm when data perturbation occurs. In our experiments, label noise is used to generate data perturbation. Specifically, perturbed labels are injected into the raw labels: if the perturbation ratio is given as $\beta\%$, the injection is realized by randomly selecting $\beta\%$ of the samples and injecting white Gaussian noise (WGN) [63] into their labels. It should be emphasized that an excessive WGN ratio will cause the data to lose their original semantics, in which case the experimental results may be meaningless. Thus, in the following experiments, to better observe the performance of our proposed algorithm under an increasing noise ratio, we consider four WGN ratios: 10%, 20%, 30%, and 40%.
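Label perturbation of this kind can be sketched as follows. The paper cites WGN injection, but does not detail how a noisy numeric value is mapped back to a categorical label, so the sketch below simply replaces each selected label with a different randomly chosen label; that mapping is an assumption made for illustration.

```python
import numpy as np

def inject_label_noise(y, beta, seed=0):
    """Perturb the labels of beta percent of randomly selected samples.

    The paper injects white Gaussian noise into the selected labels; since the
    mapping from the noisy value back to a categorical label is not spelled out,
    this sketch replaces each selected label with a different label drawn
    uniformly at random.
    """
    rng = np.random.default_rng(seed)
    y_noisy = np.asarray(y).copy()
    labels = np.unique(y_noisy)
    n_noisy = int(round(len(y_noisy) * beta / 100.0))
    for i in rng.choice(len(y_noisy), size=n_noisy, replace=False):
        y_noisy[i] = rng.choice(labels[labels != y_noisy[i]])
    return y_noisy
```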

4.2. Experimental Configuration

In the context of this experiment, the neighborhood rough set is constructed with 20 different neighborhood radii, $\delta = 0.02, 0.04, \ldots, 0.4$. Moreover, 10-fold cross-validation [64] is applied to the calculation of each reduct, as follows: (1) each data set is randomly partitioned into two groups of equal size, with the first group regarded as the testing samples and the second group regarded as the training samples; (2) the set of training samples is further partitioned into 10 groups of equal size, $U_1, U_2, \ldots, U_{10}$; in the first round of computation, $U_2 \cup U_3 \cup \cdots \cup U_{10}$ is used to derive the reduct, and the derived reduct is then used to predict the labels of the testing samples; $\ldots$; in the last round of computation, $U_1 \cup U_2 \cup \cdots \cup U_9$ is used to derive the reduct, which is likewise used to predict the labels of the testing samples.
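The following index-bookkeeping sketch summarizes this protocol; the random seed, the use of numpy, and the function name are conveniences of the illustration rather than details taken from the paper.

```python
import numpy as np

def cross_validation_rounds(n_samples, n_folds=10, seed=0):
    """Index bookkeeping for the evaluation protocol described above.

    The samples are split in half (testing / training); the training half is
    divided into n_folds groups, and in round tau the reduct is derived on all
    training groups except U_tau and then evaluated on the fixed testing half.
    Returns a list of (reduct_indices, test_indices) pairs, one per round.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    test_idx, train_idx = perm[: n_samples // 2], perm[n_samples // 2:]
    folds = np.array_split(train_idx, n_folds)
    rounds = []
    for tau in range(n_folds):
        reduct_idx = np.concatenate([folds[j] for j in range(n_folds) if j != tau])
        rounds.append((reduct_idx, test_idx))
    return rounds
```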
All experiments were carried out on a personal computer with Windows 10 and an Intel Core i9-10885H CPU (2.40 GHz) with 16.00 GB memory. The programming language used was MATLAB R2017b.

4.3. The First Group of Experiments

In the first group of experiments, Algorithm 3 is employed to conduct the final reducts of IRFFR. Based on the final reducts, we verify the effectiveness of our IRFFR by comparing it with six state-of-the-art feature reduction methods from three aspects: classification accuracy, classification stability, and elapsed time. It is worth noting that the comparative method “Ensemble Selector for Attribute Reduction (ESAR)” is based on the ensemble [65] framework. The comparative methods are as follows:
  • Knowledge Change Rate(KCR) [33].
  • Forward Greedy Searching(FGS) [39].
  • Self Information(SI) [32].
  • Attribute Group(AG) [66].
  • Ensemble Selector for Attribute Reduction(ESAR) [40].
  • Novel Fitness Evaluation-based Feature Reduction(NFEFR) [67].

4.3.1. Comparison of Classification Accuracy

The index called classification accuracy was employed to measure the classification performance of the seven algorithms. Three classic classifiers, KNN (K-nearest neighbor, K = 3) [68], CART (classification and regression tree) [69], and SVM (support vector machine) [70], were employed to reflect the classification performance. Generally, given a decision system $\mathbf{DS}$, assume that the set $U$ is divided into $z$ disjoint groups of equal size, $U_1, \ldots, U_\tau, \ldots, U_z$ $(1 \le \tau \le z)$ (note that as 10-fold cross-validation was employed in this experiment, $z = 10$ holds). The classification accuracy related to the reduct $red_\tau$ ($red_\tau$ is the reduct derived over $U - U_\tau$) is
$$Acc_{red_\tau} = \frac{|\{x \in U_\tau \mid Pre_{red_\tau}(x) = d(x)\}|}{|U_\tau|}, \quad (18)$$
in which $Pre_{red_\tau}(x)$ is the predicted label of $x$ obtained by employing the reduct $red_\tau$.
The mean values of the classification accuracies are shown in the radar charts of Figure 1, Figure 2 and Figure 3, in which four different colors are used to represent the four different ratios of noisy labels.
Based on the specific experimental results expressed by Figure 1, Figure 2 and Figure 3, the following becomes apparent.
  • For most data sets, no matter which ratio of label noise is injected into the raw data, the predictions generated through the reducts derived by our IRFFR are superior to those of the six popular comparative algorithms. The essential reason is that the feature sequence based on granularity is helpful for selecting more stable features. Taking “Parkinson Multiple Sound Recording” (ID-10, Figure 1j) as an example, all classification accuracies of IRFFR over the four label noise ratios are greater than 0.6; in contrast, when the label noise ratio reaches 20%, 30%, and 40%, all classification accuracies of the six comparative algorithms are less than 0.6. Moreover, for some data sets, no matter which classifier is adopted, the classification accuracies of our IRFFR are greatly superior to those of the six comparative algorithms. The essential reason is that diverse evaluations do bring out more qualified features. Taking “QSAR Biodegradation” (ID-12, Figure 1l and Figure 2l) as an example, in KNN, all classification accuracies of IRFFR are greater than 0.66 over the four label noise ratios, whereas those of all comparative algorithms are less than 0.66; in CART, all classification accuracies of IRFFR are greater than 0.67, whereas those of all comparative algorithms are less than 0.67. In SVM, taking “Sonar” (ID-14, Figure 3n) as an example, all classification accuracies of IRFFR are greater than 0.76 over the four label noise ratios, whereas those of all comparative algorithms are around 0.68. Therefore, it can be observed that our IRFFR can derive reducts with outstanding classification accuracy.
  • For most data sets, a higher label noise ratio has a negative impact on the classification accuracies of all seven algorithms. In other words, with the increase in the label noise ratio ($\beta$ increases from 10 to 40), the classification accuracies of all seven algorithms show a significant decrease, which can be seen in Figure 1, Figure 2 and Figure 3. Taking “Twonorm” (ID-20, Figure 1t and Figure 2t) as an example, the increase of $\beta$ clearly separates the stripes with different colors. However, it should be noted that for some data sets, such as “LSVT Voice Rehabilitation” (ID-8, Figure 1h and Figure 2h), “SPECTF Heart” (ID-15, Figure 1o and Figure 2o), and “QSAR Biodegradation” (ID-12, Figure 3l), the changes in these figures are quite unexpected, which can be attributed to a higher label noise ratio leading to lower stability of the classification results. Furthermore, for some data sets, such as “Diabetic Retinopathy Debrecen” (ID-4, Figure 1d and Figure 2d), “Parkinson Multiple Sound Recording” (ID-10, Figure 1j, Figure 2j and Figure 3j), and “Statlog (Vehicle Silhouettes)” (ID-17, Figure 1q and Figure 2q), the increasing label noise ratio does not have a significant effect on the classification accuracies of our IRFFR. In other words, compared with the other algorithms, our IRFFR has better antinoise ability.

4.3.2. Comparison of Classification Stability

In this subsection, the classification stability [40], obtained over the different classification results of all seven algorithms, is discussed. Similar to the classification accuracy, all experimental results are based on the KNN, CART, and SVM classifiers. Given a decision system $\mathbf{DS}$, suppose that the set $U$ is divided into $z$ disjoint groups of equal size, $U_1, \ldots, U_\tau, \ldots, U_z$ $(1 \le \tau \le z)$ (10-fold cross-validation is employed; thus, $z = 10$). The classification stability related to the reducts $red_\tau$ ($red_\tau$ is the reduct derived over $U - U_\tau$) is
$$Stab_{classification} = \frac{2}{z \cdot (z-1)} \sum_{\tau=1}^{z-1}\sum_{\tau'=\tau+1}^{z} Exa(red_\tau, red_{\tau'}), \quad (19)$$
in which $Exa(red_\tau, red_{\tau'})$ represents the agreement of the classification results and can be defined based on Table 4.
In Table 4, $Pre_{red_\tau}(x)$ denotes the predicted label of $x$ obtained by $red_\tau$, and $\psi_1$, $\psi_2$, $\psi_3$, and $\psi_4$ represent the numbers of samples meeting the corresponding conditions in Table 4. Following this, $Exa(red_\tau, red_{\tau'})$ is
$$Exa(red_\tau, red_{\tau'}) = \frac{\psi_1 + \psi_4}{\psi_1 + \psi_2 + \psi_3 + \psi_4}. \quad (20)$$
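The stability index of Equations (19) and (20) can be computed as below. Since Table 4 is not reproduced here, the sketch assumes the common reading in which $\psi_1$ counts samples that both reducts classify correctly and $\psi_4$ those that both classify incorrectly.

```python
import numpy as np
from itertools import combinations

def classification_stability(predictions, y_true):
    """Classification stability of Equations (19) and (20).

    predictions : list of z prediction arrays over the same test samples,
                  one per reduct red_tau;
    y_true      : true labels of the test samples.
    Exa is taken as the fraction of samples on which two reducts agree on being
    correct or being wrong (psi_1 + psi_4) -- an assumed reading of Table 4.
    """
    y_true = np.asarray(y_true)
    pair_scores = []
    for p, q in combinations(predictions, 2):
        agree = ((np.asarray(p) == y_true) == (np.asarray(q) == y_true)).mean()
        pair_scores.append(agree)
    return float(np.mean(pair_scores))
```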
It should be emphasized that the index of classification stability describes the degree of deviation of the predicted labels if data perturbation occurs. A higher value of classification stability indicates that the predicted labels are more stable, i.e., the corresponding reduct has better quality. As for what follows, the mean values of the classification stabilities are shown in Figure 4, Figure 5 and Figure 6.
Based on the experimental results reported in Figure 4, Figure 5 and Figure 6, it is not difficult to conclude the following.
  • For most data sets, regardless of which ratio of label noise was injected into the raw data, the classification stabilities of the reducts derived by our IRFFR were not the greatest out-performers in SVM; however, the classification stabilities in KNN and CART were superior to those of the six popular comparative algorithms. In particular, for some data sets, the predictions conducted with the reducts of our IRFFR obtained absolute dominance. Taking “Musk (Version 1)” (ID-9, Figure 4i and Figure 5i) as an example, regarding KNN and CART, the classification stabilities of our IRFFR are greater than 0.66 and 0.65, respectively; in contrast, the classification stabilities of the six comparative algorithms are only around 0.56 and 0.58. Therefore, it can be observed that by introducing the new grouping mechanism proposed in Section 3.3, from the viewpoint of both stability and accuracy, our IRFFR is effective in improving the classification performance.
  • Following Figure 4 and Figure 5, similar to the classification accuracy, we can also observe that a higher noise ratio does have a negative impact on the classification stability. Moreover, the classification stability of our IRFFR shows an antinoise ability similar to that of its classification accuracy. Taking “Parkinson Multiple Sound Recording” (ID-10, Figure 4j and Figure 5j) and “QSAR Biodegradation” (ID-12, Figure 4l and Figure 5l) as examples, although an increasing ratio of label noise was injected into the raw data, the classification stabilities of our IRFFR over the four different label noise ratios do not show dramatic changes.

4.3.3. Comparison of Elapsed Time

In this section, the elapsed time for deriving reducts with the different approaches is compared. The detailed results are shown in Table 5 and Table 6, where bold text indicates the optimal method in each row.
With a deep investigation of Table 5 and Table 6, it is not difficult to arrive at the following conclusions.
  • The time consumption for selecting features by our IRFFR was much less than that of all the comparative algorithms. The essential reason is that IRFFR reduces the search space for candidate features, which indicates that our IRFFR has superior efficiency. Taking the “Wine Quality” (ID-24, Table 5) data set as an example, if $\beta = 10$, the times required to obtain the reducts of IRFFR, KCR, FGS, SI, AG, ESAR, and NFEFR are 2.5880, 59.9067, 1480.2454, 19.2041, 9.1134, 10.0511, and 98.9501 s, respectively; our IRFFR requires only 2.5880 s.
  • It should be pointed out that for IRFFR and FGS, the time consumption has the largest difference. With “Pen-Based Recognition of Handwritten Digits” (ID-11) as an example, the elapsed time of our IRFFR over four different noisy label ratios are 25.2519, 17.5276, 16.1564, and 19.5612 s respectively; in contrast, the elapsed time of the FGS over four different noisy label ratios are 1083.2167, 4033.1561, 4023.1564, and 4057.2135 s, respectively. Therefore, the mechanism of parallel selection can significantly improve the efficiency in selecting features.
  • With the increase in the label noise ratio, the elapsed times of the seven algorithms exhibit different trends. For example, when $\beta$ increases from 10 to 20, the elapsed times of all seven algorithms over the “Breast Cancer Wisconsin (Diagnostic)” (ID-1) data set show a downward tendency. However, when $\beta$ is 30 the case is quite different, as it is when $\beta$ is 40; that is, some algorithms require more time for reduct construction. In addition, we can observe that the average elapsed times of the six comparative algorithms change gradually, whereas the elapsed time of our IRFFR shows a clear descending trend. Therefore, the increase in the noisy label ratio does not significantly affect the time consumption of our IRFFR.
To further show the superiority of our IRFFR, the values of speed-up ratio are further presented in Table 7 and Table 8.
Following Table 7 and Table 8, it is not difficult to observe that in comparison with the other six well-known devices, not only are all values of the speed-up ratio with respect to the four different noisy label ratios over the 25 data sets much higher than 35%, but all average values of the speed-up ratio exceed 45%. Therefore, our IRFFR does possess the ability to accelerate the process of deriving reducts. Moreover, the Wilcoxon signed rank test [71] was also used to compare the algorithms. From the experimental results, the p-values derived from comparing our IRFFR with the other six devices are all $8.8574 \times 10^{-5}$, which is obviously far less than 0.05. In addition, it can be reasonably conjectured that there exists a tremendous difference between our IRFFR and the other six state-of-the-art devices in terms of efficiency; accordingly, the obtained p-values reach the lower bound of MATLAB.
On the whole, we can conclude that our proposed IRFFR possesses a significant advantage in time efficiency in comparison with the other six algorithms.

4.4. The Second Group of Experiments

In the second group of experiments, to verify the performance of IRFFR, two well-known accelerators for feature reduction were employed to conduct a comparison with our framework.
  • Quick Random Sampling for Attribute Reduction (QRSAR) [58].
  • Dissimilarity-Based Searching for Attribute Reduction (DBSAR) [72].

4.4.1. Comparison of Elapsed Time

In this section, the elapsed time derived from all feature reduction algorithms are compared. Table 9, Table 10 and Table 11 show the mean values of the different elapsed time obtained over 25 datasets.
With an in-depth analysis of Table 9, Table 10 and Table 11, it is not difficult to obtain the following conclusions.
  • Compared with those of the other advanced accelerators, the time consumption for deriving the final reduct of our IRFFR was considerably superior, meaning that the mechanism of grouping and parallel selection does improve the efficiency of selecting features. In other words, our IRFFR substantially reduces the time needed to complete the process of selecting features. Taking the data set “Pen-Based Recognition of Handwritten Digits” (ID-11) as an example, when $\beta = 10$, the elapsed times of the three algorithms are 16.2186, 76.8057, and 99.5899 s, respectively. Moreover, regarding the three other ratios ($\beta = 20, 30, 40$), the elapsed times also show great differences.
  • Taking IRFFR and QRSAR as examples, the change of ratio does not bring distinct oscillation to the elapsed time of our IRFFR. The essential reason is that the mechanism of diverse evaluation is especially significant for selecting more qualified features if data perturbation occurs. However, this mechanism does not exist in QRSAR, which may result in some abnormal changes for QRSAR. For instance, when $\beta$ changes from 10 to 30, the elapsed times of QRSAR for “Twonorm” (ID-20) are 50.6085, 35.4501, and 42.5415 s, respectively.
  • Although our IRFFR is not faster than the two comparative algorithms in all cases, the speed-up ratios related to elapsed time of IRFFR are all higher than 40%. This is mainly because IRFFR selects the qualified features in parallel; that is, IRFFR places the optimal feature at a specific location in each group, and the final feature subset is then derived. From this point of view, QRSAR and DBSAR are more complicated than IRFFR.

4.4.2. Comparison of Classification Performances

In this section, the classification performances of the selected features with respect to the three feature reduction approaches are examined. The classification accuracies and classification stabilities are recorded in Table 12, Table 13, Table 14, Table 15, Table 16 and Table 17. Note that the classifiers are KNN, CART, and SVM.
Observing Table 12, Table 13, Table 14, Table 15, Table 16 and Table 17, it is not difficult to draw the following conclusions.
  • Compared with QRSAR and DBSAR, when $\beta = 10$, in KNN, our IRFFR achieves slightly superior rising rates of classification accuracy of 2.36% and 0.59%, respectively (see Table 12). With the increase of $\beta$, the advantage of our IRFFR is gradually revealed. For instance, when $\beta = 20$, regarding the KNN classifier, the rising rates of classification accuracy with respect to the comparative algorithms are 6.93% and 4.49%, respectively, which shows a significant increase. The essential reason is that granularity has been introduced into our framework, the corresponding feature sequence is achieved, and the final subset is then relatively stable. Although the rising rates over QRSAR and DBSAR are slightly lower when $\beta$ increases from 30 to 40, compared with the case of a lower ratio, i.e., $\beta = 10$, our IRFFR still yields clear gains.
  • Different from the classification accuracy, regardless of which label noise ratio is injected and which classifier is employed, the classification stabilities of our IRFFR show steady improvement (see Table 15, Table 16 and Table 17). Specifically, if $\beta = 40$, for all three classifiers, all rising rates of the average classification stabilities exceed 5.0%. Such an improvement is especially significant at a higher label noise ratio because diverse evaluation is helpful for deriving a more stable reduct, and our IRFFR can then possess a better classification performance if data perturbation occurs.
In addition, Table 18 and Table 19 show the counts of wins, ties, and losses regarding the classification stabilities and accuracies for the different classifiers. As reported in [73], under the null hypothesis of the sign test, the number of wins of a given learning algorithm over $s$ data sets obeys the normal distribution $N(s/2, \sqrt{s}/2)$. We assert that IRFFR is significantly better than the comparative algorithms at significance level $\alpha$ when the number of wins is at least $s/2 + z_{\alpha/2} \times \sqrt{s}/2$. In our experiments, $s = 25$ and $\alpha = 0.1$, so $s/2 + z_{\alpha/2} \times \sqrt{s}/2 \approx 17$. This implies that our IRFFR achieves statistical superiority if the number of wins and ties over the 25 data sets reaches 17.
Considering the above discussions, we can clearly conclude that our IRFFR can not only accelerate the process of deriving reducts but can also provide qualified reducts with better classification performance.

5. Conclusions

In this study, considering the predictable shortcomings of applying a single feature measure, we developed a novel parallel selector which provides the following: (1) the evaluation of features from diverse viewpoints and (2) a reliable paradigm for improving the effectiveness and efficiency of the final selected features. As a result, the additional time consumption associated with incremental evaluation is reduced. Different from previous devices, which only consider single measure-based constraints for deriving qualified reducts, our selector pays considerable attention to fusing different measures to attain reducts with better generalization performance. Furthermore, it is worth emphasizing that our new selector can be seen as an effective framework which can easily be combined with other recent measures and other acceleration strategies. The results of the persuasive experiments and the corresponding analysis strongly support the superiority of our selector.
Many follow-up comparison studies can be proposed on the basis of our strategy, with the items warranting further exploration being the following.
  • It should not be ignored that the problems caused by multilabeling have aroused extensive discussion in the academic community. Therefore, it is urgent to further introduce the proposed method to dimension reduction problems with multilabel distributed data sets.
  • The type of data perturbation considered in this paper involves only the labels. Therefore, we can simulate other forms of data perturbation, such as injecting feature noise [74], to make the proposed algorithm more robust.

Author Contributions

Conceptualization, Z.Y. and Y.F.; methodology, J.C.; software, Z.Y.; validation, Y.F., P.W. and J.C.; formal analysis, J.C.; investigation, J.C.; resources, P.W.; data curation, Z.Y.; writing—original draft preparation, Z.Y.; writing—review and editing, J.C.; visualization, J.C.; supervision, P.W.; project administration, P.W.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (no. 62076111), the Key Research and Development Program of Zhenjiang-Social Development (grant no. SH2018005), the Industry-School Cooperative Education Program of the Ministry of Education (grant no. 202101363034), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (no. SJCX22_1905).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gui, N.; Ge, D.; Hu, Z. AFS: An attention-based mechanism for supervised feature selection. Proc. AAAI Conf. Artif. Intell. 2019, 33, 3705–3713. [Google Scholar] [CrossRef]
  2. Li, K.; Wang, F.; Yang, L.; Liu, R. Deep feature screening: Feature selection for ultra high-dimensional data via deep neural networks. Neurocomputing 2023, 538, 126186. [Google Scholar] [CrossRef]
  3. Chen, C.; Weiss, S.T.; Liu, Y.Y. Graph Convolutional Network-based Feature Selection for High-dimensional and Low-sample Size Data. arXiv 2022, arXiv:2211.14144. [Google Scholar] [CrossRef] [PubMed]
  4. Xiao, Z.; Xu, X.; Xing, H.; Song, F.; Wang, X.; Zhao, B. A federated learning system with enhanced feature extraction for human activity recognition. Knowl.-Based Syst. 2021, 229, 107338. [Google Scholar] [CrossRef]
  5. Constantinopoulos, C.; Titsias, M.K.; Likas, A. Bayesian feature and model selection for Gaussian mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1013–1018. [Google Scholar] [CrossRef]
  6. Chen, J.; Stern, M.; Wainwright, M.J.; Jordan, M.I. Kernel feature selection via conditional covariance minimization. Adv. Neural Inf. Process. Syst. 2017, 30, 6946–6955. [Google Scholar]
  7. Zhang, X.; Yao, Y. Tri-level attribute reduction in rough set theory. Expert Syst. Appl. 2022, 190, 116187. [Google Scholar] [CrossRef]
  8. Gao, Q.; Ma, L. A novel notion in rough set theory: Invariant subspace. Fuzzy Sets Syst. 2022, 440, 90–111. [Google Scholar] [CrossRef]
  9. Jiang, Z.; Liu, K.; Yang, X.; Yu, H.; Fujita, H.; Qian, Y. Accelerator for supervised neighborhood based attribute reduction. Int. J. Approx. Reason. 2020, 119, 122–150. [Google Scholar] [CrossRef]
  10. Liu, K.; Yang, X.; Yu, H.; Fujita, H.; Chen, X.; Liu, D. Supervised information granulation strategy for attribute reduction. Int. J. Mach. Learn. Cybern. 2020, 11, 2149–2163. [Google Scholar] [CrossRef]
  11. Kar, B.; Sarkar, B.K. A Hybrid Feature Reduction Approach for Medical Decision Support System. Math. Probl. Eng. 2022, 2022, 3984082. [Google Scholar] [CrossRef]
  12. Sun, L.; Zhang, J.; Ding, W.; Xu, J. Feature reduction for imbalanced data classification using similarity-based feature clustering with adaptive weighted K-nearest neighbors. Inf. Sci. 2022, 593, 591–613. [Google Scholar] [CrossRef]
  13. Sun, L.; Wang, X.; Ding, W.; Xu, J. TSFNFR: Two-stage fuzzy neighborhood-based feature reduction with binary whale optimization algorithm for imbalanced data classification. Knowl.-Based Syst. 2022, 256, 109849. [Google Scholar] [CrossRef]
  14. Xia, Z.; Chen, Y.; Xu, C. Multiview pca: A methodology of feature extraction and dimension reduction for high-order data. IEEE Trans. Cybern. 2021, 52, 11068–11080. [Google Scholar] [CrossRef]
  15. Su, Z.g.; Hu, Q.; Denoeux, T. A distributed rough evidential K-NN classifier: Integrating feature reduction and classification. IEEE Trans. Fuzzy Syst. 2020, 29, 2322–2335. [Google Scholar] [CrossRef]
  16. Ba, J.; Liu, K.; Ju, H.; Xu, S.; Xu, T.; Yang, X. Triple-G: A new MGRS and attribute reduction. Int. J. Mach. Learn. Cybern. 2022, 13, 337–356. [Google Scholar] [CrossRef]
  17. Liu, K.; Yang, X.; Yu, H.; Mi, J.; Wang, P.; Chen, X. Rough set based semi-supervised feature selection via ensemble selector. Knowl.-Based Syst. 2019, 165, 282–296. [Google Scholar] [CrossRef]
  18. Li, Z.; Kamnitsas, K.; Glocker, B. Analyzing overfitting under class imbalance in neural networks for image segmentation. IEEE Trans. Med. Imaging 2020, 40, 1065–1077. [Google Scholar] [CrossRef]
  19. Park, Y.; Ho, J.C. Tackling overfitting in boosting for noisy healthcare data. IEEE Trans. Knowl. Data Eng. 2019, 33, 2995–3006. [Google Scholar] [CrossRef]
  20. Ismail, A.; Sandell, M. A Low-Complexity Endurance Modulation for Flash Memory. IEEE Trans. Circuits Syst. II Express Briefs 2021, 69, 424–428. [Google Scholar] [CrossRef]
  21. Wang, P.X.; Yao, Y.Y. CE3: A three-way clustering method based on mathematical morphology. Knowl.-Based Syst. 2018, 155, 54–65. [Google Scholar] [CrossRef]
  22. Tang, Y.J.; Zhang, X. Low-complexity resource-shareable parallel generalized integrated interleaved encoder. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 69, 694–706. [Google Scholar] [CrossRef]
  23. Ding, W.; Nayak, J.; Naik, B.; Pelusi, D.; Mishra, M. Fuzzy and real-coded chemical reaction optimization for intrusion detection in industrial big data environment. IEEE Trans. Ind. Inform. 2020, 17, 4298–4307. [Google Scholar] [CrossRef]
  24. Jia, X.; Shang, L.; Zhou, B.; Yao, Y. Generalized attribute reduct in rough set theory. Knowl.-Based Syst. 2016, 91, 204–218. [Google Scholar] [CrossRef]
  25. Ju, H.; Yang, X.; Yu, H.; Li, T.; Yu, D.J.; Yang, J. Cost-sensitive rough set approach. Inf. Sci. 2016, 355, 282–298. [Google Scholar] [CrossRef]
  26. Qian, Y.; Liang, J.; Pedrycz, W.; Dang, C. An efficient accelerator for attribute reduction from incomplete data in rough set framework. Pattern Recognit. 2011, 44, 1658–1670. [Google Scholar] [CrossRef]
  27. Ba, J.; Wang, P.; Yang, X.; Yu, H.; Yu, D. Glee: A granularity filter for feature selection. Eng. Appl. Artif. Intell. 2023, 122, 106080. [Google Scholar] [CrossRef]
  28. Gong, Z.; Liu, Y.; Xu, T.; Wang, P.; Yang, X. Unsupervised attribute reduction: Improving effectiveness and efficiency. Int. J. Mach. Learn. Cybern. 2022, 13, 3645–3662. [Google Scholar] [CrossRef]
  29. Jiang, Z.; Liu, K.; Song, J.; Yang, X.; Li, J.; Qian, Y. Accelerator for crosswise computing reduct. Appl. Soft Comput. 2021, 98, 106740. [Google Scholar] [CrossRef]
  30. Chen, Y.; Wang, P.; Yang, X.; Mi, J.; Liu, D. Granular ball guided selector for attribute reduction. Knowl.-Based Syst. 2021, 229, 107326. [Google Scholar] [CrossRef]
  31. Qian, W.; Xiong, C.; Qian, Y.; Wang, Y. Label enhancement-based feature selection via fuzzy neighborhood discrimination index. Knowl.-Based Syst. 2022, 250, 109119. [Google Scholar] [CrossRef]
  32. Wang, C.; Huang, Y.; Shao, M.; Hu, Q.; Chen, D. Feature selection based on neighborhood self-information. IEEE Trans. Cybern. 2019, 50, 4031–4042. [Google Scholar] [CrossRef] [PubMed]
  33. Jin, C.; Li, F.; Hu, Q. Knowledge change rate-based attribute importance measure and its performance analysis. Knowl.-Based Syst. 2017, 119, 59–67. [Google Scholar] [CrossRef]
  34. Qian, Y.; Liang, J.; Pedrycz, W.; Dang, C. Positive approximation: An accelerator for attribute reduction in rough set theory. Artif. Intell. 2010, 174, 597–618. [Google Scholar] [CrossRef]
  35. Hu, Q.; Yu, D.; Xie, Z.; Li, X. EROS: Ensemble rough subspaces. Pattern Recognit. 2007, 40, 3728–3739. [Google Scholar] [CrossRef]
  36. Liu, K.; Li, T.; Yang, X.; Yang, X.; Liu, D.; Zhang, P.; Wang, J. Granular cabin: An efficient solution to neighborhood learning in big data. Inf. Sci. 2022, 583, 189–201. [Google Scholar] [CrossRef]
  37. Pashaei, E.; Pashaei, E. Hybrid binary COOT algorithm with simulated annealing for feature selection in high-dimensional microarray data. Neural Comput. Appl. 2023, 35, 353–374. [Google Scholar] [CrossRef]
  38. Tang, Y.; Su, H.; Jin, T.; Flesch, R.C.C. Adaptive PID Control Approach Considering Simulated Annealing Algorithm for Thermal Damage of Brain Tumor During Magnetic Hyperthermia. IEEE Trans. Instrum. Meas. 2023, 72, 1–8. [Google Scholar] [CrossRef]
  39. Hu, Q.; Pedrycz, W.; Yu, D.; Lang, J. Selecting discrete and continuous features based on neighborhood decision error minimization. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2009, 40, 137–150. [Google Scholar]
  40. Yang, X.; Yao, Y. Ensemble selector for attribute reduction. Appl. Soft Comput. 2018, 70, 1–11. [Google Scholar] [CrossRef]
  41. Niu, J.; Chen, D.; Li, J.; Wang, H. A dynamic rule-based classification model via granular computing. Inf. Sci. 2022, 584, 325–341. [Google Scholar] [CrossRef]
  42. Yang, X.; Li, T.; Liu, D.; Fujita, H. A temporal-spatial composite sequential approach of three-way granular computing. Inf. Sci. 2019, 486, 171–189. [Google Scholar] [CrossRef]
  43. Han, Z.; Huang, Q.; Zhang, J.; Huang, C.; Wang, H.; Huang, X. GA-GWNN: Detecting anomalies of online learners by granular computing and graph wavelet convolutional neural network. Appl. Intell. 2022, 52, 13162–13183. [Google Scholar] [CrossRef]
  44. Xu, K.; Pedrycz, W.; Li, Z. Granular computing: An augmented scheme of degranulation through a modified partition matrix. Fuzzy Sets Syst. 2022, 440, 131–148. [Google Scholar] [CrossRef]
  45. Rao, X.; Liu, K.; Song, J.; Yang, X.; Qian, Y. Gaussian kernel fuzzy rough based attribute reduction: An acceleration approach. J. Intell. Fuzzy Syst. 2020, 39, 679–695. [Google Scholar] [CrossRef]
  46. Yang, B. Fuzzy covering-based rough set on two different universes and its application. Artif. Intell. Rev. 2022, 55, 4717–4753. [Google Scholar] [CrossRef]
  47. Sun, L.; Wang, T.; Ding, W.; Xu, J.; Lin, Y. Feature selection using Fisher score and multilabel neighborhood rough sets for multilabel classification. Inf. Sci. 2021, 578, 887–912. [Google Scholar] [CrossRef]
  48. Chen, Y.; Yang, X.; Li, J.; Wang, P.; Qian, Y. Fusing attribute reduction accelerators. Inf. Sci. 2022, 587, 354–370. [Google Scholar] [CrossRef]
  49. Liang, J.; Shi, Z. The information entropy, rough entropy and knowledge granulation in rough set theory. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 2004, 12, 37–46. [Google Scholar] [CrossRef]
  50. Xu, J.; Yang, J.; Ma, Y.; Qu, K.; Kang, Y. Feature selection method for color image steganalysis based on fuzzy neighborhood conditional entropy. Appl. Intell. 2022, 52, 9388–9405. [Google Scholar] [CrossRef]
  51. Sang, B.; Chen, H.; Yang, L.; Li, T.; Xu, W. Incremental feature selection using a conditional entropy based on fuzzy dominance neighborhood rough sets. IEEE Trans. Fuzzy Syst. 2021, 30, 1683–1697. [Google Scholar] [CrossRef]
  52. Américo, A.; Khouzani, M.; Malacaria, P. Conditional entropy and data processing: An axiomatic approach based on core-concavity. IEEE Trans. Inf. Theory 2020, 66, 5537–5547. [Google Scholar] [CrossRef]
  53. Gao, C.; Zhou, J.; Miao, D.; Yue, X.; Wan, J. Granular-conditional-entropy-based attribute reduction for partially labeled data with proxy labels. Inf. Sci. 2021, 580, 111–128. [Google Scholar] [CrossRef]
  54. Zhang, X.; Mei, C.; Chen, D.; Li, J. Feature selection in mixed data: A method using a novel fuzzy rough set-based information entropy. Pattern Recognit. 2016, 56, 1–15. [Google Scholar] [CrossRef]
  55. Ko, Y.C.; Fujita, H. An evidential analytics for buried information in big data samples: Case study of semiconductor manufacturing. Inf. Sci. 2019, 486, 190–203. [Google Scholar] [CrossRef]
  56. Huang, H.; Oh, S.K.; Wu, C.K.; Pedrycz, W. Double iterative learning-based polynomial based-RBFNNs driven by the aid of support vector-based kernel fuzzy clustering and least absolute shrinkage deviations. Fuzzy Sets Syst. 2022, 443, 30–49. [Google Scholar] [CrossRef]
  57. Yao, Y.; Zhao, Y.; Wang, J. On reduct construction algorithms. Trans. Comput. Sci. II 2008, 100–117. [Google Scholar]
  58. Chen, Z.; Liu, K.; Yang, X.; Fujita, H. Random sampling accelerator for attribute reduction. Int. J. Approx. Reason. 2022, 140, 75–91. [Google Scholar] [CrossRef]
  59. Fokianos, K.; Leucht, A.; Neumann, M.H. On integrated ℓ1 convergence rate of an isotonic regression estimator for multivariate observations. IEEE Trans. Inf. Theory 2020, 66, 6389–6402. [Google Scholar] [CrossRef]
  60. Wang, H.; Liao, H.; Ma, X.; Bao, R. Remaining useful life prediction and optimal maintenance time determination for a single unit using isotonic regression and gamma process model. Reliab. Eng. Syst. Saf. 2021, 210, 107504. [Google Scholar] [CrossRef]
  61. Balinski, M.L. A competitive (dual) simplex method for the assignment problem. Math. Program. 1986, 34, 125–141. [Google Scholar] [CrossRef]
  62. Ayer, M.; Brunk, H.D.; Ewing, G.M.; Reid, W.T.; Silverman, E. An empirical distribution function for sampling with incomplete information. Ann. Math. Stat. 1955, 26, 641–647. [Google Scholar] [CrossRef]
  63. Oh, H.; Nam, H. Maximum rate scheduling with adaptive modulation in mixed impulsive noise and additive white Gaussian noise environments. IEEE Trans. Wirel. Commun. 2021, 20, 3308–3320. [Google Scholar] [CrossRef]
  64. Hu, Q.; Yu, D.; Liu, J.; Wu, C. Neighborhood rough set based heterogeneous feature subset selection. Inf. Sci. 2008, 178, 3577–3594. [Google Scholar] [CrossRef]
  65. Wu, T.F.; Fan, J.C.; Wang, P.X. An improved three-way clustering based on ensemble strategy. Mathematics 2022, 10, 1457. [Google Scholar] [CrossRef]
  66. Chen, Y.; Liu, K.; Song, J.; Fujita, H.; Yang, X.; Qian, Y. Attribute group for attribute reduction. Inf. Sci. 2020, 535, 64–80. [Google Scholar] [CrossRef]
  67. Ye, D.; Chen, Z.; Ma, S. A novel and better fitness evaluation for rough set based minimum attribute reduction problem. Inf. Sci. 2013, 222, 413–423. [Google Scholar] [CrossRef]
  68. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [Google Scholar] [CrossRef]
  69. Breiman, L. Classification and Regression Trees; Routledge: Cambridge, MA, USA, 2017. [Google Scholar]
  70. Fu, C.; Zhou, S.; Zhang, D.; Chen, L. Relative Density-Based Intuitionistic Fuzzy SVM for Class Imbalance Learning. Entropy 2022, 25, 34. [Google Scholar] [CrossRef]
  71. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
  72. Rao, X.; Yang, X.; Yang, X.; Chen, X.; Liu, D.; Qian, Y. Quickly calculating reduct: An attribute relationship based approach. Knowl.-Based Syst. 2020, 200, 106014. [Google Scholar] [CrossRef]
  73. Cao, F.; Ye, H.; Wang, D. A probabilistic learning algorithm for robust modeling using neural networks with random weights. Inf. Sci. 2015, 313, 62–78. [Google Scholar] [CrossRef]
  74. Xu, S.; Ju, H.; Shang, L.; Pedrycz, W.; Yang, X.; Li, C. Label distribution learning: A local collaborative mechanism. Int. J. Approx. Reason. 2020, 121, 59–84. [Google Scholar] [CrossRef]
Figure 1. Classification accuracies (KNN).
Figure 2. Classification accuracies (CART).
Figure 3. Classification accuracies (SVM).
Figure 4. Classification stabilities (KNN).
Figure 5. Classification stabilities (CART).
Figure 6. Classification stabilities (SVM).
Table 1. A specific calculation process.
k: 1, 2, 3, 4, 5
n_k: 25, 14, 10, 20, 22
P̂_k: 0.4, 0.5, 0.6, 0.3, 0.5
n_k^(1): 39, 30, 22
P̂_k^(1): 0.436, 0.4, 0.5
n_k^(2): 39, 52
P̂_k^(2): 0.436, 0.442
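For readability, the pooled entries in Table 1 can be checked as sample-size-weighted averages of the merged groups. The helper below is a hypothetical sketch that only verifies this arithmetic; it does not implement the merging criterion itself, which belongs to the algorithm described in the main text.

    def pooled_mean(counts, probs):
        """Sample-size-weighted average of a set of merged groups."""
        total = sum(counts)
        return sum(n * p for n, p in zip(counts, probs)) / total

    # Step (1): groups {1, 2} and {3, 4} are merged.
    print(round(pooled_mean([25, 14], [0.4, 0.5]), 3))  # 0.436
    print(round(pooled_mean([10, 20], [0.6, 0.3]), 3))  # 0.4
    # Step (2): the merged group {3, 4} is further pooled with group 5.
    print(round(pooled_mean([30, 22], [0.4, 0.5]), 3))  # 0.442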
Table 2. A Toy Data.
      f1     f2     f3     f4     f5     f6     f7     f8     f9     f10    f11    d
x1    0.3305 0.4533 0.0240 0.0260 0.1378 0.0486 0.1105 0.1906 0.4186 0.2415 0.2608 l1
x2    0.3388 0.3466 0.0361 0.0153 0.0996 0.0486 0.1220 0.1791 0.4496 0.1348 0.2028 l1
x3    0.3057 0.2800 0.2168 0.0843 0.1029 0.0555 0.2211 0.2060 0.4883 0.3258 0.3623 l1
x4    0.3305 0.1400 0.1746 0.0391 0.0581 0.1388 0.2557 0.0852 0.4031 0.0730 0.5072 l1
x5    0.3223 0.2333 0.6024 0.2967 0.0382 0.1423 0.3640 0.1987 0.4418 0.1573 0.5797 l2
x6    0.3057 0.1667 0.1686 0.0659 0.0548 0.0694 0.3433 0.1299 0.4961 0.1966 0.4202 l2
x7    0.3305 0.3333 0.0120 0.0214 0.1063 0.0277 0.0276 0.1868 0.4961 0.1966 0.2173 l3
x8    0.3884 0.1333 0.3373 0.0184 0.1378 0.1180 0.2235 0.1887 0.4496 0.2977 0.3623 l3
x9    0.2314 0.0533 0.2409 0.0138 0.0581 0.1631 0.3156 0.0788 0.6356 0.1685 0.6376 l3
x10   0.1983 0.3866 0.2891 0.0092 0.0332 0.0972 0.1589 0.0402 0.4728 0.0955 0.6956 l4
x11   0.2479 0.1200 0.2530 0.0168 0.0664 0.1388 0.2672 0.1135 0.5813 0.1460 0.3623 l4
x12   0.2148 0.1600 0.2108 0.0644 0.0348 0.1145 0.2188 0.0788 0.4961 0.2134 0.6521 l4
Table 3. Data description.
ID | Datasets | # Samples | # Features | # Labels | Domain | Feature Type
1 | Breast Cancer Wisconsin (Diagnostic) | 569 | 32 | 2 | Life | Real
2 | Cardiotocography | 2126 | 21 | 10 | Medicine | Real
3 | Contraceptive Method | 1473 | 9 | 3 | Life | Integer & Real
4 | Diabetic Retinopathy Debrecen | 1151 | 19 | 2 | Biology | Integer & Real
5 | Forest Type Mapping | 523 | 27 | 4 | Geography | Integer & Real
6 | Ionosphere | 351 | 34 | 2 | Physical | Integer & Real
7 | Libras Movement | 360 | 90 | 15 | N/A | Real
8 | LSVT Voice Rehabilitation | 126 | 309 | 2 | Life | Real
9 | Musk (Version 1) | 476 | 168 | 2 | Physical | Integer
10 | Parkinson Multiple Sound Recording | 1208 | 26 | 2 | Medicine | Real
11 | Pen-Based Recognition of Handwritten Digits | 10,992 | 16 | 10 | Computer | Integer
12 | QSAR Biodegradation | 1055 | 41 | 2 | Biology | Integer & Real
13 | Quality Assessment of Digital Colposcopies | 287 | 62 | 2 | Life | Real
14 | Sonar | 208 | 60 | 2 | Physics | Real
15 | SPECTF Heart | 267 | 44 | 2 | Biology | Real
16 | Statlog (Image Segmentation) | 2310 | 18 | 7 | Geography | Real
17 | Statlog (Vehicle Silhouettes) | 846 | 18 | 4 | Physical | Integer
18 | Steel Plates Faults | 1941 | 33 | 2 | Physical | Integer & Real
19 | Synthetic Control Chart Time Series | 600 | 60 | 6 | N/A | Real
20 | Twonorm | 7400 | 20 | 2 | Historical | Real
21 | Urban Land Cover | 675 | 147 | 9 | Geography | Real
22 | Wall-Following Robot Navigation | 5456 | 24 | 4 | Computer | Real
23 | Website Phishing | 1353 | 10 | 2 | Computer | Integer
24 | Wine Quality | 6497 | 11 | 7 | Physical | Real
25 | Wireless Indoor Localization | 2000 | 7 | 4 | Computer | Real
Table 4. Joint distribution of classification results.
                      | Pre_red^τ(x) = d(x) | Pre_red^τ(x) ≠ d(x)
Pre_red^τ(x) = d(x)   | ψ1                  | ψ2
Pre_red^τ(x) ≠ d(x)   | ψ3                  | ψ4
Table 5. The elapsed time of deriving reducts (noisy label ratios of 10% and 20%) (s).
β | ID | IRFFR | KCR | FGS | SI | AG | ESAR | NFEFR
β = 10 10.05360.77820.71630.43090.19330.23664.3213
20.344128.875512.18769.90883.23363.791237.8901
30.06880.93081.02160.42890.22930.22862.9642
40.09371.78502.17460.79780.43940.47189.2432
50.02851.16190.65900.75160.20870.26222.8259
60.02070.31100.26390.41310.07760.10741.4439
70.08771.34120.92881.63580.37570.465013.4194
80.06920.94970.95420.75420.28870.321010.4971
90.16623.30973.07872.28360.81850.954456.8696
100.13162.81133.01061.33940.61090.770616.3750
1125.2519251.15374083.2167139.325180.3628111.3182864.9137
120.15823.78134.25921.73900.83260.980228.2335
130.01760.39750.34120.44150.13020.15091.8344
140.02050.35440.31530.31230.08450.11211.4849
150.01990.36560.30790.17350.09120.11741.6289
160.359737.41849.75425.03822.61342.837926.8831
170.05061.57410.93030.55770.25690.30813.6622
180.134411.45367.11144.71262.37474.121122.7714
190.214416.15643.16143.84652.46175.154549.2311
2018.5196174.6279838.624888.315656.751477.2156557.3168
210.308520.28975.99095.22052.85394.602076.5475
222.730783.4015167.897739.846611.035414.2905263.3195
230.06080.58490.84420.29170.17190.16672.9786
242.588059.9067148.245419.20419.113410.051198.9501
250.12481.71582.41560.98290.59930.47568.1322
AVE2.064928.2174255.936513.15017.04849.580486.5495
β = 20 10.04520.59090.54240.30480.14880.19323.3916
20.343126.103911.06419.65483.04253.567435.6682
30.06940.91991.00030.45120.22550.22172.8017
40.09311.69362.08750.72220.41990.43129.0950
50.02920.88070.51020.72460.16620.20432.2947
60.02020.29560.24490.41470.06960.09791.3670
70.07361.39630.84921.48920.28400.381112.5079
80.06610.90410.85490.77620.27010.29839.9450
90.16503.22862.98902.43500.79400.916055.3699
100.13082.25092.44611.10420.52760.611413.6189
1117.5276268.28134033.1561142.813483.4129121.3421871.3185
120.16413.26253.67971.25140.72450.860425.0494
130.00280.41470.27340.38570.10750.06351.9477
140.01990.33600.30260.32730.07780.09891.4706
150.01960.34360.29940.15200.08310.10291.5284
160.366617.29109.84116.21312.50542.653526.5666
170.05151.55930.92360.54820.23090.29573.5708
180.134811.46587.06044.68482.30964.094822.8890
190.213816.22873.02043.77832.35235.112248.4254
2012.1955170.19891922.189180.746651.593070.7186558.6663
210.308519.94186.209312.25432.86794.250977.9080
222.750078.1211152.913740.253311.809613.1762242.1188
230.06290.56960.85560.29180.16950.16712.8427
244.145581.0404165.818929.702813.071014.3930159.9506
250.11271.61592.37840.93810.53370.41577.8566
AVE1.564528.3574253.260413.69677.11199.786787.9268
Table 6. The elapsed time of deriving reducts (noisy label ratios of 30% and 40%) (s).
β | ID | IRFFR | KCR | FGS | SI | AG | ESAR | NFEFR
β = 30 10.04740.59900.54940.27360.13600.18673.5187
20.337425.003410.61299.87912.81903.420732.9282
30.06840.89960.94260.44940.22000.21262.6851
40.09661.65002.09180.73030.39520.39708.8716
50.02930.89550.53940.74720.15500.20052.2598
60.02110.30430.26110.42640.07370.10261.3970
70.07321.43370.79811.47890.23500.286812.3809
80.07150.94600.92890.35750.27450.313010.0617
90.17463.38343.11072.73930.80360.963357.5430
100.13192.14732.37230.97700.47890.594213.1255
1116.1564267.49834023.1564138.498380.9853119.2531870.4638
120.16423.19123.57691.33670.63140.815924.4300
130.00590.32920.22290.33930.01790.03442.0627
140.02000.32180.27300.33490.07550.09351.3827
150.02060.34010.28870.14740.07410.09561.4982
160.357116.858610.03746.88242.41722.532726.8768
170.05191.44220.85710.58230.22000.27713.3948
180.137211.38847.07114.61522.19144.100622.8787
190.205516.18672.96793.67062.25345.115947.4915
2010.4985168.68511901.993381.516658.684475.3100540.0020
210.309519.82916.43645.25142.76273.710381.5410
222.834176.7897152.569542.36499.930612.9020237.2487
230.05860.53110.80190.26880.16350.15632.5599
244.327780.8440171.611829.123013.070914.2647161.9921
250.10001.66252.22420.82860.44910.41818.1859
AVE1.451928.1264252.251813.35287.18079.830387.0712
β = 40 10.04770.57500.51430.26590.12710.17333.2591
20.341624.364310.40609.72292.64693.313232.6536
30.06920.87520.96550.43410.21960.21252.6402
40.09361.53871.93160.69980.34330.36518.2427
50.02970.89270.50930.75160.15520.20402.2548
60.02010.28530.23430.32210.06340.08561.3058
70.05911.36330.75131.45810.20670.169712.6555
80.06780.88980.81150.33150.26060.28949.5256
90.18713.44833.35492.66390.84150.986759.1489
100.14202.29632.50920.97480.48250.639313.6530
1119.5612277.15664057.2135144.235680.9851119.3544871.1983
120.18143.14533.55661.34750.59050.800723.6727
130.00290.31440.19550.23470.00450.06952.0031
140.02270.32320.27360.34190.07450.09391.3458
150.01960.31470.26920.13520.07180.08851.4129
160.588523.779518.624210.08133.32443.661945.9315
170.04971.43680.85280.53970.21220.25523.3861
180.125511.45356.98744.60262.12674.112723.1883
190.194416.15172.96653.54032.24405.057446.6051
209.7851160.53441855.453178.210159.511171.1651520.6933
210.311818.68136.17235.15302.60323.389776.1348
222.780973.5303151.615541.79678.967812.0370232.5254
230.05880.50550.82700.28310.16720.15662.3377
244.387280.4337172.575430.021813.416814.1845167.2978
250.09081.66172.23390.76340.46510.35797.0799
AVE1.568728.2381252.072213.55657.20459.649086.8061
Table 7. The speed-up ratio related to the elapsed time of obtaining reducts (noisy label ratios of 10% and 20%).
β | ID | IRFFR & KCR | IRFFR & FGS | IRFFR & SI | IRFFR & AG | IRFFR & ESAR | IRFFR & NFEFR
β = 10 10.93110.92520.87560.72270.77350.9876
20.98810.97180.96530.89360.90920.9909
30.92610.93270.83960.70000.69900.9768
40.94750.95690.88260.78680.80140.9899
50.97550.95680.96210.86340.89130.9899
60.93340.92160.94990.73320.80730.9857
70.93460.90560.94640.76660.81140.9935
80.92710.92750.90820.76030.78440.9934
90.94980.94600.92720.79690.82590.9971
100.95320.95630.90170.78460.82920.9920
110.89950.99380.81880.68580.77320.9708
120.95820.96290.90900.81000.83860.9944
130.95570.94840.96010.86480.88340.9904
140.94220.93500.93440.75740.81710.9862
150.94560.93540.88530.78180.83050.9878
160.99040.96310.92860.86240.87330.9866
170.96790.94560.90930.80300.83580.9862
180.98830.98110.97150.94340.96740.9941
190.98670.93220.94430.91290.95840.9956
200.89390.99040.79030.67370.76020.9668
210.98480.94850.94090.89190.93300.9960
220.96730.98370.93150.75260.80890.9896
230.89610.92800.79160.64630.63530.9796
240.95680.99830.86520.71600.74250.9738
250.92730.94830.87300.79180.73760.9847
AVE0.94910.95180.90450.78810.82110.9872
β = 20 10.92350.91670.85170.69620.76600.9867
20.98690.96900.96450.88720.90380.9904
30.92460.93060.84620.69220.68700.9752
40.94500.95540.87110.77830.78410.9898
50.96680.94280.95970.82430.85710.9873
60.93170.91750.95130.70980.79370.9852
70.94730.91330.95060.74080.80690.9941
80.92690.92270.91480.75530.77840.9934
90.94890.94480.93220.79220.81990.9970
100.94190.94650.88150.75210.78610.9904
110.93470.99570.87730.78990.85560.9799
120.94970.95540.86890.77350.80930.9934
130.99320.98980.99270.97400.95590.9986
140.94080.93420.93920.74420.79880.9865
150.94300.93450.87110.76410.80950.9872
160.97880.96270.94100.85370.86180.9862
170.96700.94420.90610.77700.82580.9856
180.98820.98090.97120.94160.96710.9941
190.98680.92920.94340.90910.95820.9956
200.92830.99370.84900.76360.82750.9782
210.98450.95030.97480.89240.92740.9960
220.96480.98200.93170.76710.79130.9886
230.88960.92650.78440.62890.62360.9779
240.94880.97500.86040.68280.71200.9741
250.93030.95260.87990.78880.72890.9857
AVE0.95090.95060.90860.78720.81740.9879
Table 8. The speed-up ratio related to the elapsed time of obtaining reducts (noisy label ratios of 30% and 40%).
β | ID | IRFFR & KCR | IRFFR & FGS | IRFFR & SI | IRFFR & AG | IRFFR & ESAR | IRFFR & NFEFR
β = 30 10.92090.91370.82680.65150.74610.9865
20.98650.96820.96580.88030.90140.9898
30.92400.92740.84780.68910.67830.9745
40.94150.95380.86770.75560.75670.9891
50.96730.94570.96080.81100.85390.9870
60.93070.91920.95050.71370.79430.9849
70.94890.90830.95050.68850.74480.9941
80.92440.92300.80000.73950.77160.9929
90.94840.94390.93630.78270.81870.9970
100.93860.94440.86500.72460.77800.9900
110.93960.99600.88330.80050.86450.9814
120.94850.95410.87720.73990.79870.9933
130.98210.97350.98260.67040.82850.9971
140.93780.92670.94030.73510.78610.9855
150.93940.92860.86020.72200.78450.9863
160.97880.96440.94810.85230.85900.9867
170.96400.93940.91090.76410.81270.9847
180.98800.98060.97030.93740.96650.9940
190.98730.93080.94400.90880.95980.9957
200.93780.99450.87120.82110.86060.9806
210.98440.95190.94110.88800.91660.9962
220.96310.98140.93310.71460.78030.9881
230.88970.92690.78200.64160.62510.9771
240.94650.97480.85140.66890.69660.9733
250.93980.95500.87930.77730.76080.9878
AVE0.95030.94910.90180.76310.80580.9877
β = 40 10.91700.90730.82060.62470.72480.9854
20.98600.96720.96490.87090.89690.9895
30.92090.92830.84060.68490.67440.9738
40.93920.95150.86620.72740.74360.9886
50.96670.94170.96050.80860.85440.9868
60.92950.91420.93760.68300.76520.9846
70.95660.92130.95950.71410.65170.9953
80.92380.91650.79550.73980.76570.9929
90.94570.94420.92980.77770.81040.9968
100.93820.94340.85430.70570.77790.9896
110.92940.99520.86440.75850.83610.9775
120.94230.94900.86540.69280.77340.9923
130.99080.98520.98760.35560.95830.9986
140.92980.91700.93360.69530.75830.9831
150.93770.92720.85500.72700.77850.9861
160.97530.96840.94160.82300.83930.9872
170.96540.94170.90790.76580.80530.9853
180.98900.98200.97270.94100.96950.9946
190.98800.93450.94510.91340.96160.9958
200.93900.99470.87490.83560.86250.9812
210.98330.94950.93950.88020.90800.9959
220.96220.98170.93350.68990.76900.9880
230.88370.92890.79230.64830.62450.9748
240.94550.97460.85390.67300.69070.9738
250.94540.95940.88110.80480.74630.9872
AVE0.94920.94900.89910.74160.79780.9874
Table 9. The elapsed time of deriving reducts (label noise ratios of 10% and 20%) (s).
ID | β = 10: IRFFR, QRSAR, DBSAR | β = 20: IRFFR, QRSAR, DBSAR
10.06280.17690.33370.05210.13240.2892
20.36142.41252.91800.34072.62812.7269
30.05600.13060.29000.11770.12680.2862
40.09280.29870.56970.14530.30360.5502
50.06160.17440.32320.05650.19170.1507
60.05100.17120.19320.01550.16400.1852
70.41651.64511.25640.38271.61131.2226
80.07520.28140.35450.08020.16430.3359
90.15610.62011.24690.20570.59561.2224
100.13430.47870.89310.13000.39540.8098
1116.218676.805799.589917.516751.8558125.6400
120.14990.63120.86400.21790.52310.7559
130.31561.26151.05610.38871.33461.1292
140.03620.07710.08910.02420.08520.0824
150.02030.07900.18900.05430.09530.1809
160.37122.23133.01150.36552.12332.9035
170.04960.21820.38560.07410.19220.3596
181.20153.16452.91311.14253.10552.8541
190.21560.91151.21980.22940.92531.2336
2011.770250.608578.104312.233135.450172.9459
210.32132.67172.66610.38022.68572.6801
222.779110.072317.72852.75876.846518.5027
230.07470.13820.53560.10730.13580.5332
242.58128.54609.72984.202112.503613.6874
250.12520.66670.58710.19280.73430.6547
AVE1.50796.578910.00191.65664.996410.0769
Table 10. The elapsed time of deriving reducts (label noise ratios of 30% and 40%) (s).
ID | β = 30: IRFFR, QRSAR, DBSAR | β = 40: IRFFR, QRSAR, DBSAR
10.08870.11960.27640.07330.11070.2675
20.40252.40462.50340.41112.23252.3313
30.09820.12130.28070.07900.12090.2803
40.13270.27890.52550.08070.22700.4736
50.10230.18050.13950.03390.18070.1397
60.07080.16810.18930.02550.15780.1790
70.32201.55061.16190.37361.60221.2135
80.10980.16870.34030.09200.15480.3264
90.23610.60521.23200.16840.64311.2699
100.19980.34670.76110.21170.35030.7647
1116.216549.4282123.212419.570949.428123.2122
120.24310.43000.66280.18110.38910.6219
130.28931.23521.02980.29761.24351.0381
140.03010.08290.08010.01200.08190.0791
150.09070.08630.17190.02420.08400.1696
160.38992.03512.81530.66802.94233.7225
170.08940.18130.34870.08020.17350.3409
181.12153.08452.83311.03302.99602.7446
190.23950.93541.24370.29700.99291.3012
2010.501342.541580.03739.771343.368280.8640
210.37602.58052.57490.32252.42102.4154
222.86214.967516.62372.79414.004715.6609
230.12310.12980.52720.11490.13350.5309
244.406712.503513.68734.377412.849414.0332
250.13700.67850.59890.03730.57880.4992
AVE1.55525.073810.15431.64525.098710.1792
Table 11. The speed-up ratio related to the elapsed time of obtaining reducts (label noise ratios of 10% to 40%).
ID | β = 10: IRFFR vs. QRSAR, IRFFR vs. DBSAR | β = 20: IRFFR vs. QRSAR, IRFFR vs. DBSAR | β = 30: IRFFR vs. QRSAR, IRFFR vs. DBSAR | β = 40: IRFFR vs. QRSAR, IRFFR vs. DBSAR
10.64500.81180.60650.81980.25840.67910.33790.7260
20.85020.87610.87040.87510.83260.83920.81590.8237
30.57120.80690.07180.58870.19040.65020.34660.7182
40.68930.83710.52140.73590.52420.74750.64450.8296
50.64680.80940.70530.62510.43320.26670.81240.7573
60.70210.73600.90550.91630.57880.62600.83840.8575
70.74680.66850.76250.68700.79230.72290.76680.6921
80.73280.78790.51190.76120.34910.67730.40570.7181
90.74830.87480.65460.83170.60990.80840.73810.8674
100.71940.84960.67120.83950.42370.73750.39570.7232
110.78880.83710.66220.86060.67190.86840.60410.8412
120.76250.82650.58340.71170.43470.63320.53460.7088
130.74980.70120.70880.65580.76580.71910.76070.7133
140.53050.59370.71600.70630.63690.62420.85350.8483
150.74300.89260.43020.6998-0.05100.47240.71190.8573
160.83360.87670.82790.87410.80840.86150.77300.8206
170.77270.87140.61450.79390.50690.74360.53780.7647
180.62030.58760.63210.59970.63640.60410.65520.6236
190.76350.82320.75210.81400.74400.80740.70090.7717
200.76740.84930.65490.83230.75320.86880.77470.8792
210.87970.87950.85840.85810.85430.85400.86680.8665
220.72410.84320.59710.85090.42380.82780.30230.8216
230.45950.86050.20990.79880.05160.76650.13930.7836
240.69800.73470.66390.69300.64760.67800.65930.6881
250.81220.78670.73740.70550.79810.77120.93560.9253
AVE0.71830.80090.63720.76540.54700.71420.63650.7851
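The speed-up ratios reported in Tables 7, 8 and 11 are consistent with the quantity 1 − t_IRFFR / t_comparator; this reading is inferred from the table entries rather than restated from the experimental setup, so the sketch below is only a minimal check against the first row of Table 9 and Table 11 (dataset ID 1, β = 10).

    def speed_up_ratio(t_ours, t_other):
        """Fraction of the comparator's elapsed time saved by IRFFR."""
        return 1.0 - t_ours / t_other

    # Values taken from Table 9 (elapsed times) and Table 11 (ratios).
    print(f"{speed_up_ratio(0.0628, 0.1769):.4f}")  # 0.6450, IRFFR vs. QRSAR
    print(f"{speed_up_ratio(0.0628, 0.3337):.4f}")  # 0.8118, IRFFR vs. DBSAR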
Table 12. The KNN classification accuracies (label noise ratios of 10% to 40%).
ID | β = 10: IRFFR, QRSAR, DBSAR | β = 20: IRFFR, QRSAR, DBSAR | β = 30: IRFFR, QRSAR, DBSAR | β = 40: IRFFR, QRSAR, DBSAR
10.89840.91540.91590.92010.92670.91460.88730.90360.89830.90050.85340.8503
20.55010.52610.51210.50740.41440.39270.51280.34970.32910.48400.31370.2751
30.40330.43060.43010.41860.42740.41750.40180.41220.41190.41110.40320.4044
40.56650.55440.55220.58840.53730.53890.56740.52800.52780.58660.53180.5328
50.74210.55160.74010.73610.67950.63570.75320.47250.54760.75400.75480.7990
60.75980.78200.83080.81830.78050.74240.76960.75260.65070.73410.77000.7345
70.83100.82180.80490.84370.78500.85000.82500.78430.78170.80080.76800.7418
80.74370.68970.68150.71330.71210.73470.66090.69170.64190.66850.65290.6795
90.77440.78130.76250.73340.58490.57240.75080.58550.57520.73460.57950.5710
100.63410.59450.64300.62120.54620.54310.60990.54230.53820.60580.53680.5221
110.80110.79880.78500.70150.71450.71020.66320.69880.67120.67420.68680.6403
120.70300.74800.69990.71340.72040.74510.61830.62760.60980.61560.61970.6387
130.88120.90220.91540.79010.64040.86110.77610.57830.84880.71560.57180.7685
140.68480.65520.69110.64850.61450.64030.56520.56580.57650.58800.56520.5815
150.72950.71780.71610.68860.68980.68320.80410.77330.77100.74010.73020.7368
160.76040.70710.83580.76180.56410.71490.75070.44320.67250.70280.68660.5991
170.59070.56390.53590.57120.47400.48040.54630.41620.41260.51700.38250.3773
180.83650.82940.81610.78140.78800.82240.76170.76630.75250.72960.72600.7261
190.65630.59590.63400.61780.58030.57930.65120.66030.63460.61390.55070.5957
200.83160.88100.84540.79880.82700.81360.77800.77330.77900.72800.73020.7080
210.60580.50300.53750.55690.48560.56610.53260.44160.48720.55250.46750.4905
220.81160.73510.71320.76840.64470.51090.75920.56650.44910.76310.53150.4636
230.76510.87040.87180.74610.84950.85110.78650.82070.82840.73150.76240.7642
240.43540.44090.41650.40230.41470.41770.43510.40630.41000.41180.38140.3787
250.58290.57790.59050.59940.54220.57700.58890.54200.52820.57150.51870.5205
AVE0.70320.68700.69910.68190.63770.65260.67020.60410.61330.65340.60300.6040
Rising rate of AVE (IRFFR vs. QRSAR, IRFFR vs. DBSAR): β = 10: ↑ 2.36%, ↑ 0.59%; β = 20: ↑ 6.93%, ↑ 4.49%; β = 30: ↑ 10.94%, ↑ 9.28%; β = 40: ↑ 8.36%, ↑ 8.18%.
↑ indicates that the performance of IRFFR is better than the comparative method; ↓ indicates that the performance of IRFFR is worse than the comparative method.
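The rising rates listed under the AVE rows are consistent with the relative improvement (AVE_IRFFR − AVE_other) / AVE_other; again, this reading is inferred from the entries, and the sketch below only checks the β = 10 values of Table 12.

    def rising_rate(ave_ours, ave_other):
        """Relative improvement of IRFFR's average result over a comparator."""
        return (ave_ours - ave_other) / ave_other

    print(f"{100 * rising_rate(0.7032, 0.6870):.2f}%")  # 2.36%, vs. QRSAR
    print(f"{100 * rising_rate(0.7032, 0.6991):.2f}%")  # 0.59%, vs. DBSAR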
Table 13. The CART classification accuracies (label noise ratios of 10% to 40%).
ID | β = 10: IRFFR, QRSAR, DBSAR | β = 20: IRFFR, QRSAR, DBSAR | β = 30: IRFFR, QRSAR, DBSAR | β = 40: IRFFR, QRSAR, DBSAR
10.89080.90940.90930.90830.93000.92910.89130.91140.91090.86430.89280.8923
20.58990.63230.62390.54840.51410.49420.55280.43520.44460.52810.40280.3634
30.42040.44010.44090.41530.42750.42820.41400.41580.41210.41500.40350.4037
40.57990.56620.57360.58360.55070.56500.57420.52960.53510.58200.52310.5368
50.75900.58120.78930.74930.54410.73170.72400.51850.66370.66960.51040.6359
60.76580.79550.81700.78500.80380.74550.78440.80160.70910.81930.79530.7384
70.83180.81560.80110.83980.78090.84530.81940.78170.78230.79940.76260.7423
80.72010.67490.67310.68010.65130.63630.68930.71290.65390.70690.62810.6575
90.73660.62950.65670.71850.62810.63220.64510.64150.63630.62520.63140.6366
100.60120.61120.61170.59460.56660.56210.61130.55940.54000.56540.54880.5247
110.91660.92560.91020.90060.85600.86100.85710.79740.80820.81440.76240.7833
120.71320.65530.64130.65220.61070.57830.61790.60230.57360.55630.56320.5613
130.78200.89600.91160.78620.63630.85640.77050.57570.84940.71420.56640.7690
140.72770.64990.64960.69170.60280.65420.65140.56720.58840.68920.57090.5962
150.72160.71170.71800.70560.70040.70240.77160.76040.78840.73250.71740.7316
160.79210.76140.91910.81020.68930.88520.79000.56580.88190.76820.50400.8148
170.64670.63970.60250.61460.60260.60130.59960.52470.52670.60300.51740.4734
180.80730.82320.81230.77750.78390.81770.75610.76370.75310.72820.72060.7266
190.65710.58970.63020.61390.57620.57460.64560.65770.63520.61250.54530.5962
200.89710.89710.93240.86440.84410.85120.85340.81360.84690.83700.78740.8169
210.71230.63620.66970.68730.62560.70170.68110.59900.65780.69050.61850.6744
220.91350.94100.92610.89360.88270.59170.90240.85160.54190.89400.81200.5406
230.80410.83480.83500.77100.77510.77300.76390.68530.68630.70710.64160.6545
240.46750.49590.47600.47110.47520.47810.46650.45530.45650.44250.42040.4246
250.58370.57170.58670.59550.53810.57230.58330.53940.52880.57010.51330.5210
AVE0.72150.70740.72470.70630.66380.68270.69670.64270.65640.67740.61440.6326
Rising rate of AVE (IRFFR vs. QRSAR, IRFFR vs. DBSAR): β = 10: ↑ 1.99%, ↓ 0.044%; β = 20: ↑ 6.40%, ↑ 3.46%; β = 30: ↑ 8.40%, ↑ 6.14%; β = 40: ↑ 10.25%, ↑ 7.08%.
Table 14. The SVM classification accuracies (label noise ratios of 10% to 40%).
ID | β = 10: IRFFR, QRSAR, DBSAR | β = 20: IRFFR, QRSAR, DBSAR | β = 30: IRFFR, QRSAR, DBSAR | β = 40: IRFFR, QRSAR, DBSAR
10.88720.89600.89170.90580.92120.92080.88020.92070.92300.86860.86070.8559
20.62410.64750.65520.60730.58480.58000.58550.55120.55360.58720.56140.5597
30.60800.58730.59040.59090.57180.57140.55700.54080.54020.56870.54080.5402
40.56990.56340.57380.58190.53180.56050.56580.51520.50890.56270.50570.5033
50.74800.81470.79520.73780.65090.71890.73200.64390.63340.73750.62830.6016
60.81800.80730.78820.82620.77460.83350.80190.75750.76240.79200.75500.7246
70.64330.58140.61730.60030.56990.56280.62810.63350.61530.60510.53770.5785
80.69320.58330.57940.65320.57570.57170.67280.56600.56470.67510.56220.5598
90.60200.58120.58220.58820.54050.53460.58040.53220.52610.58520.52420.5032
100.83270.83370.81880.80350.79150.78280.77770.74780.75160.72130.70640.6955
110.67030.58800.59960.66880.55260.53990.68510.54850.52360.62880.52210.5029
120.65800.58530.60480.64330.57530.58260.61340.58250.56340.60950.53040.5376
130.73230.71290.73300.72890.75030.74560.72380.73800.75950.70770.71450.7400
140.76820.88770.89870.77260.63000.84460.75300.55150.82950.70680.55880.7513
150.67380.66660.67480.67040.63090.64690.61990.57540.59590.61980.55580.5551
160.79350.81490.79940.76390.77760.80590.73860.73950.73320.72080.71300.7089
170.71120.60520.65680.68950.66470.72090.67900.64180.67210.66460.63600.6697
180.87040.87360.74960.84650.82840.73220.84960.77790.62610.84490.72920.6319
190.85290.81050.81690.79750.70290.69850.75010.60290.59870.67640.56280.5568
200.57600.58690.59080.58140.58240.58750.57570.56830.57730.56870.56910.5638
210.81750.81250.79060.82740.77960.83360.80750.76280.76800.79330.75640.7304
220.76770.89290.90110.77380.63500.84470.75860.55680.83510.70810.56020.7571
230.79300.82010.80180.76510.78260.80600.74420.74480.73880.72210.71440.7147
240.64280.58660.61970.60150.57490.56290.63370.63880.62090.60640.53910.5843
250.56940.56860.57620.58310.53680.56060.57140.52050.51450.56400.50710.5091
AVE0.71690.70830.70820.70440.66070.68600.69140.63830.65350.67380.61410.6254
Rising rate of AVE (IRFFR vs. QRSAR, IRFFR vs. DBSAR): β = 10: ↑ 1.21%, ↑ 1.23%; β = 20: ↑ 6.61%, ↑ 2.68%; β = 30: ↑ 8.32%, ↑ 5.80%; β = 40: ↑ 9.72%, ↑ 7.74%.
Table 15. The KNN classification stabilities (label noise ratios of 10% to 40%).
ID | β = 10: IRFFR, QRSAR, DBSAR | β = 20: IRFFR, QRSAR, DBSAR | β = 30: IRFFR, QRSAR, DBSAR | β = 40: IRFFR, QRSAR, DBSAR
10.89290.90850.91020.91090.88830.88860.87650.90450.89120.89310.83590.8229
20.60870.60480.60710.59330.58200.58910.58030.59470.60070.57960.60900.6326
30.62760.60290.60240.62270.57500.57380.60280.55240.55290.61450.54790.5456
40.60100.58950.59260.58540.52750.53730.58760.53120.52850.57870.51960.5267
50.75180.70380.75720.71990.62400.61560.72380.60670.55550.72540.59260.5493
60.76770.80020.87240.85840.79300.84620.76830.76700.66800.79840.77300.8045
70.82890.81610.79870.83700.78110.84170.82120.78080.77950.79850.75920.7410
80.68490.66210.64350.63290.60130.57750.64010.63850.59510.56370.60010.6031
90.74390.54550.54570.71730.52950.51460.71970.52280.52390.69050.53420.5323
100.65630.59620.59650.60470.52530.51990.60480.52940.53360.61030.53620.5316
110.88850.88210.87470.85430.83600.84020.80290.77630.78020.76200.73580.7104
120.67800.59650.63530.68220.59320.57980.70450.61140.59670.66410.58740.6669
130.77910.89650.90920.78340.63650.85280.77230.57480.84660.71330.56300.7677
140.72310.63450.62620.67510.58480.56670.60230.59460.56080.61290.56060.5369
150.74840.75780.76400.76640.74450.74800.77640.79040.75240.75930.75550.7999
160.75150.82970.65260.73050.58160.69900.71270.58360.66860.65170.60200.6273
170.64550.62650.63790.64490.57290.56110.59970.55720.56800.59500.54690.5591
180.80440.82370.80990.77470.78410.81410.75790.76280.75030.72730.71720.7253
190.65420.59020.62780.61110.57640.57100.64740.65680.63240.61160.54190.5949
200.73960.74430.77580.70180.73650.71530.68900.68890.68620.65790.65010.6387
210.63680.56740.58950.64040.61290.60690.60180.57890.57070.61650.58660.5777
220.76660.70810.71520.71280.61710.71260.70650.58340.60130.70950.56540.6123
230.73970.87930.87790.72650.84170.84910.77370.78790.79810.67630.72300.7206
240.57920.58840.59100.58230.57420.57650.58110.57150.57370.57590.57910.5851
250.58080.57220.58430.59270.53830.56870.58510.53850.52600.56920.50990.5197
AVE0.71520.70110.70390.70250.65030.67060.68950.64340.64560.67020.62130.6373
Rising rate of AVE (IRFFR vs. QRSAR, IRFFR vs. DBSAR): β = 10: ↑ 2.01%, ↑ 1.61%; β = 20: ↑ 8.03%, ↑ 4.76%; β = 30: ↑ 7.17%, ↑ 6.82%; β = 40: ↑ 7.87%, ↑ 5.16%.
Table 16. The CART classification stabilities (label noise ratios of 10% to 40%).
ID | β = 10: IRFFR, QRSAR, DBSAR | β = 20: IRFFR, QRSAR, DBSAR | β = 30: IRFFR, QRSAR, DBSAR | β = 40: IRFFR, QRSAR, DBSAR
10.89160.89670.88870.90440.92020.91950.88460.92570.92560.87120.86430.8637
20.62850.64820.65220.60590.58380.57870.58990.55620.55620.58980.56500.5675
30.61240.58800.58740.58950.57080.57010.56140.54580.54280.57130.54440.5480
40.57430.56410.57080.58050.53080.55920.57020.52020.51150.56530.50930.5111
50.75240.81540.79220.73640.64990.71760.73640.64890.63600.74010.63190.6094
60.82240.80800.78520.82480.77360.83220.80630.76250.76500.79460.75860.7324
70.82190.81320.78760.82600.77860.83230.81190.76780.77060.79590.76000.7382
80.64770.58210.61430.59890.56890.56150.63250.63850.61790.60770.54130.5863
90.69760.58400.57640.65180.57470.57040.67720.57100.56730.67770.56580.5676
100.60640.58190.57920.58680.53950.53330.58480.53720.52870.58780.52780.5110
110.83710.83440.81580.80210.79050.78150.78210.75280.75420.72390.71000.7033
120.67470.58870.59660.66740.55160.53860.68950.55350.52620.63140.52570.5107
130.77210.89360.89810.77240.63400.84340.76300.56180.83770.71070.56380.7649
140.66240.58600.60180.64190.57430.58130.61780.58750.56600.61210.53400.5454
150.73670.71360.73000.72750.74930.74430.72820.74300.76210.71030.71810.7478
160.77260.88840.89570.77120.62900.84330.75740.55650.83210.70940.56240.7591
170.67820.66730.67180.66900.62990.64560.62430.58040.59850.62240.55940.5629
180.79740.82080.79880.76370.78160.80470.74860.74980.74140.72470.71800.7225
190.64720.58730.61670.60010.57390.56160.63810.64380.62350.60900.54270.5921
200.79790.81560.79640.76250.77660.80460.74300.74450.73580.72340.71660.7167
210.71560.60590.65380.68810.66370.71960.68340.64680.67470.66720.63960.6775
220.87480.87430.74660.84510.82740.73090.85400.78290.62870.84750.73280.6397
230.85730.81120.81390.79610.70190.69720.75450.60790.60130.67900.56640.5646
240.58040.58760.58780.58000.58140.58620.58010.57330.57990.57130.57270.5716
250.57380.56930.57320.58170.53580.55930.57580.52550.51710.56660.51070.5169
AVE0.72130.70900.70520.70300.65970.68470.69580.64340.65600.67640.61770.6332
Rising rate of AVE (IRFFR vs. QRSAR, IRFFR vs. DBSAR): β = 10: ↑ 1.73%, ↑ 2.28%; β = 20: ↑ 6.56%, ↑ 2.67%; β = 30: ↑ 8.14%, ↑ 6.07%; β = 40: ↑ 9.50%, ↑ 6.82%.
Table 17. The SVM classification stabilities (label noise ratios of 10% to 40%).
ID | β = 10: IRFFR, QRSAR, DBSAR | β = 20: IRFFR, QRSAR, DBSAR | β = 30: IRFFR, QRSAR, DBSAR | β = 40: IRFFR, QRSAR, DBSAR
10.89080.91700.91500.90700.93800.92590.89240.91230.91170.86160.89390.8957
20.58990.63990.62960.54710.52210.49100.55390.43610.44540.52540.40390.3668
30.42040.44770.44660.41400.43550.42500.41510.41670.41290.41230.40460.4071
40.57990.57380.57930.58230.55870.56180.57530.53050.53590.57930.52420.5402
50.75900.58880.79500.74800.55210.72850.72510.51940.66450.66690.51150.6393
60.76580.80310.82270.78370.81180.74230.78550.80250.70990.81660.79640.7418
70.72010.68250.67880.67880.65930.63310.69040.71380.65470.70420.62920.6609
80.73660.63710.66240.71720.63610.62900.64620.64240.63710.62250.63250.6400
90.60120.61880.61740.59330.57460.55890.61240.56030.54080.56270.54990.5281
100.91660.93320.91590.89930.86400.85780.85820.79830.80900.81170.76350.7867
110.71320.66290.64700.65090.61870.57510.61900.60320.57440.55360.56430.5647
120.72770.65750.65530.69040.61080.65100.65250.56810.58920.68650.57200.5996
130.72160.71930.72370.70430.70840.69920.77270.76130.78920.72980.71850.7350
140.79210.76900.92480.80890.69730.88200.79110.56670.88270.76550.50510.8182
150.64670.64730.60820.61330.61060.59810.60070.52560.52750.60030.51850.4768
160.89710.90470.93810.86310.85210.84800.85450.81450.84770.83430.78850.8203
170.71230.64380.67540.68600.63360.69850.68220.59990.65860.68780.61960.6778
180.91350.94860.93180.89230.89070.58850.90350.85250.54270.89130.81310.5440
190.80410.84240.84070.76970.78310.76980.76500.68620.68710.70440.64270.6579
200.46750.50350.48170.46980.48320.47490.46760.45620.45730.43980.42150.4280
210.83180.82320.80680.83850.78890.84210.82050.78260.78310.79670.76370.7457
220.78200.90360.91730.78490.64430.85320.77160.57660.85020.71150.56750.7724
230.80730.83080.81800.77620.79190.81450.75720.76460.75390.72550.72170.7300
240.65710.59730.63590.61260.58420.57140.64670.65860.63600.60980.54640.5996
250.58370.57930.59240.59420.54610.56910.58440.54030.52960.56740.51440.5244
AVE0.72150.71500.73040.70500.67180.67950.69780.64350.65720.67470.61550.6360
Rising rate of AVE (IRFFR vs. QRSAR, IRFFR vs. DBSAR): β = 10: ↑ 0.91%, ↑ 1.22%; β = 20: ↑ 4.94%, ↑ 3.75%; β = 30: ↑ 8.44%, ↑ 6.18%; β = 40: ↑ 9.62%, ↑ 6.08%.
Table 18. Counts of wins, ties, and losses regarding the classification stabilities.
Win/Tie/Loss | β = 10: IRFFR vs. QRSAR, IRFFR vs. DBSAR | β = 20: IRFFR vs. QRSAR, IRFFR vs. DBSAR | β = 30: IRFFR vs. QRSAR, IRFFR vs. DBSAR | β = 40: IRFFR vs. QRSAR, IRFFR vs. DBSAR
KNN(16/0/9)(15/0/10)(22/0/3)(20/0/5)(19/0/6)(21/0/4)(21/0/4)(17/0/8)
CART(17/0/8)(19/0/6)(20/0/5)(15/0/10)(19/0/6)(21/0/4)(23/0/2)(20/0/5)
SVM(12/0/13)(10/0/15)(18/0/7)(16/0/9)(19/0/6)(21/0/4)(22/0/3)(18/0/7)
Table 19. Counts of wins, ties and losses regarding classification accuracies.
Win/Tie/Loss | β = 10: IRFFR vs. QRSAR, IRFFR vs. DBSAR | β = 20: IRFFR vs. QRSAR, IRFFR vs. DBSAR | β = 30: IRFFR vs. QRSAR, IRFFR vs. DBSAR | β = 40: IRFFR vs. QRSAR, IRFFR vs. DBSAR
KNN(16/0/9)(15/0/10)(16/0/9)(15/0/10)(16/0/9)(18/0/7)(19/0/6)(19/0/6)
CART(13/1/11)(11/0/14)(19/0/6)(16/0/9)(19/0/6)(21/0/4)(22/0/3)(20/0/5)
SVM(15/0/10)(13/0/12)(20/0/5)(15/0/10)(19/0/6)(20/0/5)(23/0/2)(21/0/4)