1. Introduction
With the rapid growth of data, attributes have become increasingly redundant and uncertain. Uncertainty mainly manifests in the five following aspects: incompleteness, inconsistency, incompatibility, fuzziness, and randomness. Consequently, extracting valuable information from high-dimensional data remains a challenge for the research field.
To effectively handle ambiguous, incomplete, and inaccurate data, the Polish scholar Pawlak first proposed rough set theory [
1] in 1982, which has been extensively adopted in data mining, pattern recognition, decision analysis [
2,
3,
4], and other domains. Based on rough set theory, many extensions and improvements have been proposed, such as neighborhood rough set [
5], fuzzy rough set [
6], decision-theoretic rough set [
7], and Pythagorean fuzzy set [
8]. Attribute reduction [
9,
10,
11,
12,
13], as a common dimensionality reduction method, can effectively remove redundant components from information systems, select a minimal optimal attribute subset, and thereby improve the effectiveness of knowledge discovery from data. Attribute reduction has thus grown into a major research branch of rough set theory.
Generally speaking, reduct searching strategies can be split into two general classes: exhaustive search and heuristic search [
14]. In the process of data analysis, the final reduct is directly determined by the given constraint, which can be implemented by constructing different measurement criteria.
As a mature heuristic search, the forward greedy strategy can employ a wide range of measures [
15,
16,
17], such as approximation quality and conditional entropy [
18,
19,
20,
21]. These measures are mainly used for assessing attributes and deriving reduction results. However, in the exploration of attribute reduction, many researchers take into account only a single-view measure to determine the constraint. For instance, Jiang et al. [
22] studied the supervised neighborhood attribute reduction; Zhang et al. [
23] investigated a semi-supervised attribute reduction method that combined the collaborative learning theory; and Yuan et al. [
24] introduced a fuzzy complementary entropy measure and proposed an unsupervised attribute reduction algorithm for mixed data. To fully account for the diversity of evaluation, it is necessary to introduce multi-view measures into attribute reduction.
The neighborhood rough set provides a flexible granular representation, but it frequently requires determining the neighborhood radius through grid search, which is time-consuming. To overcome this problem, many parameter-free strategies for determining the radius have been introduced. For example, Xia et al. [
25] put forward the concept of granular balls and enhanced the efficacy of a classifier based on granular computing by generating granular balls; Zhou et al. [
26] proposed the concept of a gap neighborhood when resolving the problem of online feature selection, which can automatically determine the neighborhood size according to the distance difference between samples.
Through the above discussion, in order to effectively obtain the salient features of multiple views [
25,
26,
27] and improve the classification performance of attribute reduction, we propose a new strategy in this paper: forward greedy searching to $\rho$-reduct based on the granular ball. The key to our strategy comprises three phases: (1) grouping the samples in the whole universe (the universe is a finite set of all samples) based on generated symmetric granular balls; (2) guidance-based attribute search over the sample groups; and (3) attribute reduction based on multiple perspectives. The first stage automatically creates granular balls in accordance with the distribution of the data itself and merges granular balls that contain only a small number of samples, thereby realizing a division of the universe. The second stage performs guidance-based evaluations, aiming to compress the search space of candidate attributes; consequently, the time needed for attribute reduction is reduced because fewer candidate attributes need to be evaluated. Finally, the third stage blends the supervised and unsupervised perspectives [
28] and uses the quality-to-entropy ratio as the measure for attribute reduction. Therefore, it is feasible to accurately relate attributes to labels and to quantitatively characterize the uncertainty of the data itself [
29,
30].
To sum up, the main contributions of our research are: (1) decreasing the number of samples to be processed by grouping adaptively generated granular balls; (2) enhancing attribute reduction efficiency through guidance-based search; and (3) utilizing the quality-to-entropy ratio, which combines the two perspectives, to improve the accuracy of recognizing eligible attributes.
The remainder of this paper is organized as follows.
Section 2 introduces the basic concepts of rough set, granular computing, and attribute reduction.
Section 3 describes the fundamental framework and specific procedures of the new proposed method. Comparative experimental results of datasets and analysis are reported in
Section 4. Finally,
Section 5 is a summary of the algorithm and points for further work.
2. Preliminaries
2.1. Neighborhood Rough Set
Formally, a decision system can be defined as a two-tuple $DS = (U, AT \cup \{d\})$: the universe of discourse $U$ is a non-empty finite set of samples; $AT$ is the set of all conditional attributes; and $d$ is the decision attribute. According to the decision values of all samples, it is not difficult to obtain the partition $U/IND(d) = \{X_1, X_2, \ldots, X_q\}$ induced by the decision attribute $d$ on the universe $U$: $X_k = \{x \in U : d(x) = k\}$, where $d(x)$ is the label of sample $x$. It is especially worth noting that $IND(d)$ is an equivalence relation with symmetry, reflexivity, and transitivity. The following definitions give the form of conventional rough sets.
Definition 1. For a given decision system $DS = (U, AT \cup \{d\})$, a given radius $\delta \ge 0$, $\forall x \in U$, $\forall A \subseteq AT$, the neighborhood of $x$ is defined as $\delta_A(x) = \{y \in U : \Delta_A(x, y) \le \delta\}$, in which $\Delta_A(x, y)$ represents the distance function between samples $x$ and $y$ with respect to $A$. Immediately, from Definition 1, it is obvious that the size of the generated neighborhood relies on the given value of $\delta$, i.e., the neighborhood becomes larger as the value of $\delta$ increases.
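As an illustration, the $\delta$-neighborhood of Definition 1 can be sketched as follows (a minimal Python sketch assuming a Euclidean distance; the distance function $\Delta_A$ may be chosen differently, and all function names here are ours):

```python
import numpy as np

def neighborhood(X, i, delta):
    """Indices of all samples within distance delta of sample i
    (the delta-neighborhood of Definition 1, Euclidean metric assumed)."""
    dists = np.linalg.norm(X - X[i], axis=1)
    return np.flatnonzero(dists <= delta)

# Toy data over one attribute: three samples on a line.
X = np.array([[0.0], [0.1], [0.9]])
nb = neighborhood(X, 0, delta=0.2)   # samples 0 and 1 lie in the 0.2-ball
```

Note that a sample always belongs to its own neighborhood, since its distance to itself is zero.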
As the fundamental units of the neighborhood rough set, the specific definitions of upper and lower approximations are given in the following Definition 2.
Definition 2. For a given decision system $DS = (U, AT \cup \{d\})$, $\forall X \subseteq U$, $\forall A \subseteq AT$, the lower and upper approximations of $X$ are defined as $\underline{N}_A^{\delta}(X) = \{x \in U : \delta_A(x) \subseteq X\}$ and $\overline{N}_A^{\delta}(X) = \{x \in U : \delta_A(x) \cap X \ne \emptyset\}$. The neighborhood rough set is built on the foundation of the standard rough set. It can not only deal with complex data, but also possesses a multi-granularity structure through the use of various radii. However, finding an appropriate radius generally requires a large number of trials or a parameter searching strategy, which is very time-consuming.
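The lower and upper approximations of Definition 2 can likewise be sketched (same Euclidean-distance assumption; `approximations` and its arguments are illustrative names, not the paper's notation):

```python
import numpy as np

def approximations(X, labels, target_label, delta):
    """Lower and upper neighborhood approximations (Definition 2) of the
    decision class X_k = {x : label(x) == target_label}."""
    target = set(np.flatnonzero(labels == target_label))
    lower, upper = [], []
    for i in range(len(X)):
        nb = set(np.flatnonzero(np.linalg.norm(X - X[i], axis=1) <= delta))
        if nb <= target:      # neighborhood entirely inside the class
            lower.append(i)
        if nb & target:       # neighborhood overlaps the class
            upper.append(i)
    return lower, upper

X = np.array([[0.0], [0.1], [1.0]])
labels = np.array([0, 0, 1])
lower, upper = approximations(X, labels, target_label=0, delta=0.2)
```

On this toy data, samples 0 and 1 form both the lower and upper approximation of class 0, since their 0.2-neighborhoods contain only class-0 samples.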
2.2. Granular Ball Computing
Considering that using the neighborhood relationship takes significant time to obtain the optimal radius, Xia et al. [
31] proposed the concept of the granular ball. In rough set theory, a granule is a block of a division of the sample set, and the granular ball builds on the concept of the granule. Xia et al. [
25] regard a hyper-ball with a completely symmetrical structure as a granular ball.
The granular ball has a straightforward geometric shape with two parameters, i.e., center and radius. Compared with the neighborhood, the granular ball method has higher searching efficiency and robustness. The detailed definitions are as follows.
Definition 3. For a given decision system $DS = (U, AT \cup \{d\})$, $\forall A \subseteq AT$, $GB \subseteq U$ is a granular ball induced by the conditional attribute set $A$, where $C$ is the center point of $GB$ and $r$ is the average of the distances from all samples in the granular ball to $C$. The $C$ and $r$ of the granular ball are expressed as follows: $C = \frac{1}{|GB|}\sum_{x \in GB} x$ and $r = \frac{1}{|GB|}\sum_{x \in GB} \Delta_A(x, C)$, in which $|GB|$ indicates the number of samples in the granular ball. In the following, $GB_A(U)$ is defined as the set of all granular balls induced by the conditional attribute set $A$ on the universe $U$.
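A short sketch of Definition 3, computing the center and radius of one ball (attribute-wise mean and mean Euclidean distance assumed):

```python
import numpy as np

def ball_center_radius(X):
    """Center C and radius r of the granular ball over the samples X
    (Definition 3): C is the attribute-wise mean of the samples and r is
    the average Euclidean distance from the samples to C."""
    C = X.mean(axis=0)
    r = np.linalg.norm(X - C, axis=1).mean()
    return C, r

# Three samples over two attributes.
X = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]])
C, r = ball_center_radius(X)
```

The radius here is the *average* distance to the center, as in Definition 3, rather than the maximum distance used by some other ball-based models.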
Definition 4. For a given decision system $DS = (U, AT \cup \{d\})$, $\forall A \subseteq AT$, $\forall GB \in GB_A(U)$, $l(GB)$ is recorded as the overall label of $GB$, i.e., $l(GB)$ is the label shared by the samples with the same label and the maximum proportion in the granular ball.
Definition 5. For a given decision system $DS = (U, AT \cup \{d\})$, $\forall A \subseteq AT$, $\forall GB \in GB_A(U)$, the purity of $GB$ is defined as $P(GB) = \frac{|\{x \in GB : d(x) = l(GB)\}|}{|GB|}$, in which $d(x)$ indicates the label of the sample $x$. Furthermore, $P_A(U)$ can be recorded as the mean purity of all granular balls induced by the conditional attribute set $A$.
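Definitions 4 and 5 can be illustrated together: the majority label is the ball's overall label, and its proportion within the ball is the purity (a hedged sketch; the function name is ours):

```python
from collections import Counter

def ball_label_and_purity(labels):
    """Overall label (Definition 4) and purity (Definition 5) of one
    granular ball: the majority label and the fraction of samples
    carrying it."""
    majority, count = Counter(labels).most_common(1)[0]
    return majority, count / len(labels)

label, purity = ball_label_and_purity([1, 1, 1, 0])  # label 1, purity 0.75
```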
In the process of generating granular balls, the main idea is using an iterative two-means algorithm. The concrete procedures are given as follows.
(1) Consider the entire universe U as an initial granular ball and set $n = 1$ ($n$ is the number of existing granular balls).
(2) Split each existing granular ball into two clusters using the two-means algorithm.
(3) Compute the center point of each cluster and the average distance between each cluster’s samples and the center point.
(4) Obtain the new granular balls and calculate each ball’s purity.
(5) Traverse all currently existing granular balls; if every granular ball’s purity reaches the given threshold, end the procedure; otherwise, return to (2).
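The five steps above can be sketched as follows. This is a simplified Python rendering: `two_means` is a bare-bones 2-means, the split is kept strictly binary, and a degenerate split simply stops further division — the published procedure may differ in such details:

```python
import numpy as np

def two_means(X, n_iter=20, seed=0):
    """Bare-bones 2-means: assign each sample in X to one of two clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    for _ in range(n_iter):
        assign = np.argmin(
            np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)
        for k in (0, 1):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    return assign

def generate_balls(X, y, purity_threshold=1.0):
    """Steps (1)-(5): keep splitting impure balls with 2-means until every
    ball's purity reaches the threshold.  Returns lists of sample indices,
    one list per granular ball.  Labels must be non-negative integers."""
    pending, done = [np.arange(len(X))], []
    while pending:
        idx = pending.pop()
        purity = np.bincount(y[idx]).max() / len(idx)
        if purity >= purity_threshold or len(idx) < 2:
            done.append(idx)                     # pure enough: keep the ball
            continue
        assign = two_means(X[idx])
        parts = [idx[assign == k] for k in (0, 1)]
        if min(len(p) for p in parts) == 0:
            done.append(idx)                     # degenerate split: stop here
        else:
            pending.extend(parts)
    return done
```

On well-separated data with mixed labels, the initial ball is split once and both resulting balls are label-pure.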
On the basis of the aforementioned method of obtaining granular balls, Xia et al. [
31] further put forward the concept of granular ball rough set, as shown in Definition 6.
Definition 6. For a given decision system $DS = (U, AT \cup \{d\})$, $\forall A \subseteq AT$, $\forall X \subseteq U$, according to the conditional attribute set $A$, the upper and lower approximations of $X$ are, respectively, defined as $\overline{GB}_A(X) = \bigcup\{GB_i \in GB_A(U) : GB_i \cap X \ne \emptyset\}$ and $\underline{GB}_A(X) = \bigcup\{GB_i \in GB_A(U) : GB_i \subseteq X\}$.
2.3. Attribute Reduction
The rough set is a powerful tool for handling fuzzy data, and attribute reduction allows it to cope with high-dimensional data. By searching for the minimum attribute subset that satisfies the given constraints, attribute reduction can not only reduce the dimensionality, but also enhance generalization performance.
To date, various kinds of attribute reduction have been proposed for different requirements [
9,
10,
11,
12,
13,
18,
32], whereas Yao et al. [
33] indicated that the majority of them have analogous structures. There are two mainstream learning perspectives, i.e., supervised learning and unsupervised learning. Then, we pick the approximation quality [
34] and conditional entropy [
19,
35,
36,
37,
38,
39] as two custom measures to better comprehend and investigate the essence of attribute reduction in terms of the neighborhood rough set.
2.3.1. Supervised Attribute Reduction
Supervised attribute reduction refers to the process of screening attributes using given labels in datasets so as to determine the important subsets of attributes which can best distinguish different categories.
Definition 7. For a given decision system $DS = (U, AT \cup \{d\})$ and a radius $\delta$, $\forall A \subseteq AT$, the supervised approximation quality of $d$ in terms of $A$ is defined as $\gamma_A^{\delta}(d) = \frac{\left|\bigcup_{k=1}^{q} \underline{N}_A^{\delta}(X_k)\right|}{|U|}$, in which $|X|$ is the cardinality of the set $X$. Apparently, it is not difficult to obtain that $0 \le \gamma_A^{\delta}(d) \le 1$ holds. The approximation quality reflects the proportion of samples in the lower approximations of the decision classes, and it is used to describe the dependency between attributes. Note that, by Definition 7, the degree of dependency increases as the value of the approximation quality increases; generally speaking, a higher value means that the majority of samples in $U$ can be distinguished from samples of other decision classes.
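Definition 7 can be sketched by reusing the neighborhood idea of Definition 1 (Euclidean distance assumed): a sample contributes to the numerator exactly when its neighborhood is pure with respect to its own decision class, i.e., when it lies in the lower approximation of that class.

```python
import numpy as np

def approximation_quality(X, y, delta):
    """Supervised approximation quality (Definition 7): the fraction of
    samples whose delta-neighborhood is label-pure, i.e. samples lying in
    the lower approximation of their own decision class."""
    n = len(X)
    pure = 0
    for i in range(n):
        nb = np.flatnonzero(np.linalg.norm(X - X[i], axis=1) <= delta)
        pure += np.all(y[nb] == y[i])
    return pure / n

X = np.array([[0.0], [0.4], [1.0]])
y = np.array([0, 1, 1])
q = approximation_quality(X, y, delta=0.5)  # only x_2 has a pure neighborhood
```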
Definition 8. For a given decision system $DS = (U, AT \cup \{d\})$ and a radius $\delta$, $\forall A \subseteq AT$, the supervised conditional entropy of $d$ based on $A$ is defined as $ENT_A^{\delta}(d) = -\frac{1}{|U|}\sum_{x \in U}\log\frac{|\delta_A(x) \cap [x]_d|}{|\delta_A(x)|}$, in which $[x]_d$ is the decision class of $x$. It is proven that $ENT_A^{\delta}(d) \ge 0$ holds [
19]. As another important measure of the neighborhood rough set, conditional entropy reflects the discriminating performance of the conditional attribute set $A$ over the decision attribute $d$. Following Definition 8, it is obvious that the discrimination of $A$ relative to $d$ increases as the value of the conditional entropy decreases.
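A sketch in the same spirit as Definition 8; note that several neighborhood conditional entropy formulations exist in the literature, so the exact form used in the paper may differ from this variant, which averages $-\log_2$ of the within-class fraction of each sample's neighborhood:

```python
import numpy as np

def neighborhood_conditional_entropy(X, y, delta):
    """A neighborhood conditional entropy in the spirit of Definition 8:
    zero when every neighborhood is label-pure, growing as neighborhoods
    mix decision classes (one of several variants in the literature)."""
    n = len(X)
    total = 0.0
    for i in range(n):
        nb = np.flatnonzero(np.linalg.norm(X - X[i], axis=1) <= delta)
        frac = np.mean(y[nb] == y[i])   # > 0 since x_i is in its own ball
        total -= np.log2(frac)
    return total / n

X = np.array([[0.0], [0.4], [1.0]])
y = np.array([0, 1, 1])
ent = neighborhood_conditional_entropy(X, y, delta=0.5)
```

With this variant, label-pure neighborhoods yield an entropy of exactly zero, matching the intuition that lower entropy means stronger discrimination.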
Definition 9. For a given decision system $DS = (U, AT \cup \{d\})$ and a constraint condition $C_\rho$, which is associated with a measure ρ on the universe $U$, $\forall A \subseteq AT$, $A$ is deemed a $\rho$-reduct if and only if
- (1)
$A$ meets $C_\rho$;
- (2)
$\forall A' \subset A$, $A'$ does not meet $C_\rho$.
From Definition 9, it is uncomplicated to conclude that A is a minimal subset satisfying the constraint condition. Without loss of generality, the constraint is closely related to the measure used. We discuss it from the two following aspects:
- (1)
If the measure is approximation quality [
28,
40], the constraint condition may be
$\gamma_A^{\delta}(d) \ge \gamma_{AT}^{\delta}(d)$;
- (2)
If the measure is conditional entropy [
41], the constraint condition may be
$ENT_A^{\delta}(d) \le ENT_{AT}^{\delta}(d)$.
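The forward greedy search toward a ρ-reduct (Definition 9) can be sketched generically. Here `evaluate` and `meets_constraint` are caller-supplied stand-ins for the measure ρ and the constraint $C_\rho$, with `evaluate` oriented so that larger is better (e.g., a negated conditional entropy); both names are ours:

```python
def greedy_reduct(attributes, evaluate, meets_constraint):
    """Forward greedy search toward a rho-reduct (Definition 9 sketch).

    evaluate(subset)    -> value of the measure rho on a candidate subset
                           (oriented so that larger is better);
    meets_constraint(v) -> whether v satisfies the constraint C_rho.
    """
    reduct, remaining = [], list(attributes)
    while remaining and not meets_constraint(evaluate(reduct)):
        # add the single attribute that most improves the measure
        best = max(remaining, key=lambda a: evaluate(reduct + [a]))
        reduct.append(best)
        remaining.remove(best)
    return reduct

# Toy measure: each attribute contributes a fixed amount of "quality".
weights = {0: 0.0, 1: 0.6, 2: 0.5}
evaluate = lambda subset: sum(weights[a] for a in subset)
reduct = greedy_reduct([0, 1, 2], evaluate, lambda v: v >= 1.0)
```

Condition (2) of Definition 9 (minimality) is usually enforced afterwards by a backward pass that tries to drop each selected attribute; the skeleton above covers only the forward phase.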
2.3.2. Unsupervised Attribute Reduction
As we all know, supervised attribute reduction depends on the labels of samples to a great extent, and obtaining such labels is time-consuming. Unsupervised attribute reduction, by contrast, does not need to obtain such labels.
From an unsupervised perspective, if approximation quality or conditional entropy is still to be used as a measure, how to construct labels for samples is an urgent problem. To solve this problem, Yang et al. [
42] used the conditional attribute information of samples to construct pseudo-labels. Based on this pseudo-label strategy, it is not arduous to give the following definitions.
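The pseudo-label idea can be illustrated for a single conditional attribute: cluster the samples by their values on that attribute (here with a minimal 1-D k-means; the original strategy of Yang et al. may differ in detail, and the function name is ours):

```python
import numpy as np

def pseudo_labels(col, k, n_iter=20, seed=0):
    """Pseudo-labels for one conditional attribute: cluster the samples by
    their values on that attribute with a minimal 1-D k-means."""
    rng = np.random.default_rng(seed)
    centers = np.sort(rng.choice(col, size=k, replace=False))
    for _ in range(n_iter):
        assign = np.argmin(np.abs(col[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = col[assign == j].mean()
    return assign

col = np.array([0.0, 0.1, 0.2, 5.0, 5.1])
labels = pseudo_labels(col, k=2)   # two value groups -> two pseudo-labels
```

The resulting labels then play the role of the decision attribute in Definitions 10 and 11.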
Definition 10. For a given unsupervised decision system $DS = (U, AT)$ and a radius $\delta$, $\forall A \subseteq AT$, $\forall a \in AT$, the unsupervised approximation quality in terms of $A$ is defined as $\gamma_A^{\delta}(d_a) = \frac{\left|\bigcup_{k} \underline{N}_A^{\delta}(X_k^a)\right|}{|U|}$, in which $d_a$ is a pseudo-label decision that employs the conditional attribute $a$ to assign pseudo-labels to samples, and $X_k^a$ is the $k$-th pseudo-decision class induced by $d_a$. In analogy with Definition 7, $0 \le \gamma_A^{\delta}(d_a) \le 1$ apparently holds. The approximation quality in Definition 10 represents the correlation between a set of attributes and a single attribute. Naturally, the higher the value of the unsupervised approximation quality, the greater the degree of such correlation.
Definition 11. For a given unsupervised decision system $DS = (U, AT)$ and a radius $\delta$, $\forall A \subseteq AT$, $\forall a \in AT$, the unsupervised conditional entropy with respect to $A$ is defined as $ENT_A^{\delta}(d_a) = -\frac{1}{|U|}\sum_{x \in U}\log\frac{|\delta_A(x) \cap [x]_{d_a}|}{|\delta_A(x)|}$, in which $d_a$ is a derived decision that employs the conditional attribute $a$ to assign pseudo-labels to samples. Similarly to Definition 8, $ENT_A^{\delta}(d_a) \ge 0$ holds in the unsupervised perspective. Undoubtedly, the certainty of the pseudo-label neighborhood decision system increases as the value of the conditional entropy decreases.
Definition 12. For a given unsupervised decision system $DS = (U, AT)$, a measure ρ and its constraint condition $C_\rho$, $\forall A \subseteq AT$, $A$ is deemed a ρ-reduct if and only if:
- (1)
$A$ meets the constraint $C_\rho$;
- (2)
$\forall A' \subset A$, $A'$ does not meet the constraint $C_\rho$.
Analogous to Definition 9, the constraint condition determined by $C_\rho$ will depend on the type of measure. The constraint condition may be $\gamma_A^{\delta}(d_a) \ge \gamma_{AT}^{\delta}(d_a)$ if the unsupervised approximation quality is used as a measure; it may be $ENT_A^{\delta}(d_a) \le ENT_{AT}^{\delta}(d_a)$ if the unsupervised conditional entropy is used as a measure.
4. Experimental Analysis
4.1. Datasets
We use 16 UCI datasets for verification to demonstrate the effectiveness of our forward greedy searching to $\rho$-reduct based on the granular ball (GBFGS-$\rho$).
Table 1 provides a thorough explanation of the various datasets.
4.2. Experimental Configuration
All experiments were conducted on a personal computer with Windows 10, an Intel Core i7-10510U CPU (2.30 GHz), and 8.00 GB memory. The programming environment was MATLAB R2020a.
In the following experiment, the two-means algorithm was used to iteratively create granular balls,
k-means clustering [
44,
49] was utilized to create pseudo-labels of samples, and the quality-to-entropy ratio was the measure used in attribute reduction. It is worth noting that the value of
k should be consistent with the number of decision classes in the data. In addition, the result of the neighborhood rough set largely depends on the given radius. In order to demonstrate the applicability and universality of our proposed method, all experiments employed 20 radii with a step size of 0.02, i.e., 0.02, 0.04, …, 0.40.
Moreover, the derived reducts were verified by 10-fold cross-validation. That is to say, for each radius, the samples in the universe $U$ were divided into ten groups, i.e., $U_1, U_2, \ldots, U_{10}$; nine of them were used as training groups and the remaining one as the test group. This process was repeated so that each group served once as the test group, thereby testing the classification performance and obtaining a reliable and stable model.
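The grouping step of the protocol above can be sketched as follows (the random seed and the use of `array_split` are our illustrative choices):

```python
import numpy as np

def ten_fold_indices(n, seed=0):
    """Split n sample indices into ten disjoint groups U_1, ..., U_10;
    in each round, one group serves as the test set and the other nine
    as the training set."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), 10)

folds = ten_fold_indices(25)   # ten disjoint groups covering all 25 indices
```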
Finally, we used K-nearest neighbor (KNN, K = 3) [
50,
51], support vector machine (SVM) [
52] and classification and regression tree (CART) [
53] to compare our proposed method with six advanced attribute reduction algorithms as well as with classification without applying any attribute reduction method (No-reduct classification). The performance of the derived reducts was mainly tested in terms of classification stability, classification accuracy, reduced stability, and elapsed time. The attribute reduction algorithms used for comparison are as follows:
(1) Dissimilarity Based Searching for Attribute Reduction (DBSAR) [
54];
(2) Knowledge Change Rate (KCR) [
55];
(3) Attribute Group (AG) [
56];
(4) Ensemble Selector For Attribute Reduction (ESAR) [
47];
(5) Multi-criterion Neighborhood Attribute Reduction (MNAR) [
32];
(6) Robust Attribute Reduction Based On Rough Sets (RARR) [
57].
4.3. Comparison of Classification Accuracy
In this section, we will use KNN, SVM, and CART to predict the test samples so as to weigh up the classification accuracy of each algorithm. For an attribute reduction algorithm, given a decision system $DS = (U, AT \cup \{d\})$ with test set $U_{test} \subseteq U$ and derived reduct $A$, the classification accuracy applied to the reduct is defined as $Acc_A = \frac{|\{x \in U_{test} : P_A(x) = d(x)\}|}{|U_{test}|}$, in which $P_A(x)$ is the prediction label made using the reduct $A$ for $x$.
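The accuracy formula above amounts to a simple proportion of matching predictions, e.g.:

```python
import numpy as np

def reduct_accuracy(y_pred, y_true):
    """Classification accuracy of a reduct-based prediction: the share of
    test samples whose predicted label matches the true label."""
    return float(np.mean(np.asarray(y_pred) == np.asarray(y_true)))

acc = reduct_accuracy([0, 1, 1, 0], [0, 1, 0, 0])   # 3 of 4 correct -> 0.75
```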
Table 2 displays the detailed classification accuracy results for each algorithm on 16 datasets and
Figure 3 illustrates the radar charts for each dataset under the three classifiers with three different colors. The following conclusions can easily be reached by observing
Table 2 and
Figure 3.
- (1)
For the majority of datasets, regardless of whether the KNN, SVM, or CART classifier is used, the classification accuracies related to GBFGS-ρ outperform those of the other comparison algorithms. Taking the dataset “Parkinson Speech (ID: 7)” as an example, when the KNN classifier is adopted, the classification accuracies of GBFGS-ρ, DBSAR, KCR, AG, ESAR, MNAR, RARR, and No-reduct classification are 0.7259, 0.7063, 0.7008, 0.7093, 0.7031, 0.7253, 0.7095, and 0.6984, respectively; when using the SVM classifier, the classification accuracies of GBFGS-ρ, DBSAR, KCR, AG, ESAR, MNAR, RARR, and No-reduct classification are 0.6661, 0.6532, 0.6521, 0.6548, 0.6543, 0.6639, 0.6539, and 0.6488, respectively; when employing CART, the classification accuracies of GBFGS-ρ, DBSAR, KCR, AG, ESAR, MNAR, RARR, and No-reduct classification are 0.6433, 0.6429, 0.6420, 0.6413, 0.6424, 0.6307, 0.6421, and 0.6419, respectively. Therefore, the reduct derived from our GBFGS-ρ can offer effective classification performance.
- (2)
From the average classification accuracy of each algorithm, the classification accuracy associated with GBFGS-ρ is comparable to or even higher than those of DBSAR, KCR, AG, ESAR, MNAR, RARR, and No-reduct classification. When using the KNN classifier, GBFGS-ρ’s classification accuracy is 0.8258, which is at most 32.21% higher than those of the others; when the SVM classifier is utilized, GBFGS-ρ’s classification accuracy is 0.7903, which is at most 34.12% higher than those of the others; when employing CART, GBFGS-ρ’s classification accuracy is 0.8090, which is at most 27.35% higher than those of the others.
4.4. Comparison of Classification Stability
In this section, similarly to
Section 4.3, we will evaluate the classification stability of each algorithm under KNN, SVM, and CART based on six advanced attribute reduction algorithms and a classification algorithm without applying any attribute reduction. Higher values of classification stability imply that the predicted label result is more stable and less susceptible to interference from the training samples.
Following the use of three classifiers on 16 datasets,
Table 3 and
Figure 4 show the classification stability results of each algorithm. The following conclusions can easily be drawn from
Table 3 and
Figure 4.
- (1)
For most datasets, our GBFGS-ρ algorithm plays a leading role in classification stability compared with the other algorithms. Moreover, predictions based on the features related to GBFGS-ρ gain absolute advantages on some datasets. Consider the dataset “Twonorm (ID: 12)” as an example: when the KNN classifier is used, the classification stabilities of GBFGS-ρ, DBSAR, KCR, AG, ESAR, MNAR, RARR, and No-reduct classification are 0.9300, 0.8934, 0.8747, 0.8744, 0.8809, 0.5300, 0.7139, and 0.8772, respectively; when adopting the SVM classifier, the classification stabilities of GBFGS-ρ, DBSAR, KCR, AG, ESAR, MNAR, RARR, and No-reduct classification are 0.9693, 0.9333, 0.9140, 0.9116, 0.9224, 0.5582, 0.8164, and 0.9458, respectively; when using CART, the classification stabilities of GBFGS-ρ, DBSAR, KCR, AG, ESAR, MNAR, RARR, and No-reduct classification are 0.7723, 0.7600, 0.7531, 0.7491, 0.7564, 0.5216, 0.6803, and 0.7512, respectively. Therefore, from the standpoint of classification stability, GBFGS-ρ can indeed provide a more stable classification performance.
- (2)
In terms of average classification stability, GBFGS-ρ is far superior to the other algorithms. When employing the KNN classifier, the classification stability of GBFGS-ρ is 0.9015, which is at most 24.52% higher than those of the other methods; with the SVM classifier it is 0.9336, at most 14.67% higher than those of the others; with the CART classifier it is 0.8301, at most 12.92% higher than those of the others.
4.5. Comparison of Reduced Stability
In this section, we will show the reduced stability of the attribute reduction corresponding to 16 datasets. The specific results are given in
Table 4.
The information shown in
Table 4 indicates that the reduced stability of GBFGS-ρ is slightly lower than that of RARR, but still in a leading position. Compared with DBSAR, KCR, AG, ESAR, and MNAR, the average reduced stability of GBFGS-ρ is higher by 17.21%, 27.74%, 46.53%, 10.99%, and 111.21%, respectively, while it is only 0.93% lower than that of RARR.
In general, although the reduced stability of our GBFGS-ρ is slightly inferior to that of RARR on many datasets, it is better than all six advanced attribute reduction algorithms in some cases. For instance, for the dataset “Climate Model Simulation Crashes (ID: 2)”, the reduced stabilities of GBFGS-ρ, DBSAR, KCR, AG, ESAR, MNAR, and RARR are 0.6605, 0.3265, 0.5284, 0.3483, 0.5814, 0.0545, and 0.3773, respectively. Compared with the other algorithms, the result of GBFGS-ρ is improved by 102.29%, 25.00%, 89.64%, 13.61%, 1111.93%, and 75.06%, respectively.
Therefore, it should be pointed out that using GBFGS-ρ is more conducive to selecting attributes that remain suitable under changes in the samples.
4.6. Comparisons of Elapsed Time
In this section, we will compare the time taken to derive a simplification using different algorithms. The detailed results are reported in
Table 5.
Following a thorough analysis of
Table 5, it is not difficult to come to the findings that are listed below.
Considering that the reduced stability mentioned in
Section 4.5 and the reduct length conflict with each other, it can be concluded that the higher the value of reduced stability, the longer the reduct. Apparently, the reduct derived by GBFGS-ρ is longer, which indicates that, in the reduction process, our algorithm still needs to be strengthened in terms of speed.
From the view of the average elapsed time, it is worth mentioning that the value for GBFGS-ρ is 56.42% and 65.42% lower than those of KCR and RARR, respectively. Taking the dataset “Pen-Based Recognition of Handwritten Digits (ID: 8)” as an example, when the elapsed times of GBFGS-ρ, DBSAR, KCR, AG, ESAR, MNAR, and RARR are 175.1957, 171.2967, 1348.3062, 240.2581, 339.0988, 15.6138, and 1478.0267 s, respectively, the speed-up ratios of the GBFGS-ρ algorithm relative to the others reach 0.9777, 7.6960, 1.3714, 1.9355, 0.0891, and 8.4364, respectively. Therefore, the elapsed time of GBFGS-ρ for attribute reduction is lower than that of AG and ESAR under some circumstances.
From the above discussion, it can be observed that, even though the elapsed time of our new algorithm is better than those of KCR and RARR on some datasets, the speed performance of GBFGS-ρ still has to be improved.
5. Conclusions and Future Perspectives
In this paper, we propose a new searching strategy that differs from conventional algorithms in the following aspects. On the one hand, granular balls are generated automatically, so no time is spent on radius optimization. On the other hand, guidance-based searching is designed to compress the attribute search space. In addition, the quality-to-entropy ratio can overcome the limitations and one-sidedness of single-measure methods.
Experiments on 16 UCI datasets reveal that our proposed strategy achieves quite positive classification performance and strong stability in the process of deriving reducts.
Further research can be conducted for the two following aspects:
- (1)
Using the fused measure may increase the time of selecting the best attribute. Therefore, more accelerators [
11] can be added to further improve the efficiency and reduce the time consumption.
- (2)
The searching strategy proposed in this paper is a general module. Therefore, other measures based on the rough set can be substituted for the quality-to-entropy ratio, so as to compare the classification performance under various measures.