Article

Optimizing Attribute Reduction in Multi-Granularity Data through a Hybrid Supervised–Unsupervised Model

School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212100, China
*
Author to whom correspondence should be addressed.
Mathematics 2024, 12(10), 1434; https://doi.org/10.3390/math12101434
Submission received: 22 March 2024 / Revised: 5 May 2024 / Accepted: 5 May 2024 / Published: 7 May 2024
(This article belongs to the Special Issue Mathematical and Computing Sciences for Artificial Intelligence)

Abstract

Attribute reduction is a core technique in the rough set domain and an important step in data preprocessing. Researchers have proposed numerous innovative methods to enhance the capability of attribute reduction, such as the emergence of multi-granularity rough set models, which can effectively process distributed and multi-granularity data. However, these innovative methods still have numerous shortcomings, such as difficulty in addressing complex constraints and in conducting multi-angle effectiveness evaluations. Based on the multi-granularity model, this study proposes a new method of attribute reduction, namely using the multi-granularity neighborhood information gain ratio as the measurement criterion. This method combines both supervised and unsupervised perspectives, and by integrating multi-granularity technology with neighborhood rough set theory, constructs a model that can adapt to multi-level data features. This novel method stands out by addressing complex constraints and facilitating multi-perspective effectiveness evaluations. It has several advantages: (1) it combines supervised and unsupervised learning methods, allowing for nuanced data interpretation and enhanced attribute selection; (2) by incorporating multi-granularity structures, the algorithm can analyze data at various levels of granularity, which allows for a more detailed understanding of data characteristics at each level and can be crucial for complex datasets; and (3) by using neighborhood relations instead of indiscernibility relations, the method effectively handles uncertain and fuzzy data, making it suitable for real-world datasets that often contain imprecise or incomplete information. The method not only selects the optimal granularity level or attribute set based on specific requirements, but also demonstrates its versatility and robustness through extensive experiments on 15 UCI datasets. Comparative analyses against six established attribute reduction algorithms confirm the superior reliability and consistency of the proposed method. This research not only enhances the understanding of attribute reduction mechanisms, but also sets a new benchmark for future explorations in the field.
MSC:
03B52; 68T37; 62H30; 18B05

1. Introduction

In this era of information explosion, data are growing exponentially in both dimension and volume, which leads to the attributes of data becoming redundant and vague. How to find valuable information from massive data has become challenging. Rough set theory, introduced by Pawlak [1] in 1982 as a simple and efficient method for data mining, can deal with fuzzy, incomplete, and inaccurate data [2].
The traditional model of rough sets mainly focuses on describing the uncertainty and fuzziness of data through binary relations [3]. In recent years, multi-granularity rough set models have been proposed to fully mine the multiple granularity levels of target information, extending the traditional single binary relation to multiple binary relations, with the work of Qian et al. [4] being representative. This model has provided a new solution for rough set theory in dealing with distributed data and multi-granularity data. Afterward, researchers continuously improved Qian’s multi-granularity rough set model. Some of the improvements combine multi-granularity rough sets with decision-theoretic rough sets to form a multi-granularity decision-theoretic rough set model [5]. In addition, there is research combining multi-granularity rough sets with the three-way decision model, proposing a multi-granularity three-way decision model [6]. Targeting the granulation of attributes and attribute values, Xu proposed an improved multi-granularity rough set model [7]. To expand the applicability of multi-granularity rough sets, Lin et al. integrated the neighborhood relation into the multi-granularity rough set model, proposing the neighborhood multi-granularity rough set. The introduction of this model has made the multi-granularity rough set research branch a hot topic of study [8]. These rough set models can effectively reduce data dimensionality, achieved by attribute reduction [9].
Attribute reduction can be achieved through supervised or unsupervised constraints, and research on constraints from both supervised and unsupervised perspectives has been extensively explored [10]. Specifically, some studies propose attribute reduction constraints based on measures from only one perspective, using these constraints to find qualified reductions. For instance, Jiang et al. [11] and Yuan et al. [12] concentrated on attribute reduction through the lens of supervised information granulation and related supervised metrics, respectively; meanwhile, Yang et al. [13] proposed a concept known as fuzzy complementary entropy for attribute reduction within an unsupervised model; the algorithm discussed by Jain and Som [14] introduces a sophisticated multigranular rough set model that utilizes an intuitionistic fuzzy beta covering approach; and Ji et al. [15] developed an extended rough set model based on fuzzy granular balls to enhance attribute reduction effectiveness. However, whether supervised or unsupervised, measures based on a single perspective exhibit inherent limitations. Firstly, measures relying on a single perspective may overlook the multifaceted evaluation of data, leading to the neglect of some important attributes [16]. This is because when only one fixed measure is used for the attribute reduction of data, the importance of each attribute is judged solely by that criterion; if other measures are later needed for evaluation, relying on that single criterion may no longer yield accurate results. Secondly, relying only on a single-perspective measure may not fully capture the characteristics of data under complex conditions, resulting in the selection of attributes that are neither accurate nor complete. For instance, if conditional entropy is used as a measure to evaluate attributes [17], the derived reduct may only satisfy that single evaluation criterion, without fully considering other types of uncertainty and learning capability.
To solve the limitations of the attribute reduction mentioned above, this paper introduces a new measure that merges both supervised and unsupervised perspectives, leading to a novel rough set model. The model proposed in this paper has the following advantages: (1) it integrates multi-granularity and neighborhood rough sets, making the model more adaptable to data features at different levels; and (2) for attribute sets of different granularities, it introduces a fusion strategy, selecting the optimal granularity level or attribute set according to the needs of different tasks and datasets, which can be flexibly adjusted based on specific circumstances.
The rest of this paper is organized as follows. Section 2 reviews related basic concepts. Section 3 provides a detailed introduction to the basic framework and algorithm design of the proposed method. In Section 4, the accuracy of our method is calculated and discussed through experiments. Finally, Section 5 concludes this paper and outlines some future work.

2. Preliminaries

2.1. Neighborhood Rough Sets

Neighborhood rough sets were proposed by Hu et al. as an improvement over traditional rough sets [18]. The key distinction lies in that neighborhood rough sets are established on the basis of neighborhood relations, as opposed to relations of indiscernibility [19]. Hence, the neighborhood rough set model is capable of processing both discrete and continuous data [20]. Moreover, the partitioning of neighborhoods granulates the sample space, which can reflect the discriminative power of different attributes on the samples [21].
Within the framework of rough set theory, a decision system is characterized by a tuple DS = (U, AT), where U denotes a finite collection of samples and AT encompasses a suite of conditional attributes together with a decision attribute d [22]. The attribute d captures the sample labels [23]. For every x in U and every a in AT, a(x) signifies the value of x on the conditional attribute a, and d(x) represents the label of x. Utilizing d, one can derive an equivalence relation on U:
IND(d) = \{(x, y) \in U \times U : d(x) = d(y)\}
Pursuant to IND(d), U is partitioned into U/IND(d) = {X_1, X_2, ..., X_q} (q ≥ 2). Each X_k within U/IND(d) is recognized as the k-th decision category. Notably, the decision category that includes the sample x can also be referred to as [x]_d.
In rough set methods, binary relations are often used for information granulation, among which neighborhood relations, as one of the most effective binary relations, have received extensive attention. The formation of neighborhood relations is as follows:
N_{\delta}^{A} = \{(x, y) \in U \times U : r_A(x, y) \le \delta\}
where r_A is a distance function with respect to A ⊆ AT, and δ ≥ 0 is the neighborhood radius:
r_A(x, y) = \sqrt{\sum_{a \in A} (a(x) - a(y))^2}
In alignment with Equation (2), the vicinity of a sample x is established as follows:
\delta_A(x) = \{y \in U : r_A(x, y) \le \delta\}
From the perspective of granular computing [24,25], both I N D ( d ) and N δ A are derivations of information granules [26]. The most significant difference between these two types of information granules lies in their intrinsic mechanisms, i.e., the binary relations used. Based on the outcomes of these information granules, the concepts of lower and upper approximations within the context of neighborhood rough sets, as the fundamental units, were also proposed by Cheng et al.
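To make the neighborhood construction above concrete, the following Python sketch computes the δ-neighborhood of a sample over an attribute subset using the Euclidean distance r_A. It is only an illustration; the function and variable names are ours, not from the paper.

import numpy as np

def delta_neighborhood(U, A, x_idx, delta):
    # U: (n_samples, n_attributes) numeric data matrix
    # A: list of column indices (the attribute subset); delta: neighborhood radius
    diff = U[:, A] - U[x_idx, A]             # differences on the selected attributes
    dist = np.sqrt((diff ** 2).sum(axis=1))  # r_A(x, y) for every y in U
    return np.flatnonzero(dist <= delta)     # indices of samples inside the neighborhood

# toy usage with 5 samples, 3 attributes, and radius 0.3 (illustrative values only)
U = np.random.rand(5, 3)
print(delta_neighborhood(U, [0, 2], 0, 0.3))

Because r_A(x, x) = 0, every sample belongs to its own neighborhood, which keeps the resulting granules non-empty.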

2.2. Multi-Granularity Rough Sets

For multi-granularity rough sets [27,28], given DS = (U, AT), where AT = {A_k | k ∈ {1, 2, ..., m}} is a set of attributes, the family of attribute subsets on AT is represented by {A_1, A_2, ..., A_m} [29].
Let [x]_{A_i} denote the equivalence class of x under A_i. For any X ⊆ U, the optimistic multi-granularity lower and upper approximations of X are defined as follows:
\underline{\sum_{i=1}^{m} A_i}^{O}(X) = \{x \in U : [x]_{A_1} \subseteq X \vee [x]_{A_2} \subseteq X \vee \cdots \vee [x]_{A_m} \subseteq X\},
\overline{\sum_{i=1}^{m} A_i}^{O}(X) = \left( \underline{\sum_{i=1}^{m} A_i}^{O}(X^c) \right)^c
If the optimistic lower and upper approximations of X are not equal, then they are called optimistic multi-granularity rough sets.
Given DS = (U, AT), where AT = {A_k | k ∈ {1, 2, ..., m}} is a set of attributes, the family of attribute subsets on AT is represented by {A_1, A_2, ..., A_m}.
Let [x]_{A_i} denote the equivalence class of x under A_i. For any X ⊆ U, the pessimistic multi-granularity [30] lower and upper approximations of X are defined as follows:
\underline{\sum_{i=1}^{m} A_i}^{P}(X) = \{x \in U : [x]_{A_1} \subseteq X \wedge [x]_{A_2} \subseteq X \wedge \cdots \wedge [x]_{A_m} \subseteq X\},
\overline{\sum_{i=1}^{m} A_i}^{P}(X) = \left( \underline{\sum_{i=1}^{m} A_i}^{P}(X^c) \right)^c
If the pessimistic lower and upper approximations of X are not equal, then they are called pessimistic multi-granularity rough sets.
In the pursuit of refining data analysis, particularly when addressing complex and heterogeneous data sets, the application of multi-granularity rough sets provides a transformative framework. This approach offers a flexible methodology for representing data across various levels of granularity, allowing analysts to dissect large and diverse datasets into more comprehensible and manageable segments. This adaptability is crucial in environments where data exhibits varying degrees of precision, stemming from different sources or capturing differing phenomena.
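As a small illustration of the two fusion modes, the sketch below computes the multi-granularity lower approximation: the optimistic version requires containment under at least one granularity, the pessimistic version under all of them. The representation (each granularity as a mapping from a sample to its equivalence class) is our own simplification, not the paper's data structure.

def mg_lower_approximation(classes_per_granularity, X, mode="optimistic"):
    # classes_per_granularity[i][x] is the equivalence class (a frozenset) of
    # sample x under the attribute subset A_i; X is a set of samples
    universe = set(classes_per_granularity[0])
    combine = any if mode == "optimistic" else all
    return {x for x in universe
            if combine(classes[x] <= X for classes in classes_per_granularity)}

# toy usage: two granularities over the universe {1, 2, 3, 4}
g1 = {1: frozenset({1, 2}), 2: frozenset({1, 2}), 3: frozenset({3, 4}), 4: frozenset({3, 4})}
g2 = {1: frozenset({1}), 2: frozenset({2, 3}), 3: frozenset({2, 3}), 4: frozenset({4})}
print(mg_lower_approximation([g1, g2], {1, 2, 3}))                  # optimistic -> {1, 2, 3}
print(mg_lower_approximation([g1, g2], {1, 2, 3}, "pessimistic"))   # -> {1, 2}

The corresponding upper approximations then follow from the complement duality in the formulas above.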

2.3. Multi-Granularity Neighborhood Rough Sets

In the literature [31], Lin et al. proposed two types of neighborhood multi-granularity rough sets, which can be applied to deal with incomplete systems containing numerical and categorical attributes [32]. To simplify the problem, when dealing with incomplete systems, only the application of neighborhood multi-granularity rough sets to numerical data is considered.
Given DS = (U, AT), where AT = {A_k | k ∈ {1, 2, ..., m}}, U = {x_i | i ∈ {1, 2, ..., n}}, and X ⊆ U, in the optimistic neighborhood multi-granularity rough sets, the neighborhood multi-granularity lower and upper approximations of X are defined as:
\underline{\sum_{i=1}^{m} N_i}^{O}(X) = \{x_i \in U : \delta_{A_1}(x_i) \subseteq X \vee \delta_{A_2}(x_i) \subseteq X \vee \cdots \vee \delta_{A_m}(x_i) \subseteq X\}
\overline{\sum_{i=1}^{m} N_i}^{O}(X) = \{x_i \in U : \delta_{A_1}(x_i) \cap X \neq \emptyset \wedge \delta_{A_2}(x_i) \cap X \neq \emptyset \wedge \cdots \wedge \delta_{A_m}(x_i) \cap X \neq \emptyset\}
where δ_{A_k}(x_i) is the neighborhood of x_i based on the granularity structure A_k.
Given DS = (U, AT), where AT = {A_k | k ∈ {1, 2, ..., m}}, U = {x_i | i ∈ {1, 2, ..., n}}, and X ⊆ U, in the pessimistic neighborhood multi-granularity rough sets, the neighborhood multi-granularity lower and upper approximations of X are defined as:
\underline{\sum_{i=1}^{m} N_i}^{P}(X) = \{x_i \in U : \delta_{A_1}(x_i) \subseteq X \wedge \delta_{A_2}(x_i) \subseteq X \wedge \cdots \wedge \delta_{A_m}(x_i) \subseteq X\}
\overline{\sum_{i=1}^{m} N_i}^{P}(X) = \{x_i \in U : \delta_{A_1}(x_i) \cap X \neq \emptyset \vee \delta_{A_2}(x_i) \cap X \neq \emptyset \vee \cdots \vee \delta_{A_m}(x_i) \cap X \neq \emptyset\}
where δ_{A_k}(x_i) is the neighborhood of x_i based on the granularity structure A_k.
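A compact sketch of these neighborhood-based variants is given below; compared with the classical case, only the containment and intersection tests change. The representation (neigh[i][x] as the δ-neighborhood of sample x under attribute subset A_i) is our own illustration, and the operator choices follow our reading of the definitions above.

def nmg_approximations(neigh, X, mode="optimistic"):
    # neigh[i][x]: delta-neighborhood (a set of samples) of x under attribute subset A_i
    universe, X = set(neigh[0]), set(X)
    low, up = (any, all) if mode == "optimistic" else (all, any)
    lower = {x for x in universe if low(n[x] <= X for n in neigh)}   # containment test
    upper = {x for x in universe if up(n[x] & X for n in neigh)}     # non-empty intersection test
    return lower, upper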
The incorporation of multi-granularity neighborhood rough sets extends this concept by emphasizing local contexts and the spatial or temporal relationships inherent within the data. By focusing on the neighborhoods around each data point, these sets are particularly adept at mitigating the influence of noise and anomalies, significantly enhancing the robustness of the analysis. The neighborhood-based approach also facilitates adaptive threshold settings, crucial for accurately defining the granularity level in datasets where this parameter is not readily apparent.

2.4. Supervised Attribute Reduction

It is well known that neighborhood rough sets are often used in supervised learning tasks, especially in enhancing generalization performance and reducing classifier complexity [33]. The advantage of attribute reduction lies in its easy adaptation to different practical application requirements, hence a variety of forms of attribute reduction have emerged in recent years. For neighborhood rough sets, information gain and split information value are two metrics that can be used to further explore the forms of attribute reduction.
Given the data DS = (U, AT, d, δ), for any A ⊆ AT, the neighborhood information gain of d based on A is defined as:
IG_{NRS}(d, A) = H_{NRS} - H_{NRS}(d, A)
Here, H_{NRS} is the entropy of the entire dataset, calculated based on the distribution under the neighborhood lower or upper approximation [34]. H_{NRS}(d, A) is the expected value of uncertainty considering attribute A, defined as:
H_{NRS}(d, A) = -\frac{1}{|U|} \sum_{x \in U} |\delta_A(x) \cap [x]_d| \log \frac{|\delta_A(x) \cap [x]_d|}{|\delta_A(x)|}
Given the data DS = (U, AT, d, δ), for any A ⊆ AT, the neighborhood split information value of d based on A is defined as:
SI_{NRS}(d, A) = -\sum_{j=1}^{n} \frac{|\delta_A(X_j)|}{|U|} \log_2 \frac{|\delta_A(X_j)|}{|U|}
Here, δ_A(X_j) represents the sample set X_j within the neighborhoods formed by attribute A, and n is the number of different neighborhoods formed by A [35].
The combination of neighborhood information gain and split information value helps to more comprehensively assess the impact of attributes on dataset classification, thereby making more effective decisions in attribute reduction.
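The two measures can be read off the definitions above almost directly. The sketch below is a minimal Python rendering of one plausible reading of the formulas, with the neighborhoods precomputed as index sets; the function names are ours.

import numpy as np

def neighborhood_conditional_entropy(neigh, labels):
    # neigh[i]: set of indices in the delta-neighborhood of sample i (i is assumed to
    # belong to its own neighborhood, so the intersection below is never empty)
    # labels[i]: decision label d(x_i)
    n, total = len(labels), 0.0
    for i in range(n):
        same = sum(1 for j in neigh[i] if labels[j] == labels[i])   # |delta_A(x) ∩ [x]_d|
        total += -same * np.log(same / len(neigh[i]))
    return total / n

def neighborhood_split_information(block_sizes, n):
    # block_sizes: sizes |delta_A(X_j)| of the neighborhood blocks formed by A
    p = np.asarray(block_sizes, dtype=float) / n
    return float(-(p * np.log2(p)).sum())

The information gain is then the difference between the dataset-level entropy and this conditional entropy; together with the split information, it feeds the ϵ measure introduced in Section 3.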

2.5. Unsupervised Attribute Reduction

It is widely recognized that supervised attribute reduction necessitates the use of sample labels, which are time-consuming and expensive to obtain in many practical tasks [36]. In contrast, unsupervised attribute reduction does not require these labels, hence it has received more attention recently.
In unsupervised attribute reduction, if it is necessary to measure the importance of attributes, one can construct models by introducing pseudo-label strategies and using information gain and split information as metrics.
Given unsupervised data IS = (U, AT) and δ, for any A ⊆ AT, the unsupervised information gain based on A is defined as:
IG_{NRS}(d, A) = H_{NRS} - H_{NRS}(d, A)
where H_{NRS}(d, A) is the expected value of uncertainty considering attribute A, defined as:
H_{NRS}(d, A) = \frac{1}{|A|} \sum_{a \in A} H_{NRS}(d_a, A)
d_a denotes the pseudo-label decision for samples generated using conditional attribute a.
Given unsupervised data IS = (U, AT) and δ, for any A ⊆ AT, the unsupervised split information based on A is defined as:
SI_{NRS}(d, A) = \frac{1}{|A|} \sum_{a \in A} SI_{NRS}(d_a, A)
Here, d_a is the pseudo-label decision obtained by using conditional attribute a to assign pseudo-labels to the samples.
These definitions provide a new method for evaluating attribute importance in an unsupervised setting. Information gain reflects the contribution of an attribute to data classification, while split information measures the degree of confusion introduced by an attribute in the division of the dataset. This approach helps in more effective attribute selection and reduction in unsupervised learning.
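Operationally, the pseudo-label strategy can be sketched as follows. This is our own illustration using scikit-learn's KMeans; the supervised measure is passed in as a callable so the same loop serves both the gain and the split information.

import numpy as np
from sklearn.cluster import KMeans

def unsupervised_measure(U, A, k, supervised_measure):
    # Average a supervised measure over the pseudo-label decisions d_a,
    # one per attribute a in A, as described above.
    values = []
    for a in A:
        d_a = KMeans(n_clusters=k, n_init=10).fit_predict(U[:, [a]])  # pseudo-labels from attribute a
        values.append(supervised_measure(U, A, d_a))
    return float(np.mean(values))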

3. Proposed Method

3.1. Definition of Multi-Granularity Neighborhood Information Gain Ratio

Consider a dataset DS = (U, AT, d, δ), with U representing the sample set, AT the attribute set, d the decision attribute, and δ the neighborhood radius.
For any A ⊆ AT, the multi-granularity neighborhood information gain ratio is defined as:
\epsilon_A(d) = \frac{SI_{NRS}(d, A)}{e^{IG_{NRS}(d, A)}} \times W_A
where SI_{NRS}(d, A) is the neighborhood split information quantity based on A, e^{IG_{NRS}(d, A)} is the information gain for decision attribute d based on attribute A taken as an exponent of the natural base e, and W_A is the granularity space coefficient of attribute A in the multi-granularity structure, reflecting its importance within that structure.
For the calculation of the granularity space coefficient, given a set of granularities G_1, G_2, ..., G_n, the performance of attribute A under each granularity can be measured by a quantitative indicator P_{G_i}(A). The granularity space coefficient W_A is defined as follows:
W_A = \frac{\sum_{i=1}^{n} \beta_i \cdot P_{G_i}(A)}{\sum_{i=1}^{n} \beta_i}
where β_i is the granularity space allocated to each granularity G_i, reflecting the importance of the different granularities. These granularity spaces are usually determined based on specific background knowledge of the problem or on experimental verification.
The granularities G_1, G_2, ..., G_n in the multi-granularity structure are determined according to the data characteristics, problem requirements, etc. [37], and each granularity reflects different levels or details of the data. When calculating the granularity space coefficient, the performance of the attribute under different granularities is considered, in order to more accurately reflect its importance in the multi-granularity structure.
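For instance, with three granularities, the coefficient is simply the β-weighted average of the attribute's per-granularity performance. The sketch below uses illustrative numbers only.

def granularity_space_coefficient(performance, beta):
    # performance[i] = P_{G_i}(A); beta[i] = granularity space allocated to G_i
    return sum(b * p for b, p in zip(beta, performance)) / sum(beta)

# e.g., P_{G_1}(A) = 0.8, P_{G_2}(A) = 0.6, P_{G_3}(A) = 0.9 with spaces 0.5, 0.3, 0.2
print(granularity_space_coefficient([0.8, 0.6, 0.9], [0.5, 0.3, 0.2]))   # -> 0.76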
The neighborhood rough set is a method for dealing with uncertain and fuzzy data, which uses neighborhood relations instead of the indiscernible relations in traditional rough sets. In this method, data are decomposed into different granularities, each representing different levels or details of the data. Information gain ratio is a method for measuring the importance of attributes in data classification. It is based on the concept of information entropy and evaluates the classification capability of an attribute by comparing the entropy change in the dataset with and without the attribute.
Therefore, ϵ combines these two concepts, i.e., neighborhood information gain at different granularities and the split information value of attributes, to evaluate the importance of attributes in multi-granularity data analysis. The structure of the ϵ -reduct part is shown in Figure 1. This method not only considers the information gain of attributes, but also their performance at different granularities, thus providing a more comprehensive method of attribute evaluation.
Given a decision system DS and a threshold θ ∈ [0, 1], an attribute subset A is considered significant if it satisfies the following conditions:
  • ϵ_A(d) / ϵ_{AT}(d) ≥ θ;
  • there is no proper subset A′ ⊂ A such that ϵ_{A′}(d) / ϵ_{AT}(d) ≥ θ.
In this definition, significant attributes are determined based on their contribution to the information gain ratio, aiming to select attributes that are informative, yet not redundant for the decision-making process. This method is based on greedy search techniques for attribute reduction, and helps identify attributes that significantly impact the decision outcome.
Given a dataset DS = (U, AT, d), where U is the set of objects, AT is the set of conditional attributes, and d is the decision attribute, for any attribute subset A ⊆ AT and any a ∈ AT \ A (i.e., any attribute not in A), the significance of attribute a regarding the multi-granularity neighborhood information gain ratio is defined as follows:
Sig_{\epsilon}^{a}(d) = \epsilon_{A \cup \{a\}}(d) - \epsilon_A(d)
The aforementioned significance function suggests that a larger value indicates a more important conditional attribute, making it more likely to be included in the reduct. For example, if Sig_{ϵ}^{a_1}(d) < Sig_{ϵ}^{a_2}(d), where a_1, a_2 ∈ AT \ A, then ϵ_{A ∪ {a_1}}(d) < ϵ_{A ∪ {a_2}}(d). Such a result indicates that choosing a_2 to join A would lead to a higher multi-granularity neighborhood information gain ratio than choosing a_1.
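A toy illustration of how the significance drives the greedy choice is given below; all ϵ values are hypothetical.

def significance(eps_with_a, eps_without):
    # Sig_eps^a(d) = eps_{A ∪ {a}}(d) - eps_A(d)
    return eps_with_a - eps_without

eps_A = 0.25                                 # current subset A (hypothetical value)
candidates = {"a1": 0.375, "a2": 0.5}        # eps_{A ∪ {a}}(d) for the remaining attributes
best = max(candidates, key=lambda a: significance(candidates[a], eps_A))
print(best, significance(candidates[best], eps_A))   # -> a2 0.25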
Given the foregoing, it is not difficult to conclude that the ϵ-reduct has the following benefits.
  • Multi-level data analysis: By incorporating multi-granularity structures, the algorithm can analyze data at various levels of granularity. This allows for a more detailed understanding of data characteristics at each level, which can be crucial for complex datasets.
  • Comprehensive attribute evaluation: The algorithm evaluates attributes not only based on information gain, but also considering their performance across different granularities through the granularity space coefficient. This provides a holistic measure of attribute importance that accounts for varied data resolutions and contexts.
  • Handling uncertainty and fuzziness: by using neighborhood relations instead of indiscernibility relations, the method effectively handles uncertain and fuzzy data, making it suitable for real-world datasets that often contain imprecise or incomplete information.
However, while it has various advantages, it also has certain limitations, such as the computational cost of computing the neighborhood information gain ratio for each attribute across multiple granularities. These limitations also leave considerable room for further development.

3.2. Detailed Algorithm

Based on the significance function, Algorithm 1 is designed to find the ϵ -reduct.
To streamline the analysis of the computational complexity of Algorithm 1, we initiate by applying k-means clustering to generate pseudo-labels for the samples. With T denoting the iteration count for k-means clustering and k indicating the cluster count, the complexity of creating pseudo-labels is O(k · T · |U| · |AT|), where |U| is the total number of samples and |AT| signifies the attribute count. Subsequently, the calculation of ϵ_{A ∪ {a}}(d) occurs no more than (1 + |AT|) · |AT| / 2 times. In conclusion, the computational complexity of Algorithm 1 equates to O\left(\frac{|U|^2 \cdot |AT|^3}{2} + k \cdot T \cdot |U| \cdot |AT|\right).
Algorithm 1 Forward greedy searching for ϵ-reduct with neighborhood rough set (NRS-ϵ)
Input: A decision system DS = (U, AT, d), a neighborhood radius δ, a significance threshold θ.
Output: An ϵ-reduct A.
1: Initialize A = ∅;
2: Calculate initial neighborhood rough set characteristics for DS;
3: for each attribute a ∈ AT do
4:     Generate the neighborhood relation N_δ^a for a;
5: end for
6: repeat
7:     for each a ∈ AT \ A do
8:         Calculate the neighborhood information gain IG_NRS(d, A ∪ {a});
9:         Calculate the neighborhood split information SI_NRS(d, A ∪ {a});
10:        Compute ϵ_{A ∪ {a}}(d) = SI_NRS(d, A ∪ {a}) / e^{|IG_NRS(d, A ∪ {a})|} × W_{A ∪ {a}};
11:    end for
12:    Select attribute b = arg max{ϵ_{A ∪ {a}}(d) : a ∈ AT \ A};
13:    Update A = A ∪ {b};
14: until no attribute can further improve the significance, or ϵ_A(d) / ϵ_AT(d) ≥ θ
15: return A
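For readers who prefer an executable form, the greedy loop of Algorithm 1 can be sketched as follows. Here epsilon is any callable that evaluates the multi-granularity neighborhood information gain ratio of an attribute subset, and the stopping test follows our reading of the definition above; this is an illustration, not the authors' MATLAB implementation.

def nrs_epsilon_reduct(AT, epsilon, theta):
    # AT: iterable of attribute identifiers; epsilon(A): measure of subset A; theta in [0, 1]
    # epsilon(set()) is assumed to be defined by the caller (e.g., 0)
    A = set()
    eps_full = epsilon(set(AT))                    # reference value on the full attribute set
    while True:
        candidates = set(AT) - A
        if not candidates:
            break
        best = max(candidates, key=lambda a: epsilon(A | {a}))
        if epsilon(A | {best}) <= epsilon(A):      # no attribute improves the measure any further
            break
        A.add(best)
        if eps_full > 0 and epsilon(A) / eps_full >= theta:   # significance condition reached
            break
    return A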

4. Experimental Analysis

4.1. Dataset Description

To evaluate the performance of the proposed measure, 15 UCI datasets are used in this experiment. These datasets were carefully selected after a thorough review to meet the multi-granular criteria required by our method, accommodating both supervised and unsupervised learning scenarios. Table 1 summarizes the statistical information of these datasets.

4.2. Experimental Configuration

The experiment was performed on a personal computer running Windows 11, featuring an Intel Core i5-12500H processor (2.50 GHz) with 16.00 GB RAM. MATLAB R2023a served as the development environment.
In this experiment, a double means algorithm was adopted to recursively allocate the attribute granularity space, the k-means clustering method [38] was used to generate pseudo-labels for the samples, and the information gain ratio served as the criterion for evaluating attribute reduction. Notably, the selected k-value needs to match the number of decision categories in the dataset. Moreover, the effect of the neighborhood rough set is significantly influenced by the preset radius size. To demonstrate the effectiveness and applicability of the proposed method, a series of experiments was designed using 20 different radius values, incremented by 0.02, ranging from 0.02 to 0.40. The resulting reducts were validated through 10-fold cross-validation: for each radius, the dataset was divided into ten subsets, nine for training and one for testing, and the process was repeated so that each subset served once as the test set, thereby evaluating the classification performance and ensuring the reliability and stability of the model.
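The evaluation protocol can be reproduced with standard tooling. The snippet below is a sketch using scikit-learn rather than the MATLAB environment of the experiments; it computes the 10-fold cross-validated KNN (K = 3) accuracy on a given reduct and lists the 20 radii mentioned above.

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

def cv_accuracy(X, y, reduct):
    # 10-fold cross-validated accuracy of KNN (K = 3) restricted to the attributes in `reduct`
    accs = []
    for train, test in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
        clf = KNeighborsClassifier(n_neighbors=3).fit(X[train][:, reduct], y[train])
        accs.append(clf.score(X[test][:, reduct], y[test]))
    return float(np.mean(accs))

radii = np.round(np.arange(0.02, 0.41, 0.02), 2)   # 0.02, 0.04, ..., 0.40 as in the text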
In the experiment, the proposed measure is compared with six advanced attribute reduction algorithms, as well as with the case in which no attribute reduction method is applied (no reduct), using Classification and Regression Trees (CART) [20], K-Nearest Neighbors (KNN, K = 3) [39], and Support Vector Machines (SVM) [40]. The performance of the reducts is evaluated in terms of the stability, accuracy, and timeliness of classification, as well as the stability of reduction. The attribute reduction algorithms included for comparison are:
MapReduce-Based Attribute Reduction Algorithm (MARA) [41];
Robust Attribute Reduction Based On Rough Sets (RARR) [42];
Bipolar Fuzzy Relation System Attribute Reduction Algorithms (BFRS) [43];
Attribute Group (AG) [44];
Separability-Based Evaluation Function (SEF) [45];
Genetic Algorithm-based Attribute Reduction (GAAR) [46].

4.3. Comparison of Classification Accuracy

In this part, the classification accuracy of each algorithm is evaluated using KNN, SVM, and CART for predicting test samples. Regarding attribute reduction algorithms, within a decision system D S , the definition of classification accuracy post-reduction is as follows:
Acc_{red} = \frac{|\{x_i \in U : Pre_{red}(x_i) = d(x_i)\}|}{|U|},
where Pre_{red}(x_i) is the predicted label for x_i obtained using the reduct red.
Table 2 and Figure 2 present the specific classification accuracy outcomes for each algorithm across 15 datasets. From these observations, several insights can be readily inferred:
  • For most datasets, the classification accuracy associated with NRS- ϵ is superior to other comparison algorithms, regardless of whether the KNN, SVM, or CART classifier is used. For example, in the “Car Evaluation (ID: 6)” dataset, when using the CART classifier, the classification accuracies of NRS- ϵ , MARA, RARR, BFRS, AG, SEF, and GAAR are 0.5039, 0.4529, 0.4157, 0.494, 0.4886, 0.4909, 0.4719, respectively; when using the KNN classifier, the classification accuracies of NRS- ϵ , MARA, RARR, BFRS, AG, SEF, and GAAR are 0.6977, 0.6584, 0.535, 0.6747, 0.6675, 0.6586, 0.6579, respectively; when using SVM, the classification accuracies of NRS- ϵ , MARA, RARR, BFRS, AG, SEF, and GAAR are 0.5455, 0.4307, 0.368, 0.4737, 0.4698, 0.4718, 0.4923, respectively. Therefore, NRS- ϵ derived simplifications can provide effective classification performance.
  • Examining the average classification accuracy per algorithm reveals that the accuracy associated with NRS- ϵ is on par with, if not exceeding, that of MARA, RARR, BFRS, AG, SEF, and GAAR. When using the CART classifier, the average classification accuracy of NRS- ϵ is 0.8012, up to 29.28% higher than other algorithms; when using the KNN classifier, the average classification accuracy of NRS- ϵ is 0.8169, up to 34.48% higher than other algorithms; when using SVM, the average classification accuracy of NRS- ϵ is 0.80116, up to 36.38% higher than other algorithms.

4.4. Comparison of Classification Stability

Similar to the evaluation of classification accuracy, this section explores the classification stability obtained by analyzing the classification results of the seven different algorithms, including experiments with the CART, KNN, and SVM classifiers. In a decision system DS = (U, AT, d, δ), assume the set U is equally divided into z mutually exclusive groups of the same size (using 10-fold cross-validation, so z = 10), that is, U_1, ..., U_τ, ..., U_z (1 ≤ τ ≤ z). Then, the classification stability based on the reduct red_τ (obtained by removing U_τ from the set U) can be represented as:
Stab_{class} = \frac{2}{z \cdot (z - 1)} \sum_{\tau = 1}^{z - 1} \sum_{\tau' = \tau + 1}^{z} Exa(red_{\tau}, red_{\tau'})
where Exa(red_τ, red_τ′) measures the consistency between two classification results and can be defined according to Table 3.
In Table 3, Pre_{red_τ}(x) represents the predicted label of x obtained by red_τ. The symbols ψ_1, ψ_2, ψ_3, and ψ_4, respectively, represent the numbers of samples that satisfy the corresponding conditions in Table 3. Based on this, Exa(red_τ, red_τ′) is defined as
Exa(red_{\tau}, red_{\tau'}) = \frac{\psi_1 + \psi_4}{\psi_1 + \psi_2 + \psi_3 + \psi_4}.
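The stability computation itself is a short loop over fold pairs. The sketch below follows the Exa definition (ψ_1 and ψ_4 count the samples on which the two reducts agree in correctness); `predictions` holds the z label vectors produced by the fold-wise reducts for the same reference samples, and the names are ours.

from itertools import combinations
import numpy as np

def exa(pred_a, pred_b, d):
    # (psi_1 + psi_4) / (psi_1 + psi_2 + psi_3 + psi_4): fraction of samples on
    # which the two reducts agree about being correct or incorrect w.r.t. d
    a_ok = np.asarray(pred_a) == np.asarray(d)
    b_ok = np.asarray(pred_b) == np.asarray(d)
    return float(np.mean(a_ok == b_ok))

def classification_stability(predictions, d):
    # Stab_class: average Exa over all pairs of the z fold-wise prediction vectors
    pairs = list(combinations(range(len(predictions)), 2))
    return float(np.mean([exa(predictions[i], predictions[j], d) for i, j in pairs]))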
The classification stability index reflects the degree of deviation of the prediction labels when data perturbation occurs. Higher values of classification stability mean more stable prediction labels, indicating a better quality of the corresponding reduct: improvements in classification stability imply more stable prediction results and reduced sensitivity to the choice of training samples. After analyzing the 15 datasets using these three classifiers, Table 4 and Figure 3 present the findings of each algorithm in terms of classification stability.
  • Across many datasets, the NRS-ϵ algorithm exhibits leading performance compared to the other algorithms in terms of classification stability. For example, in the “Iris Plants Database (ID: 2)” dataset, significant differences in classification stability were observed under different classifiers for NRS-ϵ and the other algorithms: when using the CART classifier, the stability of NRS-ϵ reached 0.7364, while MARA, RARR, BFRS, AG, SEF, and GAAR had stabilities of 0.6794, 0.7122, 0.6981, 0.7244, 0.7130, and 0.7276, respectively; when using the KNN classifier, the stability of NRS-ϵ was 0.8357, with MARA, RARR, BFRS, AG, SEF, and GAAR having stabilities of 0.6380, 0.8349, 0.8155, 0.8145, 0.8253, and 0.8246, respectively; when using the SVM classifier, the stabilities of NRS-ϵ, MARA, RARR, BFRS, AG, SEF, and GAAR were 0.8918, 0.6581, 0.8774, 0.8771, 0.8748, 0.8852, and 0.8783, respectively.
  • Regarding average classification stability, NRS-ϵ markedly surpasses the competing algorithms. Specifically, when using the CART classifier, the classification stability of NRS-ϵ was 0.8228, up to 12.51% higher than the other methods; when using the KNN classifier, its classification stability was 0.8972, up to 25.14% higher than the other methods; and with the SVM classifier, the classification stability of NRS-ϵ was 0.9295, up to 14.61% higher than the other methods.

4.5. Comparisons of Elapsed Time

In this section, the time required for attribute reduction by different algorithms is compared. The results are shown in Table 5.
An increase in dimensionality reduction stability tends to be accompanied by a longer reduction time. From an in-depth analysis of Table 5, the following findings can be derived. The reduction time of NRS-ϵ is comparatively long, suggesting that the algorithm's time efficiency throughout the reduction process still needs to be enhanced.
When analyzing the average processing time of the algorithms, it is noteworthy that the value of NRS-ϵ is reduced by 97.23% and 48.86% compared to RARR and GAAR, respectively. Taking the dataset “Car Evaluation (ID: 6)” as an example, the times consumed by NRS-ϵ, MARA, RARR, BFRS, AG, SEF, and GAAR are 122.1212 s, 6.9838 s, 421.1056 s, 154.8219 s, 31.4599 s, 33.3661 s, and 54.0532 s, respectively. Hence, under certain conditions, the time NRS-ϵ takes for attribute reduction is less than that of RARR and BFRS.
Based on the discussion, it is evident that while our novel algorithm exhibits better time efficiency compared to RARR and BFRS on certain datasets, the speed of NRS- ϵ requires further enhancement.

4.6. Comparison of Attribute Dimensionality Reduction Stability

In this section, the attribute dimensionality reduction stability related to 15 datasets is presented. Table 6 shows that the dimensionality reduction stability of NRS- ϵ is slightly lower than GAAR and SEF, but still maintains a leading position. Compared to MARA, RARR, BFRS, and AG, the average dimensionality reduction stability value of NRS- ϵ has increased by 100.2%, 49.89%, 27.19%, and 14.15%, respectively, while it only decreased by 19.323% and 6.677% compared to GAAR and SEF.
Although NRS-ϵ falls short of GAAR's and SEF's results in terms of dimensionality reduction stability on many datasets, in some cases its results in attribute dimensionality reduction are superior to those of all six advanced algorithms. For example, for the “Letter Recognition (ID: 15)” dataset, the dimensionality reduction stabilities of NRS-ϵ, MARA, RARR, BFRS, AG, SEF, and GAAR were 0.8608, 0.6001, 0.4011, 0.7882, 0.6549, 0.7723, and 0.7442, respectively. Compared to the other algorithms, the result of NRS-ϵ improved by 43.47%, 115.2%, 9.211%, 31.44%, 11.46%, and 15.67%, respectively. Thus, it is important to recognize that employing NRS-ϵ favors the selection of attributes that are better aligned with variations in the samples.

5. Conclusions and Future Expectations

In this study, we introduced a novel attribute reduction strategy designed to address the challenges associated with high-dimensional data analysis. This strategy innovatively combines multi-granularity modeling with both supervised and unsupervised learning frameworks, enhancing its adaptability and effectiveness across various levels of data complexity.
This model’s integration of multi-granularity aspects distinguishes it from conventional attribute reduction methods by providing enhanced flexibility and adaptability to different data feature levels. This allows for more precise and effective handling of complex, high-dimensional datasets. The application of our proposed strategy across 15 UCI datasets has demonstrated not only exceptional classification performance, but also robust stability during the dimensionality reduction process. These results substantiate the practical utility and effectiveness of our approach in diverse data scenarios. While the strategy marks a significant advancement in attribute reduction, it does present challenges, primarily related to computational efficiency. The sophisticated nature of the integrated measurement methods, though beneficial for attribute selection quality, substantially increases the computational time required. This aspect can be particularly limiting in time-sensitive applications. To enhance the practicality and efficiency of our attribute reduction strategy, future research efforts could focus on:
  • Implementing acceleration technologies could significantly reduce the computational burden, making the strategy more feasible for larger or more complex datasets.
  • Exploring alternative rough set-based fundamental measurements could provide deeper insights into their impact on classification performance. This exploration may lead to the discovery of even more effective attribute reduction techniques.
By addressing these limitations and exploring these suggested future research directions, we can further refine our attribute reduction strategy, potentially setting a new benchmark in the field. Our findings not only contribute to the existing body of knowledge, but also pave the way for future explorations aimed at enhancing data preprocessing techniques in the era of big data.

Author Contributions

Conceptualization, J.C.; methodology, J.C.; software, Z.F.; validation, J.S.; formal analysis, H.C.; investigation, T.X.; resources, J.S.; data curation, T.X.; writing—original draft preparation, Z.F.; writing—review and editing, T.X.; visualization, Z.F.; supervision, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62006099), Industry-school Cooperative Education Program of the Ministry of Education (Grant No. 202101363034).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare there are no conflicts of interest.

References

  1. Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. 1982, 11, 341–356. [Google Scholar] [CrossRef]
  2. Chen, H.; Li, T.; Luo, C.; Horng, S.J.; Wang, G. A decision-theoretic rough set approach for dynamic data mining. IEEE Trans. Fuzzy Syst. 2015, 23, 1958–1970. [Google Scholar] [CrossRef]
  3. Dowlatshahi, M.; Derhami, V.; Nezamabadi-pour, H. Ensemble of filter-based rankers to guide an epsilon-greedy swarm optimizer for high-dimensional feature subset selection. Information 2017, 8, 152. [Google Scholar] [CrossRef]
  4. Qian, Y.; Liang, J.; Wei, Z.; Dang, C. Information granularity in fuzzy binary GrC model. IEEE Trans. Fuzzy Syst. 2010, 19, 253–264. [Google Scholar] [CrossRef]
  5. Qian, Y.; Li, F.; Liang, J.; Liu, B.; Dang, C. Space structure and clustering of categorical data. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 2047–2059. [Google Scholar] [CrossRef] [PubMed]
  6. Qian, J.; Liu, C.; Miao, D.; Yue, X. Sequential three-way decisions via multi-granularity. Inf. Sci. 2020, 507, 606–629. [Google Scholar] [CrossRef]
  7. Wan, S.; Wang, F.; Dong, J. A preference degree for intuitionistic fuzzy values and application to multi-attribute group decision making. Inf. Sci. 2016, 370, 127–146. [Google Scholar] [CrossRef]
  8. Zhang, Q.; Liu, J.; Yang, F.; Sun, Q.; Yao, Z. Subjective weight determination method of evaluation index based on intuitionistic fuzzy set theory. In Proceedings of the 2022 34th Chinese Control and Decision Conference (CCDC), Hefei, China, 15–17 August 2022; pp. 2858–2861. [Google Scholar] [CrossRef]
  9. Chen, Y.; Wang, P.; Yang, X.; Yu, H. Bee: Towards a robust attribute reduction. Int. J. Mach. Learn. Cybern. 2022, 13, 3927–3962. [Google Scholar] [CrossRef]
  10. Liu, K.; Yang, X.; Fujita, H.; Liu, D.; Yang, X.; Qian, Y. An efficient selector for multi-granularity attribute reduction. Inf. Sci. 2019, 505, 457–472. [Google Scholar] [CrossRef]
  11. Jiang, Z.; Liu, K.; Yang, X.; Yu, H.; Fujita, H.; Qian, Y. Accelerator for supervised neighborhood based attribute reduction. Int. J. Approx. Reason. 2020, 119, 122–150. [Google Scholar] [CrossRef]
  12. Yuan, Z.; Chen, H.; Li, T.; Yu, Z.; Sang, B.; Luo, C. Unsupervised attribute reduction for mixed data based on fuzzy rough sets. Inf. Sci. 2021, 572, 67–87. [Google Scholar] [CrossRef]
  13. Yang, X.; Yao, Y. Ensemble selector for attribute reduction. Appl. Soft Comput. 2018, 70, 1–11. [Google Scholar] [CrossRef]
  14. Jain, P.; Som, T. Multigranular rough set model based on robust intuitionistic fuzzy covering with application to feature selection. Int. J. Approx. Reason. 2023, 156, 16–37. [Google Scholar] [CrossRef]
  15. Ji, X.; Peng, J.H.; Zhao, P.; Yao, S. Extended rough sets model based on fuzzy granular ball and its attribute reduction. Inf. Sci. 2023, 481, 119071. [Google Scholar] [CrossRef]
  16. Yang, Y.; Chen, D.; Wang, H. Active sample selection based incremental algorithm for attribute reduction with rough sets. IEEE Trans. Fuzzy Syst. 2016, 25, 825–838. [Google Scholar] [CrossRef]
  17. Qian, Y.; Liang, J.; Pedrycz, W.; Dang, C. Positive approximation: An accelerator for attribute reduction in rough set theory. Artif. Intell. 2010, 174, 597–618. [Google Scholar] [CrossRef]
  18. Hu, Q.; Pedrycz, W.; Yu, D.; Lang, J. Selecting discrete and continuous features based on neighborhood decision error minimization. IEEE Trans. Syst. Man Cybern. Part (Cybernetics) 2009, 40, 137–150. [Google Scholar] [CrossRef]
  19. Li, J.; Yang, X.; Song, X.; Li, J.; Wang, P.; Yu, D. Neighborhood attribute reduction: A multi-criterion approach. Int. J. Mach. Learn. Cybern. 2019, 10, 731–742. [Google Scholar] [CrossRef]
  20. Wang, J.; Liu, Y.; Chen, J.; Yang, X. An Ensemble Framework to Forest Optimization Based Reduct Searching. Symmetry 2022, 14, 1277. [Google Scholar] [CrossRef]
  21. Xu, E.; Gao, X.; Tan, W. Attributes Reduction Based On Rough Set. In Proceedings of the 2006 International Conference on Machine Learning and Cybernetics, Dalian, China, 13–16 August 2006; pp. 1438–1442. [Google Scholar] [CrossRef]
  22. Xu, S.; Ju, H.; Shang, L.; Pedrycz, W.; Yang, X.; Li, C. Label distribution learning: A local collaborative mechanism. Int. J. Approx. Reason. 2020, 121, 59–84. [Google Scholar] [CrossRef]
  23. Xu, X.; Niu, Y.; Niu, Y. Research on attribute reduction algorithm based on Rough Set Theory and genetic algorithms. In Proceedings of the 2011 2nd International Conference on Artificial Intelligence, Management Science and Electronic Commerce (AIMSEC), Zhengzhou, China, 8–10 August 2011; pp. 524–527. [Google Scholar] [CrossRef]
  24. Yang, X.; Xu, S.; Dou, H.; Song, X.; Yu, H.; Yang, J. Multigranulation rough set: A multiset based strategy. Int. J. Comput. Intell. Syst. 2017, 10, 277–292. [Google Scholar] [CrossRef]
  25. Yang, X.; Liang, S.; Yu, H.; Gao, S.; Qian, Y. Pseudo-label neighborhood rough set: Measures and attribute reductions. Int. J. Approx. Reason. 2019, 105, 112–129. [Google Scholar] [CrossRef]
  26. Dai, J.; Hu, H.; Wu, W.; Qian, Y.; Huang, D. Maximal-discernibility-pair-based approach to attribute reduction in fuzzy rough sets. IEEE Trans. Fuzzy Syst. 2017, 26, 2174–2187. [Google Scholar] [CrossRef]
  27. Qian, Y.; Li, S.; Liang, J.; Shi, Z.; Wang, F. Pessimistic rough set based decisions: A multigranulation fusion strategy. Inf. Sci. 2014, 264, 196–210. [Google Scholar] [CrossRef]
  28. Qian, Y.; Liang, J.; Yao, Y.; Dang, C. MGRS: A multi-granulation rough set. Inf. Sci. 2010, 180, 949–970. [Google Scholar] [CrossRef]
  29. Pan, Y.; Xu, W.; Ran, Q. An incremental approach to feature selection using the weighted dominance-based neighborhood rough sets. Int. J. Mach. Learn. Cybern. 2023, 14, 1217–1233. [Google Scholar] [CrossRef]
  30. Qian, Y.; Zhang, H.; Sang, Y.; Liang, J. Multigranulation decision-theoretic rough sets. Int. J. Approx. Reason. 2014, 55, 225–237. [Google Scholar] [CrossRef]
  31. Lin, G.; Qian, Y.; Li, J. NMGRS: Neighborhood-based multigranulation rough sets. Int. J. Approx. Reason. 2012, 53, 1080–1093. [Google Scholar] [CrossRef]
  32. Song, M.; Chen, J.; Song, J.; Xu, T.; Fan, Y. Forward Greedy Searching to κ-Reduct Based on Granular Ball. Symmetry 2023, 15, 996. [Google Scholar] [CrossRef]
  33. Xing, T.; Chen, J.; Xu, T.; Fan, Y. Fusing Supervised and Unsupervised Measures for Attribute Reduction. Intell. Autom. Soft Comput. 2023, 37, 561. [Google Scholar] [CrossRef]
  34. Dai, J.; Wang, W.; Tian, H.; Liu, L. Attribute selection based on a new conditional entropy for incomplete decision systems. Knowl. Based Syst. 2013, 39, 207–213. [Google Scholar] [CrossRef]
  35. Liang, J.; Zhao, X.; Li, D.; Cao, F.; Dang, C. Determining the number of clusters using information entropy for mixed data. Pattern Recognit. 2012, 45, 2251–2265. [Google Scholar] [CrossRef]
  36. Yin, Z.; Fan, Y.; Wang, P.; Chen, J. Parallel Selector for Feature Reduction. Mathematics 2023, 11, 2084. [Google Scholar] [CrossRef]
  37. Chen, Y.; Wang, P.; Yang, X.; Mi, J.; Liu, D. Granular ball guided selector for attribute reduction. Knowl. Based Syst. 2021, 229, 107326. [Google Scholar] [CrossRef]
  38. Wang, P.; Shi, H.; Yang, X.; Mi, J. Three-way k-means: Integrating k-means and three-way decision. Int. J. Mach. Learn. Cybern. 2019, 10, 2767–2777. [Google Scholar] [CrossRef]
  39. Fukunaga, K.; Narendra, P. A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans. Comput. 1975, 100, 750–753. [Google Scholar] [CrossRef]
  40. Chang, C.; Lin, C. LIBSVM: A library for support vector machines. Acm Trans. Intell. Syst. Technol. (Tist) 2011, 2, 1–27. [Google Scholar] [CrossRef]
  41. Yin, L.; Li, J.; Jiang, Z.; Ding, J.; Xu, X. An efficient attribute reduction algorithm using MapReduce. J. Inf. Sci. 2021, 47, 101–117. [Google Scholar] [CrossRef]
  42. Dong, L.; Chen, D.; Wang, N.; Lu, Z. Key energy-consumption feature selection of thermal power systems based on robust attribute reduction with rough sets. Inf. Sci. 2020, 532, 61–71. [Google Scholar] [CrossRef]
  43. Ali, G.; Akram, M.; Alcantud, J. Attributes reductions of bipolar fuzzy relation decision systems. Neural Comput. Appl. 2020, 32, 10051–10071. [Google Scholar] [CrossRef]
  44. Chen, Y.; Liu, K.; Song, J.; Fujita, H.; Yang, X.; Qian, Y. Attribute group for attribute reduction. Inf. Sci. 2020, 535, 64–80. [Google Scholar] [CrossRef]
  45. Hu, M.; Tsang, E.; Guo, Y.; Xu, W. Fast and robust attribute reduction based on the separability in fuzzy decision systems. IEEE Trans. Cybern. 2021, 52, 5559–5572. [Google Scholar] [CrossRef] [PubMed]
  46. Iqbal, F.; Hashmi, J.; Fung, B.; Batool, R.; Khattak, A.; Aleem, S.; Hung, P. A hybrid framework for sentiment analysis using genetic algorithm based feature reduction. IEEE Access 2019, 7, 14637–14652. [Google Scholar] [CrossRef]
Figure 1. The structure of the ϵ-reduct part.
Figure 2. Classification accuracies of three classifiers.
Figure 3. Classification stabilities of three classifiers.
Table 1. Dataset descriptions.
ID | Datasets | Samples | Attributes | Labels
1 | Adult Income | 48,842 | 14 | 2
2 | Iris Plants Database | 150 | 4 | 3
3 | Wine | 178 | 13 | 3
4 | Breast Cancer Wisconsin (Original) | 699 | 10 | 2
5 | Climate Model Simulation Crashes | 540 | 20 | 2
6 | Car Evaluation | 1728 | 6 | 4
7 | Human Activity Recognition Using Smartphones | 10,299 | 561 | 6
8 | Statlog (Image Segmentation) | 2310 | 18 | 7
9 | Yeast | 1484 | 8 | 10
10 | Seeds | 210 | 7 | 3
11 | Ultrasonic Flowmeter Diagnostics-Meter D | 180 | 43 | 4
12 | Spambase | 4601 | 57 | 2
13 | Mushroom | 8124 | 22 | 2
14 | Heart Disease | 303 | 75 | 5
15 | Letter Recognition | 20,000 | 16 | 26
Table 2. The comparisons of the classification accuracies.

CART
ID | NRS-ϵ | MARA | RARR | BFRS | AG | SEF | GAAR | NO REDUCT
1 | 0.8514 | 0.5849 | 0.5911 | 0.8469 | 0.8511 | 0.8494 | 0.8452 | 0.8377
2 | 0.6795 | 0.5078 | 0.6392 | 0.6604 | 0.6749 | 0.6516 | 0.6573 | 0.6715
3 | 0.7312 | 0.6342 | 0.7584 | 0.7213 | 0.7671 | 0.7568 | 0.7687 | 0.7019
4 | 0.9507 | 0.5655 | 0.1499 | 0.9503 | 0.9502 | 0.9466 | 0.9445 | 0.9434
5 | 0.8783 | 0.8057 | 0.8779 | 0.8468 | 0.8532 | 0.8340 | 0.8184 | 0.8528
6 | 0.5039 | 0.4529 | 0.4157 | 0.4940 | 0.4886 | 0.4909 | 0.4719 | 0.5006
7 | 0.9848 | 0.7403 | 0.6217 | 0.9831 | 0.9845 | 0.9825 | 0.9820 | 0.9818
8 | 0.9271 | 0.3540 | 0.3046 | 0.9187 | 0.9262 | 0.9245 | 0.9259 | 0.9007
9 | 0.8101 | 0.7125 | 0.8015 | 0.8014 | 0.8097 | 0.7944 | 0.8075 | 0.7884
10 | 0.8091 | 0.4674 | 0.7979 | 0.7947 | 0.8047 | 0.8023 | 0.8022 | 0.8005
11 | 0.9281 | 0.8612 | 0.8769 | 0.9072 | 0.9140 | 0.9276 | 0.8937 | 0.8917
12 | 0.8206 | 0.5887 | 0.5967 | 0.8161 | 0.8161 | 0.8161 | 0.8169 | 0.8016
13 | 0.8834 | 0.8115 | 0.8828 | 0.8652 | 0.8665 | 0.8670 | 0.8794 | 0.8769
14 | 0.6158 | 0.5784 | 0.6048 | 0.6067 | 0.6095 | 0.6053 | 0.6138 | 0.6117
15 | 0.6434 | 0.6308 | 0.6422 | 0.6430 | 0.6414 | 0.6421 | 0.6425 | 0.6482
Average | 0.8012 | 0.6197 | 0.6374 | 0.7903 | 0.7972 | 0.7927 | 0.7913 | 0.7873
rate | - | 29.27% ↑ | 25.68% ↑ | 1.37% ↑ | 0.49% ↑ | 1.06% ↑ | 1.24% ↑ | 1.77% ↑

KNN
ID | NRS-ϵ | MARA | RARR | BFRS | AG | SEF | GAAR | NO REDUCT
1 | 0.8898 | 0.5177 | 0.4930 | 0.8898 | 0.8935 | 0.8891 | 0.8891 | 0.8794
2 | 0.6547 | 0.4723 | 0.6140 | 0.6465 | 0.6529 | 0.6387 | 0.6410 | 0.6438
3 | 0.6802 | 0.5942 | 0.6880 | 0.6703 | 0.6930 | 0.6702 | 0.7005 | 0.6925
4 | 0.9445 | 0.5269 | 0.1590 | 0.9436 | 0.9414 | 0.9419 | 0.9344 | 0.9261
5 | 0.8392 | 0.7890 | 0.8390 | 0.7659 | 0.8015 | 0.7768 | 0.7701 | 0.7349
6 | 0.6977 | 0.6584 | 0.5350 | 0.6747 | 0.6675 | 0.6586 | 0.6579 | 0.6098
7 | 0.8597 | 0.7315 | 0.5796 | 0.8620 | 0.8671 | 0.8691 | 0.8632 | 0.8672
8 | 0.9743 | 0.2733 | 0.2094 | 0.9671 | 0.9684 | 0.9659 | 0.9658 | 0.9176
9 | 0.8730 | 0.7087 | 0.8685 | 0.8671 | 0.8729 | 0.8605 | 0.8650 | 0.8714
10 | 0.7655 | 0.3996 | 0.7533 | 0.7527 | 0.7634 | 0.7577 | 0.7603 | 0.7476
11 | 0.9267 | 0.8927 | 0.8943 | 0.9081 | 0.9121 | 0.9266 | 0.9042 | 0.8905
12 | 0.9132 | 0.6035 | 0.6066 | 0.8998 | 0.8981 | 0.8959 | 0.8972 | 0.9027
13 | 0.8948 | 0.7087 | 0.8945 | 0.8700 | 0.8871 | 0.8791 | 0.8865 | 0.8845
14 | 0.6148 | 0.5105 | 0.6122 | 0.6079 | 0.6115 | 0.6145 | 0.6104 | 0.6112
15 | 0.7260 | 0.7254 | 0.7096 | 0.7064 | 0.7094 | 0.7009 | 0.7032 | 0.6994
Average | 0.8169 | 0.6075 | 0.6304 | 0.8021 | 0.8093 | 0.8030 | 0.8033 | 0.7919
rate | - | 34.47% ↑ | 29.59% ↑ | 1.84% ↑ | 0.94% ↑ | 1.73% ↑ | 1.70% ↑ | 3.16% ↑

SVM
ID | NRS-ϵ | MARA | RARR | BFRS | AG | SEF | GAAR | NO REDUCT
1 | 0.8616 | 0.5751 | 0.5741 | 0.8573 | 0.8612 | 0.8613 | 0.8580 | 0.8194
2 | 0.6343 | 0.3989 | 0.5935 | 0.6231 | 0.6314 | 0.6047 | 0.6142 | 0.6241
3 | 0.7388 | 0.4890 | 0.7211 | 0.7270 | 0.7614 | 0.7538 | 0.7400 | 0.5212
4 | 0.9280 | 0.4741 | 0.1499 | 0.9169 | 0.9108 | 0.9150 | 0.9106 | 0.8723
5 | 0.6392 | 0.5834 | 0.6390 | 0.5834 | 0.5912 | 0.5618 | 0.5962 | 0.6231
6 | 0.5455 | 0.4307 | 0.3680 | 0.4737 | 0.4698 | 0.4718 | 0.4923 | 0.5298
7 | 0.6809 | 0.6021 | 0.4575 | 0.6636 | 0.6756 | 0.6573 | 0.6603 | 0.5643
8 | 0.9354 | 0.3234 | 0.3016 | 0.9129 | 0.9193 | 0.9146 | 0.9187 | 0.8759
9 | 0.8718 | 0.6420 | 0.8636 | 0.8672 | 0.8672 | 0.8549 | 0.8651 | 0.8831
10 | 0.7724 | 0.4580 | 0.7565 | 0.7461 | 0.7543 | 0.7453 | 0.7535 | 0.7478
11 | 0.9209 | 0.9075 | 0.9078 | 0.9092 | 0.9105 | 0.9201 | 0.9075 | 0.9052
12 | 0.9402 | 0.6625 | 0.6774 | 0.9267 | 0.9240 | 0.9218 | 0.9249 | 0.9324
13 | 0.8940 | 0.8001 | 0.8938 | 0.8371 | 0.8645 | 0.8367 | 0.8827 | 0.8560
14 | 0.6372 | 0.5436 | 0.6357 | 0.6281 | 0.6286 | 0.6228 | 0.6281 | 0.6405
15 | 0.6662 | 0.6641 | 0.6544 | 0.6533 | 0.6549 | 0.6522 | 0.6544 | 0.6248
Average | 0.7778 | 0.5703 | 0.6129 | 0.75502 | 0.7621 | 0.7530 | 0.7604 | 0.7347
rate | - | 36.38% ↑ | 26.90% ↑ | 3.01% ↑ | 2.05% ↑ | 3.29% ↑ | 2.27% ↑ | 5.87% ↑
Table 3. Joint distribution of classification results.
 | Pre_{red_τ′}(x) = d(x) | Pre_{red_τ′}(x) ≠ d(x)
Pre_{red_τ}(x) = d(x) | ψ_1 | ψ_2
Pre_{red_τ}(x) ≠ d(x) | ψ_3 | ψ_4
Table 4. The comparisons of classification stabilities.

CART
ID | NRS-ϵ | MARA | RARR | BFRS | AG | SEF | GAAR | NO REDUCT
1 | 0.8583 | 0.8663 | 0.8369 | 0.8580 | 0.8566 | 0.8538 | 0.8558 | 0.8551
2 | 0.7364 | 0.6794 | 0.7122 | 0.6981 | 0.7244 | 0.7130 | 0.7276 | 0.7130
3 | 0.7222 | 0.7453 | 0.7761 | 0.7546 | 0.7553 | 0.7431 | 0.7500 | 0.7495
4 | 0.9485 | 0.6265 | 0.7733 | 0.9423 | 0.9454 | 0.9480 | 0.9391 | 0.8747
5 | 0.8837 | 0.8279 | 0.8834 | 0.8370 | 0.8473 | 0.8432 | 0.8407 | 0.8518
6 | 0.6469 | 0.6557 | 0.6584 | 0.6690 | 0.6473 | 0.6397 | 0.6632 | 0.6543
7 | 0.9812 | 0.9334 | 0.8934 | 0.9797 | 0.9806 | 0.9809 | 0.9792 | 0.9611
8 | 0.9259 | 0.6674 | 0.9501 | 0.9141 | 0.9041 | 0.9088 | 0.9159 | 0.8837
9 | 0.9017 | 0.7772 | 0.9014 | 0.8727 | 0.8880 | 0.8906 | 0.8948 | 0.8752
10 | 0.8367 | 0.7041 | 0.8151 | 0.8359 | 0.8082 | 0.8269 | 0.8283 | 0.8078
11 | 0.9246 | 0.8853 | 0.8994 | 0.9204 | 0.9054 | 0.9014 | 0.9032 | 0.9056
12 | 0.7724 | 0.5217 | 0.6804 | 0.7532 | 0.7492 | 0.7601 | 0.7565 | 0.7134
13 | 0.9145 | 0.9144 | 0.9037 | 0.8680 | 0.8515 | 0.8331 | 0.8940 | 0.8827
14 | 0.6465 | 0.5331 | 0.6395 | 0.6402 | 0.6372 | 0.6441 | 0.6348 | 0.6250
15 | 0.6420 | 0.6316 | 0.6376 | 0.6383 | 0.6339 | 0.6412 | 0.6393 | 0.6377
Average | 0.8228 | 0.7313 | 0.7974 | 0.8121 | 0.8090 | 0.8085 | 0.8148 | 0.7994
rate | - | 12.51% ↑ | 3.18% ↑ | 1.31% ↑ | 1.71% ↑ | 1.76% ↑ | 0.97% ↑ | 2.93% ↑

KNN
ID | NRS-ϵ | MARA | RARR | BFRS | AG | SEF | GAAR | NO REDUCT
1 | 0.9460 | 0.8825 | 0.8377 | 0.9427 | 0.9378 | 0.9428 | 0.9387 | 0.9186
2 | 0.8357 | 0.6380 | 0.8349 | 0.8155 | 0.8145 | 0.8253 | 0.8246 | 0.7841
3 | 0.7977 | 0.6518 | 0.8102 | 0.7494 | 0.7526 | 0.7656 | 0.7867 | 0.7591
4 | 0.9924 | 0.5949 | 0.9908 | 0.9737 | 0.9689 | 0.9771 | 0.9660 | 0.9234
5 | 0.9452 | 0.7945 | 0.9443 | 0.8743 | 0.8709 | 0.8532 | 0.8996 | 0.8830
6 | 0.8031 | 0.7612 | 0.7018 | 0.7555 | 0.7539 | 0.7597 | 0.7825 | 0.7597
7 | 0.9062 | 0.9357 | 0.9044 | 0.9024 | 0.8903 | 0.9065 | 0.9002 | 0.9065
8 | 0.9786 | 0.6906 | 0.8601 | 0.9663 | 0.9592 | 0.9678 | 0.9627 | 0.9122
9 | 0.9341 | 0.6991 | 0.9284 | 0.9102 | 0.9233 | 0.9296 | 0.9245 | 0.8927
10 | 0.8854 | 0.6707 | 0.8761 | 0.8749 | 0.8541 | 0.8823 | 0.8707 | 0.8449
11 | 0.9706 | 0.9705 | 0.9657 | 0.9358 | 0.9309 | 0.9396 | 0.9457 | 0.9514
12 | 0.9299 | 0.5301 | 0.7138 | 0.8746 | 0.8745 | 0.8935 | 0.8808 | 0.8140
13 | 0.9222 | 0.6058 | 0.9362 | 0.8632 | 0.8705 | 0.8662 | 0.8935 | 0.8511
14 | 0.7942 | 0.4714 | 0.7889 | 0.7682 | 0.7727 | 0.7934 | 0.7813 | 0.7386
15 | 0.8164 | 0.8574 | 0.8273 | 0.8067 | 0.8063 | 0.8250 | 0.8141 | 0.8206
Average | 0.8972 | 0.7170 | 0.8614 | 0.8676 | 0.8653 | 0.8752 | 0.8781 | 0.8507
rate | - | 25.14% ↑ | 4.16% ↑ | 3.41% ↑ | 3.68% ↑ | 2.51% ↑ | 2.17% ↑ | 5.47% ↑

SVM
ID | NRS-ϵ | MARA | RARR | BFRS | AG | SEF | GAAR | NO REDUCT
1 | 0.9739 | 0.8679 | 0.8877 | 0.9733 | 0.9671 | 0.9698 | 0.9675 | 0.9439
2 | 0.8918 | 0.6581 | 0.8774 | 0.8771 | 0.8748 | 0.8852 | 0.8783 | 0.8490
3 | 0.8754 | 0.6134 | 0.9050 | 0.8575 | 0.8662 | 0.8668 | 0.8477 | 0.8331
4 | 0.9807 | 0.7282 | 0.7456 | 0.9749 | 0.9678 | 0.9797 | 0.9641 | 0.9059
5 | 0.7782 | 0.7557 | 0.7779 | 0.7476 | 0.7543 | 0.7576 | 0.7576 | 0.7613
6 | 0.7926 | 0.6862 | 0.7394 | 0.7609 | 0.7490 | 0.7519 | 0.7729 | 0.7504
7 | 0.9247 | 0.9992 | 1.0001 | 0.9311 | 0.9015 | 0.9376 | 0.9287 | 0.9461
8 | 0.9709 | 0.6467 | 0.9567 | 0.9501 | 0.9267 | 0.9478 | 0.9459 | 0.9064
9 | 0.9670 | 0.7753 | 0.9655 | 0.9536 | 0.9566 | 0.9646 | 0.9639 | 0.9352
10 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9895
11 | 0.9975 | 1.0000 | 0.9995 | 0.9846 | 0.9945 | 0.9974 | 1.0000 | 0.9962
12 | 0.9694 | 0.5583 | 0.8165 | 0.9141 | 0.9117 | 0.9334 | 0.9225 | 0.8608
13 | 0.9338 | 0.9144 | 0.9587 | 0.8968 | 0.8978 | 0.8711 | 0.9314 | 0.9149
14 | 0.9245 | 1.0000 | 0.9240 | 0.9221 | 0.9208 | 0.9323 | 0.9262 | 0.9357
15 | 0.9619 | 0.9611 | 0.9455 | 0.9382 | 0.9260 | 0.9362 | 0.9421 | 0.9444
Average | 0.9295 | 0.8110 | 0.9000 | 0.9121 | 0.9077 | 0.9154 | 0.9166 | 0.8989
rate | - | 14.61% ↑ | 3.30% ↑ | 1.90% ↑ | 2.41% ↑ | 1.54% ↑ | 1.41% ↑ | 3.42% ↑
Table 5. The elapsed time of all seven algorithms (in seconds).
ID | NRS-ϵ | MARA | RARR | BFRS | AG | SEF | GAAR
1 | 429.7783 | 0.8807 | 6.6227 | 57.0501 | 7.2915 | 6.5293 | 26.3625
2 | 2.4335 | 0.6513 | 0.8982 | 0.9413 | 0.1933 | 0.2173 | 0.3533
3 | 21.0802 | 0.3232 | 0.7559 | 1.9937 | 0.2217 | 0.1986 | 0.6452
4 | 7.0519 | 1.3462 | 4.2944 | 7.0359 | 1.4769 | 1.1096 | 1.9095
5 | 14.0703 | 0.2167 | 5.3504 | 11.7332 | 1.9639 | 1.8331 | 3.4963
6 | 122.1212 | 6.9838 | 421.1056 | 154.8219 | 31.4599 | 33.3661 | 54.0532
7 | 233.6055 | 81.7964 | 21.1919 | 80.4803 | 20.2694 | 18.3893 | 5.7563
8 | 36.3329 | 0.5928 | 28.5184 | 66.8708 | 11.0547 | 9.1725 | 15.1738
9 | 70.3101 | 5.7716 | 532.7057 | 286.4753 | 49.7015 | 38.0667 | 3.976
10 | 7.9717 | 0.6485 | 1.8745 | 9.1748 | 1.5362 | 1.1173 | 2.0991
11 | 2.3576 | 0.5797 | 0.6115 | 1.0526 | 0.2311 | 0.2115 | 0.3641
12 | 18.2902 | 0.5314 | 6.0787 | 43.3079 | 8.0851 | 5.6702 | 11.0026
13 | 167.9126 | 15.3875 | 1605.0476 | 1466.8222 | 54.7792 | 157.5132 | 327.6303
14 | 8.5942 | 0.6286 | 0.7663 | 3.7765 | 0.5923 | 0.5887 | 0.9772
15 | 345.9941 | 10.3419 | 46.5721 | 1403.8344 | 207.5309 | 167.8453 | 310.1552
Average | 99.1936 | 8.4452 | 305.4930 | 239.6914 | 39.7592 | 29.4552 | 57.5970
rate | - | 1001% ↑ | 97.23% ↓ | 27.45% ↑ | 503.1% ↑ | 34.98% ↑ | 48.86% ↓
Table 6. The stabilities of all seven algorithms.
ID | NRS-ϵ | MARA | RARR | BFRS | AG | SEF | GAAR
1 | 0.6587 | 0.1275 | 0.4888 | 0.1535 | 0.1232 | 0.2059 | 0.6033
2 | 0.5761 | 0.6308 | 0.9277 | 0.2903 | 0.2941 | 0.3647 | 0.5004
3 | 0.8504 | 0.2051 | 0.9506 | 0.6211 | 0.4254 | 0.6466 | 0.9209
4 | 0.9271 | 0.4013 | 0.9224 | 0.7869 | 0.8033 | 0.9356 | 0.898
5 | 0.8246 | 1.0000 | 0.9045 | 0.7659 | 0.7281 | 0.8687 | 0.8007
6 | 0.9006 | 0.1498 | 0.6007 | 0.5958 | 0.5818 | 0.7154 | 0.6374
7 | 0.5577 | 0.4933 | 0.7277 | 0.2996 | 0.1764 | 0.3813 | 0.5346
8 | 0.9153 | 0.3103 | 1.0000 | 0.8698 | 0.6712 | 0.8967 | 0.7945
9 | 0.8959 | 1.0000 | 1.0000 | 0.7955 | 0.7183 | 0.8788 | 0.8257
10 | 0.8246 | 0.4051 | 0.8588 | 0.8033 | 0.7261 | 0.8280 | 0.815
11 | 0.6606 | 0.0546 | 0.3773 | 0.5284 | 0.3484 | 0.3265 | 0.5815
12 | 0.9146 | 0.2001 | 0.9917 | 0.8328 | 0.8259 | 0.9281 | 0.8708
13 | 0.9054 | 0.3037 | 1.0000 | 0.7155 | 0.5962 | 0.7672 | 0.7502
14 | 0.7816 | 0.1602 | 0.8678 | 0.6017 | 0.6038 | 0.7438 | 0.7165
15 | 0.8608 | 0.6001 | 0.4011 | 0.7882 | 0.6549 | 0.7723 | 0.7442
Average | 0.8034 | 0.4014 | 0.8012 | 0.6299 | 0.5518 | 0.6840 | 0.7329
rate | - | 100.2% ↑ | 49.89% ↓ | 27.19% ↑ | 14.15% ↑ | 19.323% ↓ | 6.677% ↓
