Article

A Robust and High-Dimensional Clustering Algorithm Based on Feature Weight and Entropy

School of Computer Science and Technology, Anhui University of Technology, Ma’anshan 243032, China
Entropy 2023, 25(3), 510; https://doi.org/10.3390/e25030510
Submission received: 6 February 2023 / Revised: 12 March 2023 / Accepted: 14 March 2023 / Published: 16 March 2023
(This article belongs to the Special Issue Statistical Methods for Modeling High-Dimensional and Complex Data)

Abstract

Since the Fuzzy C-Means algorithm cannot account for the different influence of individual features and relies on exponential constraints that perform poorly on high-dimensional and complex data, a fuzzy clustering algorithm based on a non-Euclidean distance that combines feature weights and entropy weights is proposed. The proposed algorithm builds on the Fuzzy C-Means soft clustering algorithm to deal with high-dimensional and complex data. The objective function of the new algorithm is modified with the help of two different entropy terms and a non-Euclidean way of computing the distance. The distance calculation formula enhances the efficiency of extracting the contribution of different features. The first entropy term helps to minimize the clusters’ dispersion and maximize the negative entropy to control the clustering process, which also promotes the association between the samples. The second entropy term helps to control the weights of the features, since different features carry different weights in the clustering process. Experiments on real-world datasets indicate that the proposed algorithm gives better clustering results than other algorithms. The experiments demonstrate the proposed algorithm’s robustness by analyzing the sensitivity of its parameters and by comparing distance formulas. In summary, the improved algorithm improves classification performance on noisy and high-dimensional datasets, increases computational efficiency, performs well on real-world high-dimensional datasets, and encourages the development of robust, noise-resistant, high-dimensional fuzzy clustering algorithms.

1. Introduction

In the field of machine learning and data mining, research on clustering has always attracted extensive attention [1,2,3,4]. Clustering methods are primarily categorized as partition-based, density-based, and hierarchical clustering methods [5,6,7]. Partition-based clustering methods classify different samples on the basis of features. In general, density-based clustering methods classify samples on the basis of the number of samples at each location. In hierarchical clustering methods, the samples are considered related (contained and included) and classified by the hierarchy of different samples. Hierarchical clustering is suitable for small datasets but not large ones. Partition-based clustering is one of the most widely used clustering methods for large datasets. Density-based clustering divides data points into high-density and low-density regions and is also suitable for large datasets. In recent years, three-way soft clustering has emerged as a new direction in clustering research; it treats samples in the positive region as belonging to the cluster, samples in the boundary region as partially belonging to the cluster, and samples in the negative region as not belonging to the cluster [8,9,10]. Clustering divides the objects in a set into different clusters based on a certain criterion (typically distance) so as to increase the intra-cluster similarity and reduce the inter-cluster similarity. Clustering algorithms are further divided into soft and hard clustering according to how samples are assigned. Hard clustering specifies that a sample can be assigned to only one cluster, while soft clustering allows a sample to be assigned to different clusters. Among these methods, K-means and FCM are the most widely used clustering algorithms [11,12]. These two algorithms have driven research on clustering, and much work has been performed to improve them. However, noise and the initial cluster centers often influence the clustering results. Moreover, clustering quality is often significantly reduced when the clustering algorithm faces high-dimensional and complex data. A good clustering algorithm is supposed to be high-performance, robust, and scalable.
To resolve these difficulties, several improved clustering algorithms have been proposed [13,14,15,16]. The K-means method assigns each sample to exactly one cluster. However, in practical applications, clusters often overlap and have fuzzy boundaries. To handle such uncertain data objects, soft clustering algorithms were introduced. Bezdek [17] introduced fuzzy set theory into K-means and proposed the FCM algorithm, which uses a membership function to express the membership relationship between objects and clusters. Furthermore, it has demonstrated efficient performance in different applications [18,19]. However, the degree of membership does not always correspond to the cluster to which a sample actually belongs. The FCM algorithm is partition-based. The advantage of partition-based clustering methods is fast convergence; the disadvantage is that the number of clusters must be estimated reasonably, and the choice of initial centers and the presence of noise can have a significant impact on the clustering results. In these two traditional methods, all features are given the same weight, and the results are easily influenced by noise [20,21]. These methods are also susceptible to random initial cluster centers, and poor initialization is likely to result in a local optimal solution [22,23]. When facing high-dimensional and complex data, the data are usually very sparse in space, the sample size appears very small compared with the dimensionality of the space, and the features of the clusters are not obvious. Traditional clustering algorithms cannot guarantee robustness in this setting; hence, it is extremely important to make full use of the properties of the features.
To address these issues, it is critical to remember that different weights should be assigned to different features. In previous studies, the weights corresponding to the features were assigned in one of two ways. The first method assigns weights to the features on a global scale: throughout the process, a specific feature is given only one weight. The second method assigns local weights, which means that a feature has different weights in different clusters. Numerous studies have demonstrated that the second method outperforms global weighting [24]. The SCAD algorithm therefore considers different feature weights in different clusters and simultaneously clusters the data while discriminating between features [25]. However, conventional FCM algorithms, including improved variants, constrain the related variables through exponential regularization, which may lead to degenerate results and low precision when dealing with sparse and noisy data (the denominator becomes 0). Entropy has been proposed to better constrain the features during the clustering process. Entropy-Weighting K-Means (EWKM) is a particularly prominent method among local-weight and entropy-weight algorithms [26]. EWKM pioneered the entropy-weight form and applied it to the feature weights to better constrain the objective function; the entropy weight was introduced in response to this defect to encourage more features to contribute to identifying clusters. This approach is more efficient in dealing with various data types, particularly high-dimensional sparse data. Entropy weights can be used to assign relevant feature weights to a cluster, whereas fuzzy partitions help identify the best partitions. This method, however, ignores the limitations of the Euclidean distance and may result in inconsistent classification [24,27]. Moreover, the weights of this algorithm depend strongly on the initial clustering centers and are sensitive to changes related to the centers; if the initial cluster centers vary, the algorithm is not robust and may not converge. Improving the clustering algorithm should therefore focus on these aspects. To address the effect of noise and of features having different weights in different clusters, several improved algorithms with classification criteria based on non-Euclidean distances have been proposed [28,29,30]. Some algorithms using entropy weights have also been proposed in unsupervised clustering studies. Both the Entropy K-Means [31] and U-K-Means [32] algorithms use feature weights and entropy weights in unsupervised learning. However, the distance formula used in their objective functions is the Euclidean distance, which does not perform well in the face of high-dimensional and noisy data; noise and irrelevant features can eventually influence the overall clustering results. In recent years, there has been extensive interest in research on feature weights and entropy weights for fuzzy clustering. However, these algorithms frequently consider only the Euclidean distance and do not constrain the features or use fuzzy partitioning [33,34,35].
This paper proposes an advanced FCM algorithm to handle high-dimensional, sparse, and noisy data and to solve the flaws mentioned above. The novelty of the proposed algorithm is that entropy terms control both the features and the partitions, and the introduction of a non-Euclidean distance reduces noise interference. Moreover, entropy weights are applied to the membership and weight variables to enhance the algorithm’s efficiency in processing different datasets. To improve clustering in the face of high-dimensional and complex data, the intra-cluster dispersion is minimized and the negative weight entropy is maximized to motivate features to help identify clusters. Furthermore, the proposed method updates the membership degrees of the samples and the feature weights in the different clusters during each iteration so that the objective function converges rapidly. The algorithm handles high-dimensional data and noise efficiently with feature weights and a non-Euclidean distance formula. Furthermore, entropy weights are used to constrain the variables, which can be more advantageous than exponential constraints in some cases. Extensive experimental results on real datasets suggest that the proposed algorithm performs better in clustering, and the algorithm also exhibits high performance on high-dimensional and complex datasets. Furthermore, it exhibits robustness and stability in the experiments.
The remainder of the paper comprises the following sections: Section 2 introduces several of the most classic clustering methods. In Section 3, the proposed algorithm, its convergence proof, and its complexity analysis are provided. The performance of the proposed algorithm and other clustering algorithms is compared and evaluated using different clustering metrics in Section 4. Lastly, Section 5 presents a summary of the paper.

2. Related Work

2.1. The K-Means Algorithms

For a given dataset $X_{N \times M}$, $N$ denotes the number of samples, $M$ denotes the number of features, and $K$ represents the number of clusters; $x_{ij}$ denotes the $j$-th feature of the $i$-th sample, $c_{lj}$ denotes the $j$-th feature of the center of the $l$-th cluster, and $u_{il}$ indicates whether the $i$-th sample belongs to the $l$-th cluster. $(x_{ij} - c_{lj})^2$ is the squared Euclidean distance between the $i$-th sample and the $l$-th cluster center along the $j$-th feature. The objective function can be defined as
$$P(U, C) = \sum_{l=1}^{K} \sum_{i=1}^{N} \sum_{j=1}^{M} u_{il} \, (x_{ij} - c_{lj})^2, \qquad (1)$$
subject to
$$\sum_{l=1}^{K} u_{il} = 1. \qquad (2)$$
The K-means objective can be minimized by continuously iterating the following equations:
$$u_{il} = \begin{cases} 1, & \text{if } \sum_{j=1}^{M} (x_{ij} - c_{lj})^2 \le \sum_{j=1}^{M} (x_{ij} - c_{tj})^2, \; 1 \le t \le K, \\ 0, & \text{otherwise}, \end{cases} \qquad (3)$$
$$c_{lj} = \frac{\sum_{i=1}^{N} u_{il} \, x_{ij}}{\sum_{i=1}^{N} u_{il}}, \qquad (4)$$
where $c_{lj}$ denotes the value of the $j$-th feature of the $l$-th cluster center, and $t$ ranges over all clusters during the clustering process.
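As a concrete illustration of the update rules in Equations (3) and (4), here is a minimal NumPy sketch of the K-means iteration; the function name, the random initialization, and the empty-cluster guard are illustrative assumptions rather than part of the original formulation.

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Minimal K-means sketch: hard assignments (Eq. (3)) and centroid updates (Eq. (4))."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    C = X[rng.choice(N, size=K, replace=False)].astype(float)  # random initial centers
    for _ in range(max_iter):
        # squared Euclidean distance of every sample to every center, shape (N, K)
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                              # u_il = 1 for the closest center
        # Eq. (4); an empty cluster keeps its previous center
        C_new = np.array([X[labels == l].mean(axis=0) if np.any(labels == l) else C[l]
                          for l in range(K)])
        if np.allclose(C_new, C):
            break
        C = C_new
    return labels, C

# toy usage on two random blobs
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centers = kmeans(X, K=2)
```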
The K-means algorithm has contributed significantly to the study of clustering. However, when noise is present, the noise dimensions are also included when computing the distance between samples, leading to decreased clustering accuracy. Furthermore, as the dimensionality of the dataset increases, so do the number of outliers and the dispersion between samples, which affects the cluster centers and changes the clustering results. K-means clustering is therefore often inefficient when dealing with high-dimensional, sparse, and noisy data. Moreover, it is susceptible to the initial cluster centers.

2.2. The Weighting K-Means Algorithms

WK-Means generalizes K-means and skillfully introduces a mechanism for handling noisy data [30]. It considers that different features should have different weights so that the effect of noise dimensions can be suppressed as much as possible when evaluating the distance between samples. Noise that is far from the cluster centroid is therefore given a smaller weight and has less influence on the centroid, which improves the clustering accuracy. The objective function of WK-Means is as follows:
$$P(U, C, W) = \sum_{l=1}^{K} \sum_{i=1}^{N} \sum_{j=1}^{M} u_{il} \, w_j^{\beta} \, (x_{ij} - c_{lj})^2, \qquad (5)$$
subject to
$$\begin{cases} \sum_{l=1}^{K} u_{il} = 1, \\ \sum_{j=1}^{M} w_j = 1. \end{cases} \qquad (6)$$
The WK-Means objective can be minimized by continuously iterating the following equations:
$$u_{il} = \begin{cases} 1, & \text{if } \sum_{j=1}^{M} w_j^{\beta} (x_{ij} - c_{lj})^2 \le \sum_{j=1}^{M} w_j^{\beta} (x_{ij} - c_{tj})^2, \; 1 \le t \le K, \\ 0, & \text{otherwise}, \end{cases} \qquad (7)$$
$$c_{lj} = \frac{\sum_{i=1}^{N} u_{il} \, x_{ij}}{\sum_{i=1}^{N} u_{il}}, \qquad (8)$$
$$w_j = \frac{1}{\sum_{t=1}^{M} \left[ \frac{D_j}{D_t} \right]^{\frac{1}{\beta - 1}}}, \quad \beta > 1 \ \text{or} \ \beta \le 0, \qquad (9)$$
where
$$D_j = \sum_{l=1}^{K} \sum_{i=1}^{N} u_{il} \, (x_{ij} - c_{lj})^2. \qquad (10)$$
In Equation (5), $u_{il}$ indicates whether the $i$-th sample belongs to the $l$-th cluster, $w_j$ denotes the weight of the $j$-th feature, and $\beta$ represents a fuzzy exponent, usually taken as 2.
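To show how the feature weights react to per-feature dispersion, the following NumPy sketch evaluates Equations (9) and (10); the labels and centers are assumed to come from a K-means-style assignment step, and the small eps guard against zero dispersion is an added assumption.

```python
import numpy as np

def wkmeans_feature_weights(X, labels, C, beta=2.0, eps=1e-12):
    """Sketch of the WK-Means feature-weight update, Eqs. (9)-(10)."""
    K, M = C.shape
    D = np.zeros(M)
    for l in range(K):
        members = X[labels == l]
        if members.size:
            D += ((members - C[l]) ** 2).sum(axis=0)   # D_j, Eq. (10)
    D = D + eps                                        # avoid division by zero
    ratios = (D[:, None] / D[None, :]) ** (1.0 / (beta - 1.0))
    return 1.0 / ratios.sum(axis=1)                    # w_j, Eq. (9)
```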
Although the WK-Means algorithm was the first to introduce feature weights, its global weighting performs poorly on some data, and its hard assignments may lock the results in place. Moreover, the weight difference between features is not always apparent when the dataset is high-dimensional. The FCM algorithm, based on fuzzy ideas, significantly reduces the rigidity of cluster assignment and has several advantages on high-dimensional data.

2.3. Fuzzy C-Means Algorithm

As the most representative soft clustering method, FCM defines a new method for dividing the clusters. Each sample has a different degree of membership for each cluster, and its cluster assignment is determined by the degree of membership. The objective function can be defined as
$$P(U, C) = \sum_{i=1}^{N} \sum_{j=1}^{K} \sum_{l=1}^{M} u_{ij}^{m} \, (x_{il} - c_{jl})^2, \qquad (11)$$
subject to
$$\sum_{j=1}^{K} u_{ij} = 1. \qquad (12)$$
The FCM objective is minimized by continuously iterating the following equations:
$$c_{jl} = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_{il}}{\sum_{i=1}^{N} u_{ij}^{m}}, \qquad (13)$$
$$u_{ij} = \left[ \sum_{t=1}^{K} \left( \frac{\sum_{l=1}^{M} (x_{il} - c_{jl})^2}{\sum_{l=1}^{M} (x_{il} - c_{tl})^2} \right)^{\frac{1}{m-1}} \right]^{-1}, \qquad (14)$$
where $u_{ij}$ represents the degree of membership of the $i$-th sample to the $j$-th cluster, and $m$ is the fuzzification exponent.
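As a brief illustration of Equations (13) and (14), the NumPy sketch below performs one FCM iteration; the eps term that avoids division by zero when a sample coincides with a center is an added assumption.

```python
import numpy as np

def fcm_step(X, C, m=2.0, eps=1e-12):
    """One FCM iteration sketch: memberships via Eq. (14), then centers via Eq. (13)."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2) + eps   # squared distances, shape (N, K)
    # u_ij = 1 / sum_t (d2_ij / d2_it)^(1/(m-1))
    U = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1.0))).sum(axis=2)
    Um = U ** m
    C_new = (Um.T @ X) / Um.sum(axis=0)[:, None]                     # membership-weighted means
    return U, C_new
```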
Although the FCM algorithm is the most representative soft clustering algorithm, it still has numerous flaws and shortcomings. For example, because membership is driven only by the distances to the cluster centers, a cluster that contains a very large number of samples dominates the updates, which affects the clustering results. Moreover, a fuzzifier in exponential form often does not constrain the features well, so the algorithm tends to perform much worse when combined with the Euclidean distance formula.

3. The Proposed Algorithm

In this section, a novel entropy-weighting algorithm based on Fuzzy C-Means is proposed. Motivated by the shortcomings of the traditional Fuzzy C-Means clustering algorithm, the new algorithm includes local feature weighting and entropy weights acting on both the features and the degrees of membership, reducing the sensitivity of clustering to random initial class centers and improving accuracy. Furthermore, because the Euclidean distance is susceptible to noise and outliers [29], a non-Euclidean distance is introduced. The new distance formula makes the algorithm more robust and makes full use of the dataset’s features to obtain the clustering result. The objective function can be defined as
$$F(U, C, W) = \sum_{i=1}^{N} \sum_{j=1}^{K} \sum_{l=1}^{M} u_{ij} \, w_{jl} \left( 1 - \exp\left( -\delta_l (x_{il} - c_{jl})^2 \right) \right) + \lambda \sum_{i=1}^{N} \sum_{j=1}^{K} u_{ij} \log u_{ij} + \gamma \sum_{j=1}^{K} \sum_{l=1}^{M} w_{jl} \log w_{jl}. \qquad (15)$$
In Equation (15), $U = [u_{ij}]$ is an $N \times K$ matrix, in which $u_{ij}$ denotes the degree of membership of the $i$-th sample to the center of the $j$-th cluster; $C = [c_{jl}]$ is a $K \times M$ matrix, where $c_{jl}$ represents the $l$-th feature of the center of the $j$-th cluster and is determined by $u_{ij}$; and $W = [w_{jl}]$ is a $K \times M$ matrix, where $w_{jl}$ denotes the weight of the $l$-th feature in the $j$-th cluster. $U$ is the membership matrix of the samples with respect to the clusters, containing $N$ samples and $K$ clusters; $C$ is the feature center matrix of the clusters, containing $K$ clusters and $M$ features; and $W$ is the feature weight matrix of the clusters, containing $K$ clusters and $M$ features. The term $1 - \exp(-\delta_l (x_{il} - c_{jl})^2)$ denotes a non-Euclidean distance between the $i$-th sample and the $j$-th cluster along the $l$-th feature and is defined as follows:
$$\delta_l = \frac{1}{\operatorname{var}_l}, \qquad \operatorname{var}_l = \frac{\sum_{i=1}^{N} (x_{il} - \bar{x}_l)^2}{N}, \qquad \bar{x}_l = \frac{\sum_{i=1}^{N} x_{il}}{N}, \qquad (16)$$
where $\delta_l$ denotes the inverse of the variance of the $l$-th feature of the data. The objective function is minimized
subject to
$$\begin{cases} \sum_{j=1}^{K} u_{ij} = 1, \quad u_{ij} \in (0, 1], \quad 1 \le i \le N, \\ \sum_{l=1}^{M} w_{jl} = 1, \quad w_{jl} \in (0, 1], \quad 1 \le j \le K. \end{cases} \qquad (17)$$
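Below is a minimal NumPy sketch of the variance-scaled distance of Equation (16); the function name and the guard for constant (zero-variance) features are illustrative assumptions.

```python
import numpy as np

def non_euclidean_distance(X, C):
    """Per-feature distance of Eq. (16): d_ijl = 1 - exp(-delta_l * (x_il - c_jl)^2),
    with delta_l the inverse variance of feature l. Returns an (N, K, M) array of
    distances from every sample to every cluster center along every feature."""
    var = X.var(axis=0)                         # population variance var_l (divides by N)
    delta = 1.0 / np.where(var > 0, var, 1.0)   # guard: constant features get delta_l = 1
    diff2 = (X[:, None, :] - C[None, :, :]) ** 2
    return 1.0 - np.exp(-delta * diff2)
```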
Minimizing $F$ in Equation (15) under these constraints forms a class of constrained nonlinear optimization problems. The usual approach toward optimizing $F$ is to introduce partial optimization for $U$, $C$, and $W$. First, $U$ and $C$ are fixed, and the reduced $F$ is minimized with respect to $W$. Next, $U$ and $W$ are fixed, and the reduced $F$ is minimized with respect to $C$. Finally, $W$ and $C$ are fixed, and the reduced $F$ is minimized with respect to $U$. Iterating these partial minimizations yields the solution.
The Lagrange multiplier technique is used to solve the following unconstrained minimization problem:
$$F(U, C, W) = \sum_{i=1}^{N} \sum_{j=1}^{K} \sum_{l=1}^{M} u_{ij} \, w_{jl} \left( 1 - \exp\left( -\delta_l (x_{il} - c_{jl})^2 \right) \right) + \lambda \sum_{i=1}^{N} \sum_{j=1}^{K} u_{ij} \log u_{ij} + \gamma \sum_{j=1}^{K} \sum_{l=1}^{M} w_{jl} \log w_{jl} - \alpha \left( \sum_{j=1}^{K} u_{ij} - 1 \right) - \beta \left( \sum_{l=1}^{M} w_{jl} - 1 \right), \qquad (18)$$
where α and β are Lagrange multipliers. By setting the gradient of F with respect to α , β , u i j , c j l , and w j l to zero,
$$\frac{\partial F}{\partial \alpha} = -\left( \sum_{j=1}^{K} u_{ij} - 1 \right) = 0, \qquad (19)$$
$$\frac{\partial F}{\partial \beta} = -\left( \sum_{l=1}^{M} w_{jl} - 1 \right) = 0, \qquad (20)$$
$$\frac{\partial F}{\partial u_{ij}} = \sum_{l=1}^{M} w_{jl} \left( 1 - \exp\left( -\delta_l (x_{il} - c_{jl})^2 \right) \right) + \lambda (1 + \log u_{ij}) - \alpha = 0, \qquad (21)$$
$$\frac{\partial F}{\partial w_{jl}} = \sum_{i=1}^{N} u_{ij} \left( 1 - \exp\left( -\delta_l (x_{il} - c_{jl})^2 \right) \right) + \gamma (1 + \log w_{jl}) - \beta = 0. \qquad (22)$$
From Equations (21) and (22),
$$u_{ij} = \exp\left( \frac{\alpha}{\lambda} \right) \exp\left( -\frac{D_{ij}}{\lambda} \right) \exp(-1), \qquad (23)$$
$$w_{jl} = \exp\left( \frac{\beta}{\gamma} \right) \exp\left( -\frac{D_{jl}}{\gamma} \right) \exp(-1), \qquad (24)$$
where
$$D_{ij} = \sum_{l=1}^{M} w_{jl} \left( 1 - \exp\left( -\delta_l (x_{il} - c_{jl})^2 \right) \right), \qquad (25)$$
$$D_{jl} = \sum_{i=1}^{N} u_{ij} \left( 1 - \exp\left( -\delta_l (x_{il} - c_{jl})^2 \right) \right). \qquad (26)$$
From Equations (19) and (23),
$$\sum_{t=1}^{K} u_{it} = 1 = \sum_{t=1}^{K} \exp\left( \frac{\alpha}{\lambda} \right) \exp\left( -\frac{D_{it}}{\lambda} \right) \exp(-1), \qquad (27)$$
where it follows that
$$\exp\left( \frac{\alpha}{\lambda} \right) = \frac{1}{\sum_{t=1}^{K} \exp\left( -\frac{D_{it}}{\lambda} \right) \exp(-1)}, \qquad (28)$$
which can be substituted into Equation (23),
$$u_{ij} = \frac{\exp\left( -\frac{D_{ij}}{\lambda} \right)}{\sum_{t=1}^{K} \exp\left( -\frac{D_{it}}{\lambda} \right)}. \qquad (29)$$
From Equations (20) and (24),
$$\sum_{t=1}^{M} w_{jt} = 1 = \sum_{t=1}^{M} \exp\left( \frac{\beta}{\gamma} \right) \exp\left( -\frac{D_{jt}}{\gamma} \right) \exp(-1), \qquad (30)$$
where it follows that
$$\exp\left( \frac{\beta}{\gamma} \right) = \frac{1}{\sum_{t=1}^{M} \exp\left( -\frac{D_{jt}}{\gamma} \right) \exp(-1)}, \qquad (31)$$
which can be substituted into Equation (24),
$$w_{jl} = \frac{\exp\left( -\frac{D_{jl}}{\gamma} \right)}{\sum_{t=1}^{M} \exp\left( -\frac{D_{jt}}{\gamma} \right)}. \qquad (32)$$
For the clustering centers,
$$\frac{\partial F}{\partial c_{jl}} = \frac{\partial}{\partial c_{jl}} \sum_{i=1}^{N} u_{ij} \, w_{jl} \left( 1 - \exp\left( -\delta_l (x_{il} - c_{jl})^2 \right) \right) = 0, \qquad (33)$$
where it follows that
$$\sum_{i=1}^{N} u_{ij} \, w_{jl} \, 2 \delta_l (x_{il} - c_{jl}) \exp\left( -\delta_l (x_{il} - c_{jl})^2 \right) = 0, \qquad (34)$$
which gives
$$c_{jl} = \frac{\sum_{i=1}^{N} u_{ij} \, w_{jl} \, \delta_l \exp\left( -\delta_l (x_{il} - c_{jl})^2 \right) x_{il}}{\sum_{i=1}^{N} u_{ij} \, w_{jl} \, \delta_l \exp\left( -\delta_l (x_{il} - c_{jl})^2 \right)}. \qquad (35)$$
It is evident that Equation (35) is independent of the parameters λ and γ . The interdependence of both terms promotes the detection of a better partition during the clustering process. The proposed algorithm minimizes Equation (15), using Equations (29), (32), and (35).

3.1. Parameter Selection

The values of $\lambda$ and $\gamma$ are essential for the proposed algorithm since they set the weight of the second and third terms of Equation (15) relative to the first term. The parameter $\lambda$ plays two roles in the clustering process. When $\lambda$ is large, the memberships $u_{ij}$ in Equation (29) are pushed toward similar values, so the second term has the more significant influence on minimizing Equation (15); the algorithm then tends to assign a sample to more than one cluster to make the second term more negative, the membership entropy is largest when the memberships $u_{ij}$ of a sample to all clusters are equal, and, for a very large entropy weight, all the cluster centers are driven toward the same position once the samples are fixed. Conversely, when $\lambda$ is small, the memberships in Equation (29) become much crisper, and the first term plays the key role in minimizing Equation (15). The local feature weights are controlled by $\gamma$. Since $\gamma$ is positive, the value of $w_{jl}$ decreases as $D_{jl}$ increases; a smaller value of this term results in a larger $w_{jl}$. If $\gamma$ is very large, the third term dominates the optimization, and all feature weights approach $1/M$ in every cluster.
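Since Equation (29) is a softmax of $-D_{ij}/\lambda$ over the clusters, the effect of $\lambda$ on membership sharpness can be seen with a few lines of NumPy; the distance values below are purely illustrative.

```python
import numpy as np

# Eq. (29) as a softmax over clusters: a small lambda gives a nearly hard
# assignment, a large lambda gives nearly uniform memberships.
D = np.array([0.2, 0.5, 0.9])   # illustrative distances of one sample to K = 3 clusters
for lam in (0.05, 0.3, 5.0):
    u = np.exp(-D / lam)
    u /= u.sum()
    print(f"lambda = {lam}: u = {np.round(u, 3)}")
```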
Assume that $F_t$ denotes the value of Equation (15) after the $t$-th iteration and $F_{t+1}$ denotes its value after the next iteration. The proposed algorithm is summarized below (Algorithm 1).
Algorithm 1. Proposed Clustering Algorithm.
Input: Dataset $X_{N \times M}$, the number of clusters $K$, and the values of the parameters $\lambda$ and $\gamma$. Randomly set $K$ cluster centers, generate a set of initial weights, set $t = 0$, set the maximum number of iterations $MAX$, and set the convergence threshold $value$ on the change $F_{t+1} - F_t$.
Output: $U = [u_{ij}]_{N \times K}$.
Repeat
1: Compute the non-Euclidean distance matrix $D_{N \times M}$.
2: Update the partition matrix $U_{N \times K}$ using Equation (29).
3: Update the weight matrix $W_{K \times M}$ using Equation (32).
4: Update the cluster center matrix $C_{K \times M}$ using Equation (35).
5: Until the change in the objective function is less than or equal to the threshold $value$ or the maximum number of iterations is reached.
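To make the loop concrete, the following self-contained NumPy sketch runs Algorithm 1 with the update rules of Equations (29), (32), and (35); the function name, the uniform weight initialization, the numerical guards, and the evaluation of Equation (35) as a single fixed-point step at the current centers are illustrative assumptions rather than the paper's reference implementation.

```python
import numpy as np

def proposed_clustering(X, K, lam=0.3, gamma=1.4, max_iter=100, tol=1e-5, seed=0):
    """Sketch of Algorithm 1; lam and gamma default to the values used in the experiments."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    C = X[rng.choice(N, size=K, replace=False)].astype(float)  # random initial centers
    W = np.full((K, M), 1.0 / M)                               # uniform initial feature weights
    var = X.var(axis=0)
    delta = 1.0 / np.where(var > 0, var, 1.0)                  # delta_l of Eq. (16), guarded
    F_prev = np.inf
    for _ in range(max_iter):
        diff2 = (X[:, None, :] - C[None, :, :]) ** 2
        dist = 1.0 - np.exp(-delta * diff2)                    # (N, K, M) non-Euclidean distances

        # Eq. (29): memberships from D_ij = sum_l w_jl * dist_ijl (shifted for numerical stability)
        Dij = (W[None, :, :] * dist).sum(axis=2)
        U = np.exp(-(Dij - Dij.min(axis=1, keepdims=True)) / lam)
        U /= U.sum(axis=1, keepdims=True)

        # Eq. (32): feature weights from D_jl = sum_i u_ij * dist_ijl (shifted for numerical stability)
        Djl = (U[:, :, None] * dist).sum(axis=0)
        W = np.exp(-(Djl - Djl.min(axis=1, keepdims=True)) / gamma)
        W /= W.sum(axis=1, keepdims=True)

        # Eq. (35): one fixed-point step for the centers, evaluated at the current centers
        kernel = U[:, :, None] * W[None, :, :] * delta * np.exp(-delta * diff2)
        C = (kernel * X[:, None, :]).sum(axis=0) / (kernel.sum(axis=0) + 1e-12)

        # value of Eq. (15) for the stopping test (small constants guard log(0))
        F = (U[:, :, None] * W[None, :, :] * dist).sum() \
            + lam * (U * np.log(U + 1e-12)).sum() \
            + gamma * (W * np.log(W + 1e-12)).sum()
        if abs(F_prev - F) <= tol:
            break
        F_prev = F
    return U, C, W

# usage: hard labels are taken from the largest membership
# U, C, W = proposed_clustering(X, K=3)
# labels = U.argmax(axis=1)
```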

3.2. Convergence Analysis

It is important to show that the proposed algorithm converges in a finite number of iterations. The key observation is that each distinct partition $U$ can occur at most once during the algorithm. Suppose, on the contrary, that $U_i = U_j$ for $i \neq j$. Given any $U_t$, the corresponding minimizer $W_t$ can be computed; for $U_i$ and $U_j$ the minimizers are $W_i$ and $W_j$, respectively, and $W_i = W_j$ because $U_i = U_j$. Using $U_i$, $W_i$, $U_j$, and $W_j$, the minimizers $C_i$ and $C_j$ can then be calculated, and clearly $C_i = C_j$. Therefore, the following equation is obtained:
$$F(U_i, C_i, W_i) = F(U_j, C_j, W_j). \qquad (36)$$
However, the function $F(\cdot, \cdot, \cdot)$ decreases strictly monotonically, which contradicts Equation (36); hence, each distinct partition $U$ occurs only once during the algorithm. The expression for $u_{ij}$ in Equation (29) is obtained by taking the derivative of Equation (18) and setting it equal to zero, so it could correspond to either a minimum or a maximum. If the second partial derivative of Equation (18) with respect to $u_{ij}$ is positive, then $u_{ij}$ defined by Equation (29) is a local minimum of Equation (18). The second partial derivative of Equation (18) with respect to $u_{ij}$ is
$$\frac{\lambda}{u_{ij}}. \qquad (37)$$
Since $u_{ij} > 0$ and $\lambda > 0$, Equation (37) is positive. Therefore, $u_{ij}$ defined by Equation (29) is a local minimum of Equation (18). Because the number of distinct partitions is finite and no partition is repeated, the proposed algorithm converges in a finite number of iterations.

3.3. Computational Complexity

As shown in Table 1, the computational complexity of the proposed algorithm is somewhat higher than that of the basic clustering algorithms, but the added terms improve the clustering performance. The complexity is determined by four update steps per iteration: updating the distance matrix $D$, the cluster center matrix $C$, the membership matrix $U$, and the weight matrix $W$. The computational cost of each step is $NKM$, and the steps execute independently; hence, the total computational complexity per iteration is $4NKM$, where $N$ denotes the number of samples, $K$ the number of clusters, and $M$ the number of features. Each iteration updates $D$, $C$, $U$, and $W$ using Equations (29), (32), and (35) and finally classifies the samples according to the matrix $U$.

4. Experiments

In the experimental section, to test the performance of the proposed algorithm, it is evaluated against other clustering algorithms on real-world datasets. These algorithms are the standard K-Means [11], the standard FCM [12], WK-Means [30], RLWHCM [28], SCAD [25], EWK-Means [31], and UK-Means [32]. These algorithms have their own standard parameters; making the shared parameter values uniform removes their influence when observing the clustering results, so the common parameters were equalized to avoid inconsistency in algorithm performance. In the experiments, the maximum number of restarts was set to 100. With $\lambda = 0.3$ and $\gamma = 1.4$, the clustering centroids were randomly selected from the original datasets. In practical applications, choosing an appropriate threshold $value$ is a crucial issue: if the threshold is too small, the algorithm may converge very slowly or not at all; if it is too large, the algorithm may stop prematurely, resulting in less accurate clustering results. Experiments and adjustments are therefore needed to determine an appropriate threshold. Considering that the best threshold $value$ is not the same for different datasets, it was reduced from 0.1 to 0.00001 by a factor of 10 at each step to find the best clustering result. After testing, $value = 10^{-5}$ was adopted to obtain accurate clustering results on the different datasets. Each algorithm was run 100 times, and the best result was recorded.
Seven real-world datasets from UCI [36] were used to assess the performance of the proposed approach and compare its results to other approaches. Furthermore, the text, face image, and biological datasets [37] highlight the proposed algorithm’s performance in high-dimensional and noisy datasets. These datasets are mentioned in Table 2. The best clustered values for each dataset in Table 3 and Table 4 are bolded.

4.1. Evaluation Indicators

The clustering accuracy is defined as
$$ACC = \frac{\sum_{l=1}^{K} D_l}{N}. \qquad (38)$$
In Equation (38), $D_l$ denotes the number of samples correctly classified into the $l$-th cluster, and $N$ represents the number of points in the dataset. A larger value of $ACC$ [38] indicates better clustering performance.
Since the sample labels of the real-world dataset are known, R I [38] was used to evaluate the similarity between the clustering partitions and real partitions.
$$RI = \frac{f_1 + f_3}{f_1 + f_2 + f_3 + f_4}. \qquad (39)$$
In Equation (39), $f_1$ denotes the number of sample pairs that are placed in the same cluster in both partitions, $f_3$ denotes the number of pairs placed in different clusters in both partitions, and $f_2$ and $f_4$ count the pairs on which the two partitions disagree (same cluster in one partition, different clusters in the other). Larger values suggest better classification results.
N M I is often used in clustering to compute the similarity between the clustering results and the real label of the dataset. The measurement method is
$$NMI(A, B) = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} P(i, j) \log \frac{P(i, j)}{P(i) P(j)}}{\sqrt{H(A) H(B)}}. \qquad (40)$$
In Equation (40), $A$ and $B$ are two partitions of the dataset comprising $I$ and $J$ clusters, respectively. $P(i)$ indicates the probability that a randomly selected sample is allocated to cluster $A_i$, $P(i, j)$ represents the probability that a sample belongs to both clusters $A_i$ and $B_j$, and $H(A)$ is the entropy associated with all the probabilities $P(i)$ in partition $A$ [39]. A larger value of $NMI$ indicates more consistent clustering results.
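The three metrics can be computed as sketched below; the Hungarian matching used to find the best cluster-to-label mapping for ACC and the helper name are assumptions, while RI and NMI come from scikit-learn's rand_score and normalized_mutual_info_score.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import rand_score, normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC of Eq. (38): fraction of samples correctly classified under the best
    one-to-one matching between predicted clusters and true labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    true_ids, pred_ids = np.unique(y_true), np.unique(y_pred)
    overlap = np.array([[np.sum((y_pred == p) & (y_true == t)) for t in true_ids]
                        for p in pred_ids])
    rows, cols = linear_sum_assignment(-overlap)   # maximize the matched samples
    return overlap[rows, cols].sum() / y_true.size

# y_true, y_pred = ...                                 # ground truth and U.argmax(axis=1)
# acc = clustering_accuracy(y_true, y_pred)            # Eq. (38)
# ri = rand_score(y_true, y_pred)                      # Eq. (39)
# nmi = normalized_mutual_info_score(y_true, y_pred)   # Eq. (40)
```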

4.2. Clustering Results on Real-World Datasets

As demonstrated in Table 3 and Table 4, the clustering performance of the proposed algorithm in terms of ACC, RI, and NMI was much better than that of the other clustering methods on different real-world datasets. The clustering results suggest that the proposed algorithm significantly improved the clustering performance and provided the best results on most of the real-world and text datasets. It should also be noted that the proposed algorithm’s performance was not exceptional on the Wine dataset. After careful examination of this dataset, it was discovered that it contains only three clusters and there is little variation in the values of the features; the value of a feature in different clusters may differ by as little as 0.1, which negatively impacts any clustering algorithm and leads to poor performance of the entropy-weight terms and the non-Euclidean distance. Most classical clustering algorithms had ACC and RI values of nearly 0.5 on the high-dimensional datasets with two clusters, indicating that these algorithms failed due to high dimensionality and noise. The reason is that there are often many outliers in high-dimensional data, and the data distribution can be very sparse. Traditional clustering algorithms have difficulty clustering such sparse data: as the dimensionality increases, the distance-based calculations make the cluster centers progressively harder to estimate and prone to shifts caused by outliers. At the same time, because different feature weights in different clusters are not considered, the features of sparse high-dimensional data cannot be exploited, making the clustering results much less accurate. Using the entropy constraints, the proposed algorithm improves on all three performance metrics. In contrast, the clustering algorithms with exponential constraints equalize the sample membership degrees, resulting in ACC values stuck near 0.5. The results also show that the proposed algorithm helps detect noise and the main discriminative features in large datasets. The clustering results of the proposed algorithm were also better on the face image and biological datasets, which shows that, with randomly initialized cluster centers, our algorithm was more accurate and more stable in handling high-dimensional and noisy datasets. From the clustering results, we can also see that the K-Means algorithm performed comparatively well on the face dataset because it does not introduce feature weights, whereas weighting can give unsatisfactory results for high-dimensional data when the number of clusters is large. Furthermore, this demonstrates that the algorithm can be used for classification in both supervised and unsupervised learning settings.
Numerous points justify the performance of the proposed algorithm. First, the algorithm performs well even with poor initial centers, while the other algorithms are severely sensitive to initialization. Second, the introduction of entropy weighting allows the different features to contribute properly to the clustering process. Third, the non-Euclidean distance allows noisy and sparse data to be handled in the calculation without distorting the clustering results.
The value range of the parameters was discussed in detail in the previous section. To further examine the sensitivity of $\lambda$ and $\gamma$, the algorithm’s sensitivity was analyzed on the Iris and Zoo datasets. First, $\gamma = 0.8$ was fixed, and $\lambda$ was increased from 0 to 2 in steps of 0.1; the sensitivity of $\lambda$ can be inferred from Figure 1. After that, $\lambda = 0.3$ was fixed, and $\gamma$ was increased from 0 to 2 in steps of 0.1; the sensitivity of $\gamma$ can be observed in Figure 2. It can be observed from the figures that ACC, RI, and NMI did not fluctuate much when the two parameters were varied, which highlights the excellent performance and robustness of our algorithm. The consistently high values of the three metrics also directly reflect the proposed algorithm’s efficiency and robustness. To demonstrate the advantage of the proposed non-Euclidean distance in computing sample point distances, the ACC of the proposed algorithm was analyzed with the non-Euclidean distance and with the Euclidean distance on the Iris and Zoo datasets. Figure 3 shows that the clustering results based on the non-Euclidean distance were more accurate than those based on the Euclidean distance. With the non-Euclidean distance, the average ACC values were 0.95 and 0.93, respectively, while with the Euclidean distance they were 0.79 and 0.80, respectively; the new distance formula thus increased the ACC of the algorithm by about 18%, which indicates its advantage in dealing with high-dimensional, sparse, and noisy data. Furthermore, the variance of the ACC with the non-Euclidean distance was 0.025, while with the Euclidean distance it was 0.074, indicating that the non-Euclidean distance makes the algorithm more robust. To better distinguish the algorithms in the experiments, Table 5 summarizes the conditions of use of the compared algorithms. The experimental results show that the proposed algorithm achieved high clustering accuracy on most real-world datasets from various domains. The sensitivity analysis of the two parameters proves that the algorithm is robust and performs well. Furthermore, comparing the Euclidean and non-Euclidean distances in the objective function shows that the non-Euclidean distance yields more accurate and more stable clustering results.

4.3. Discussion of Noise

To demonstrate the performance of the proposed algorithm under the influence of noise, a new experiment was designed on the Iris dataset: uniformly distributed values in [0, 1] were randomly added as new noise features. To compare the effect with and without noise, experiments were run on the original dataset and on the dataset with the noise features. As shown in Figure 4, the noise only slightly affected the clustering results, which shows that the performance of the proposed algorithm remained good even on the noisy dataset. Moreover, as shown in Figure 5, the noise did not influence the assignment of the feature weights. This confirms the high accuracy and stability of the algorithm.
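A minimal sketch of this noise experiment is given below; loading Iris from scikit-learn, appending a single noise column, and reusing the proposed_clustering sketch from Section 3 are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)

# append a uniformly distributed noise feature in [0, 1]
# (one column here; the number of noise columns is an illustrative choice)
X_noisy = np.hstack([X, rng.random((X.shape[0], 1))])

# U, C, W = proposed_clustering(X_noisy, K=3)   # sketch from Section 3
# labels = U.argmax(axis=1)
# Comparing ACC/RI/NMI on X and X_noisy shows how little the noise shifts the
# results, and the learned W should give the noise column a small weight.
```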

5. Conclusions

This paper proposed a new algorithm for clustering high-dimensional and noisy data on the basis of a non-Euclidean distance, combining feature weights and entropy weights. In this approach, two different entropy terms are added to the objective function, which helps to better identify the clustering features of complex data. The performance was compared with state-of-the-art methods in terms of different clustering measures, revealing that the proposed approach partitions data with improved performance. Given the nature of the proposed algorithm and the results of extensive experiments on various datasets, it can be applied to medical research and textual information, facilitating the extraction of critical features and yielding clustering results under high-dimensional and complex data conditions. The proposed algorithm improves significantly on the following aspects:
(1)
The clustering result is consistent and stable, as it is not susceptible to the original cluster centers and assigns different feature weights to each cluster in the clustering process.
(2)
The entropy weights improve the algorithm’s handling of partitioning during the clustering process and highlight the importance of distinguishing different features.
(3)
The introduction of non-Euclidean distance makes the algorithm more robust and efficient in handling high-dimensional sparse and noisy data in the real world.
(4)
The insensitivity to parameter changes ensures the flexibility of the algorithm.
In the future, EM and Gaussian mixture models will be used to improve the clustering algorithm, making it more useful in image processing.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wang, G.; Francis, E.H. Data Mining: Concept, Aplications and Techniques. ASEAN J. Sci. Technol. Dev. 2000, 17, 77–86.
2. Vapnik, V.N. An overview of statistical learning theory. IEEE Trans. Neural Netw. 1999, 10, 988–999.
3. Patel, K.M.A.; Thakral, P. The best clustering algorithms in data mining. In Proceedings of the 2016 International Conference on Communication and Signal Processing (ICCSP), IEEE, Melmaruvathur, India, 6–8 April 2016; pp. 2042–2046.
4. Jiang, H.; Wang, G. Spatial equilibrium of housing provident fund in China based on data mining cluster analysis. Int. J. Wireless Mobile Comput. 2016, 10, 138–147.
5. Al-Dabooni, S.; Wunsch, D. Model order reduction based on agglomerative hierarchical clustering. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 1881–1895.
6. Thrun, M.C.; Ultsch, A. Using projection-based clustering to find distance- and density-based clusters in high-dimensional data. J. Classif. 2021, 38, 280–312.
7. Öner, Y.; Bulut, H. A robust EM clustering approach: ROBEM. Commun. Stat.-Theory Methods 2021, 50, 4587–4605.
8. Yu, H.; Wang, Y.; et al. Three-Way Decisions Method for Overlapping Clustering. In Proceedings of the RSCTC; Springer: Berlin/Heidelberg, Germany, 2012; pp. 277–286.
9. Du, M.; Zhao, J.; Sun, J.; Dong, Y. M3W: Multistep Three-Way Clustering. IEEE Trans. Neural Netw. Learn. Syst. 2022.
10. Sun, C.; Du, M.; Sun, J.; Li, K.; Dong, Y. A Three-Way Clustering Method Based on Improved Density Peaks Algorithm and Boundary Detection Graph. Int. J. Approx. Reason. 2023, 153, 239–257.
11. Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A k-means clustering algorithm. J. R. Stat. Soc. Ser. C (Appl. Stat.) 1979, 28, 100–108.
12. Bezdek, J.C. Pattern Recognition with Fuzzy Objective Function Algorithms; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013.
13. Jolion, J.M.; Rosenfeld, A. Cluster detection in background noise. Pattern Recognit. 1989, 22, 603–607.
14. Wu, K.L.; Yang, M.S. Alternative c-means clustering algorithms. Pattern Recognit. 2002, 35, 2267–2278.
15. Zhu, D.; Xie, L.; Zhou, C. K-Means Segmentation of Underwater Image Based on Improved Manta Ray Algorithm. Comput. Intell. Neurosci. 2022, 2022, 4587880.
16. Palpandi, S.; Devi, T.M. Flexible Kernel-Based Fuzzy Means Based Segmentation and Patch-Local Binary Patterns Feature Based Classification System Skin Cancer Detection. J. Med. Imag. Health Informat. 2020, 10, 2600–2608.
17. Bezdek, J.C.; Ehrlich, R.; Full, W. FCM: The fuzzy c-means clustering algorithm. Comput. Geosci. 1984, 10, 191–203.
18. Jiang, Z.; Li, T.; Min, W.; Qi, Z.; Rao, Y. Fuzzy c-means clustering based on weights and gene expression programming. Pattern Recognit. Lett. 2017, 90, 1–7.
19. Chen, H.-P.; Shen, X.-J.; Lv, Y.-D.; Long, J.-W. A novel automatic fuzzy clustering algorithm based on soft partition and membership information. Neurocomputing 2017, 236, 104–112.
20. Guo, F.F.; Wang, X.X.; Shen, J. Adaptive fuzzy c-means algorithm based on local noise detecting for image segmentation. IET Image Process. 2016, 10, 272–279.
21. Krishnapuram, R.; Keller, J.M. The possibilistic c-means algorithm: Insights and recommendations. IEEE Trans. Fuzzy Syst. 1996, 4, 385–393.
22. Pena, J.M.; Lozano, J.A.; Larranaga, P. An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recogn. Lett. 1999, 20, 1027–1040.
23. Celebi, M.E.; Kingravi, H.A.; Vela, P.A. A comparative study of efficient initialization methods for the k-means clustering algorithm. Exp. Syst. Appl. 2013, 40, 200–210.
24. Rong, J.; Haipeng, B.; Ronghui, Z. Analysis of Preparation Conditions of Low-Temperature Curing Powder Coatings Based on Local Clustering Algorithm. Math. Probl. Eng. 2022, 2022, 1143283.
25. Frigui, H.; Nasraoui, O. Unsupervised learning of prototypes and attribute weights. Pattern Recognit. 2004, 37, 567–581.
26. Jing, L.; Ng, M.K.; Huang, J.Z. An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data. IEEE Trans. Knowl. Data Eng. 2007, 19, 1026–1041.
27. Mahela, O.P.; Shaik, A.G. Recognition of power quality disturbances using S-transform based ruled decision tree and fuzzy C-means clustering classifiers. Appl. Soft Comput. 2017, 59, 243–257.
28. Zhi, X.-B.; Fan, J.-L.; Zhao, F. Robust local feature weighting hard c-means clustering algorithm. Neurocomputing 2014, 134, 20–29.
29. Yaghoubi, Z. Robust cluster consensus of general fractional-order nonlinear multi agent systems via adaptive sliding mode controller. Math. Comput. Simulat. 2020, 172, 15–32.
30. Huang, J.Z.; Ng, M.K.; Rong, H.; Li, Z. Automated variable weighting in k-means type clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 657–668.
31. Sinaga, K.P.; Hussain, I.; Yang, M.-S. Entropy K-means clustering with feature reduction under unknown number of clusters. IEEE Access 2021, 9, 67736–67751.
32. Sinaga, K.P.; Yang, M.S. Unsupervised K-means clustering algorithm. IEEE Access 2020, 8, 80716–80727.
33. Strehl, A.; Ghosh, J. Cluster Ensembles—A Knowledge Reuse Framework for Combining Multiple Partitions. J. Mach. Learn. Res. 2002, 3, 583–617.
34. Singh, V.; Verma, N.K. An Entropy-Based Variable Feature Weighted Fuzzy k-Means Algorithm for High Dimensional Data. arXiv 2019, arXiv:1912.11209.
35. Yang, M.-S.; Nataliani, Y. A Feature-Reduction Fuzzy Clustering Algorithm Based on Feature-Weighted Entropy. IEEE Trans. Fuzzy Syst. 2017, 26, 817–835.
36. The Website of the UC Irvine Machine Learning Repository. Available online: https://archive.ics.uci.edu (accessed on 1 October 2022).
37. scikit-feature Datasets. Available online: https://jundongl.github.io/scikit-feature/datasets.html (accessed on 1 October 2022).
38. Bezdek, J.C. A Convergence Theorem for the Fuzzy ISODATA Clustering Algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 1980, PAMI-2, 1–8.
39. Rand, W.M. Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 1971, 66, 846–850.
Figure 1. The sensitivity of λ when γ = 0.8 : (a) ACC of Iris dataset with γ = 0.8 ; (b) RI of Iris dataset with γ = 0.8 ; (c) NMI of Iris dataset with γ = 0.8 ; (d) ACC of Zoo dataset with γ = 0.8 ; (e) RI of Zoo dataset with γ = 0.8 ; (f) NMI of Zoo dataset with γ = 0.8 ; (g) ACC of PCMAC dataset with γ = 0.8 ; (h) RI of PCMAC dataset with γ = 0.8 ; (i) NMI of PCMAC dataset with γ = 0.8 .
Figure 2. The sensitivity of γ when λ = 0.3 : (a) ACC of Iris dataset with λ = 0.3 ; (b) RI of Iris dataset with λ = 0.3 ; (c) NMI of Iris dataset with λ = 0.3 ; (d) ACC of Zoo dataset with λ = 0.3 ; (e) RI of Zoo dataset with λ = 0.3 ; (f) NMI of Zoo dataset with λ = 0.3 ; (g) ACC of PCMAC dataset with λ = 0.3 ; (h) RI of PCMAC dataset with λ = 0.3 ; (i) NMI of PCMAC dataset with λ = 0.3 .
Figure 3. Comparison of Euclidean distance and non-Euclidean distance at ACC: (a) ACC of Iris dataset with γ = 0.8 ; (b) ACC of ZOO dataset with λ = 0.3 .
Figure 4. The effect of noise on the Iris dataset: (a) without noise; (b) with noise.
Figure 5. Weights of the features assigned in Iris dataset for (a) Cluster 1, (b) Cluster 2, and (c) Cluster 3.
Table 1. The computational complexity of the algorithms.

| Method | Computational Complexity |
| --- | --- |
| K-Means | O(NKM) |
| WK-Means | O(NKM²) |
| FCM | O(NK²M) |
| RLWHCM | O(NKM) |
| SCAD | O(NK²M + NKM²) |
| EKM | O(NKM) |
| UKM | O(NKM) |
| Proposed algorithm | O(4NKM) |
Table 2. Characteristics of the real-world datasets.

| Dataset | Number of Samples | Number of Features | Number of Classes |
| --- | --- | --- | --- |
| Dermatology | 366 | 34 | 6 |
| Iris | 150 | 4 | 3 |
| Wine | 178 | 13 | 3 |
| Ionosphere | 351 | 34 | 2 |
| Lung cancer | 32 | 56 | 3 |
| Statlog (heart) | 270 | 13 | 2 |
| Zoo | 101 | 16 | 7 |
| BASEHOCK | 1993 | 4862 | 2 |
| PCMAC | 1943 | 3289 | 2 |
| ALLAML | 72 | 7129 | 2 |
| GLIOMA | 50 | 4434 | 4 |
| COIL20 | 1440 | 1024 | 20 |
| Yale | 165 | 1024 | 15 |
| Gisette | 7000 | 5000 | 2 |
| Madelon | 2600 | 500 | 2 |
Table 3. Comparison results of the performance of algorithms on low-dimensional datasets.

| Dataset | Metric | KM | WKM | FCM | RLWHCM | SCAD | EKM | UKM | Ours |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dermatology | ACC | 0.36 | 0.41 | 0.36 | 0.68 | 0.47 | 0.53 | 0.36 | 0.71 |
| | RI | 0.68 | 0.63 | 0.70 | 0.82 | 0.68 | 0.73 | 0.68 | 0.84 |
| | NMI | 0.10 | 0.25 | 0.11 | 0.60 | 0.46 | 0.52 | 0.10 | 0.64 |
| Iris | ACC | 0.89 | 0.90 | 0.89 | 0.93 | 0.92 | 0.96 | 0.89 | 0.98 |
| | RI | 0.88 | 0.89 | 0.88 | 0.88 | 0.90 | 0.95 | 0.88 | 0.97 |
| | NMI | 0.76 | 0.76 | 0.75 | 0.78 | 0.76 | 0.87 | 0.77 | 0.91 |
| Wine | ACC | 0.70 | 0.70 | 0.69 | 0.79 | 0.59 | 0.57 | 0.70 | 0.74 |
| | RI | 0.72 | 0.72 | 0.71 | 0.75 | 0.62 | 0.54 | 0.72 | 0.73 |
| | NMI | 0.43 | 0.43 | 0.42 | 0.33 | 0.48 | 0.33 | 0.45 | 0.41 |
| Ionosphere | ACC | 0.71 | 0.73 | 0.71 | 0.74 | 0.64 | 0.73 | 0.70 | 0.81 |
| | RI | 0.59 | 0.63 | 0.59 | 0.66 | 0.54 | 0.54 | 0.58 | 0.69 |
| | NMI | 0.03 | 0.25 | 0.04 | 0.14 | 0.13 | 0.21 | 0.11 | 0.27 |
| Lung cancer | ACC | 0.55 | 0.53 | 0.56 | 0.59 | 0.59 | 0.55 | 0.43 | 0.62 |
| | RI | 0.58 | 0.63 | 0.63 | 0.55 | 0.56 | 0.59 | 0.46 | 0.66 |
| | NMI | 0.24 | 0.25 | 0.27 | 0.27 | 0.20 | 0.18 | 0.12 | 0.29 |
| Statlog (heart) | ACC | 0.59 | 0.64 | 0.61 | 0.71 | 0.81 | 0.57 | 0.65 | 0.83 |
| | RI | 0.51 | 0.54 | 0.52 | 0.61 | 0.70 | 0.51 | 0.52 | 0.72 |
| | NMI | 0.01 | 0.07 | 0.15 | 0.12 | 0.17 | 0.03 | 0.06 | 0.23 |
| Zoo | ACC | 0.66 | 0.40 | 0.57 | 0.71 | 0.73 | 0.67 | 0.78 | 0.81 |
| | RI | 0.81 | 0.23 | 0.83 | 0.83 | 0.75 | 0.80 | 0.85 | 0.86 |
| | NMI | 0.71 | 0.34 | 0.67 | 0.62 | 0.77 | 0.66 | 0.71 | 0.74 |
| BASEHOCK | ACC | 0.50 | 0.53 | 0.51 | 0.62 | 0.53 | 0.58 | 0.55 | 0.69 |
| | RI | 0.50 | 0.51 | 0.50 | 0.50 | 0.50 | 0.51 | 0.51 | 0.55 |
| | NMI | 0.01 | 0.04 | 0.01 | 0.01 | 0.04 | 0.05 | 0.03 | 0.07 |
Table 4. Comparison results of the performance of algorithms on high-dimensional datasets.

| Dataset | Metric | KM | WKM | FCM | RLWHCM | SCAD | EKM | UKM | Ours |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BASEHOCK | ACC | 0.50 | 0.53 | 0.51 | 0.62 | 0.53 | 0.50 | 0.55 | 0.69 |
| | RI | 0.50 | 0.51 | 0.50 | 0.50 | 0.50 | 0.51 | 0.51 | 0.55 |
| | NMI | 0.01 | 0.04 | 0.01 | 0.01 | 0.04 | 0.02 | 0.03 | 0.07 |
| PCMAC | ACC | 0.51 | 0.51 | 0.55 | 0.58 | 0.52 | 0.51 | 0.53 | 0.68 |
| | RI | 0.49 | 0.50 | 0.50 | 0.53 | 0.59 | 0.57 | 0.58 | 0.62 |
| | NMI | 0 | 0.02 | 0.01 | 0.04 | 0.02 | 0.04 | 0.05 | 0.12 |
| ALLAML | ACC | 0.68 | 0.75 | 0.67 | 0.65 | 0.72 | 0.75 | 0.82 | 0.91 |
| | RI | 0.56 | 0.62 | 0.55 | 0.49 | 0.59 | 0.64 | 0.70 | 0.84 |
| | NMI | 0.06 | 0.16 | 0.09 | 0.11 | 0.14 | 0.33 | 0.47 | 0.58 |
| GLIOMA | ACC | 0.66 | 0.66 | 0.56 | 0.71 | 0.54 | 0.67 | 0.72 | 0.76 |
| | RI | 0.75 | 0.76 | 0.72 | 0.79 | 0.70 | 0.72 | 0.68 | 0.78 |
| | NMI | 0.48 | 0.55 | 0.56 | 0.59 | 0.53 | 0.57 | 0.52 | 0.66 |
| COIL20 | ACC | 0.11 | 0.15 | 0.13 | 0.43 | 0.19 | 0.33 | 0.35 | 0.41 |
| | RI | 0.56 | 0.69 | 0.56 | 0.90 | 0.57 | 0.89 | 0.82 | 0.90 |
| | NMI | 0.28 | 0.40 | 0.29 | 0.58 | 0.21 | 0.42 | 0.49 | 0.53 |
| Yale | ACC | 0.38 | 0.19 | 0.15 | 0.36 | 0.24 | 0.39 | 0.40 | 0.45 |
| | RI | 0.88 | 0.47 | 0.58 | 0.88 | 0.76 | 0.84 | 0.87 | 0.92 |
| | NMI | 0.44 | 0.19 | 0.13 | 0.40 | 0.27 | 0.39 | 0.42 | 0.55 |
| Gisette | ACC | 0.69 | 0.50 | 0.69 | 0.53 | 0.56 | 0.73 | 0.77 | 0.81 |
| | RI | 0.57 | 0.50 | 0.58 | 0.51 | 0.52 | 0.62 | 0.66 | 0.66 |
| | NMI | 0.12 | 0 | 0.11 | 0 | 0.06 | 0.19 | 0.21 | 0.26 |
| Madelon | ACC | 0.51 | 0.55 | 0.50 | 0.54 | 0.52 | 0.69 | 0.72 | 0.81 |
| | RI | 0.50 | 0.50 | 0.50 | 0.52 | 0.50 | 0.62 | 0.65 | 0.69 |
| | NMI | 0 | 0.01 | 0 | 0 | 0.01 | 0.13 | 0.16 | 0.21 |
Table 5. Comparison results of the conditions of use of different algorithms.

| Algorithm | Weight Division | Feature Constraint | Distance Division |
| --- | --- | --- | --- |
| KM | None | None | Euclidean |
| WKM | Global | Exponential | Euclidean |
| FCM | None | None | Euclidean |
| RLWHCM | Local | Exponential | Non-Euclidean |
| SCAD | Local | Exponential | Euclidean |
| EKM | Local | Entropy | Euclidean |
| UKM | None | Entropy | Euclidean |
| Ours | Local | Entropy | Non-Euclidean |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Du, X. A Robust and High-Dimensional Clustering Algorithm Based on Feature Weight and Entropy. Entropy 2023, 25, 510. https://doi.org/10.3390/e25030510

