Article

Sparse Fuzzy C-Means Clustering with Lasso Penalty

Department of Applied Mathematics, Chung Yuan Christian University, Taoyuan 32023, Taiwan
*
Author to whom correspondence should be addressed.
Symmetry 2024, 16(9), 1208; https://doi.org/10.3390/sym16091208
Submission received: 16 July 2024 / Revised: 6 September 2024 / Accepted: 11 September 2024 / Published: 13 September 2024
(This article belongs to the Special Issue Symmetry in Intelligent Algorithms)

Abstract

Clustering is a technique for grouping data into a homogeneous structure according to similarity or dissimilarity measures between objects. Among clustering methods, the fuzzy c-means (FCM) algorithm is the best-known and most commonly used; it is a fuzzy extension of k-means and has been widely used in various fields. Although FCM is a good clustering algorithm, it treats all feature components of data points as equally important and has drawbacks in handling high-dimensional data. The rapid development of social media and data acquisition techniques has led to advanced methods for collecting and processing larger, more complex, and higher-dimensional data. In such high-dimensional data, however, many of the dimensions are typically immaterial or irrelevant. To make the features sparse, a Lasso penalty can be applied to the feature weights. One solution for FCM with sparsity is sparse FCM (S-FCM) clustering. In this paper, we propose a new S-FCM, called S-FCM-Lasso, which is a new type of S-FCM based on the Lasso penalty. The proposed S-FCM-Lasso shrinks irrelevant features toward exactly zero and assigns zero weights to unnecessary characteristics. Based on various clustering performance measures, we compare S-FCM-Lasso with S-FCM and other existing sparse clustering algorithms on several numerical and real-life datasets. Comparisons and experimental results demonstrate that, in terms of these performance measures, the proposed S-FCM-Lasso performs better than S-FCM and the existing sparse clustering algorithms. This validates the efficiency and usefulness of the proposed S-FCM-Lasso algorithm for high-dimensional datasets with sparsity.

1. Introduction

Cluster analysis is an important tool in data analysis. In multiple disciplines, including artificial intelligence, pattern recognition, geology, biology, psychology, and information retrieval, clustering is an essential method for analyzing data. Unlabeled input vectors are grouped into clusters or "natural" groups so that points within a cluster are more similar to each other than to points belonging to other clusters, i.e., intra-cluster similarity is maximized while inter-cluster similarity is minimized. Cluster analysis aims to discover patterns, relationships, or structures in datasets by partitioning them into subsets or clusters, where objects within the same cluster are more similar to each other than to those in other clusters. Clustering has many applications in numerical taxonomy, machine learning, data mining, social network analysis, image segmentation, market segmentation, anomaly detection, and industry. In general, there are two main approaches to clustering: probability model-based clustering [1,2] and nonparametric clustering [3,4]. In this paper, we focus on the nonparametric clustering approach. There are many popular nonparametric clustering algorithms in the literature, such as hierarchical clustering [5,6], mean-shift [7,8], spectral clustering [9,10,11], k-means [12,13,14], fuzzy c-means [15,16,17], and possibilistic c-means [18,19,20]. These clustering algorithms are effective for low-dimensional data, but they do not work well for high-dimensional datasets. The development of mathematical models that emphasize variable and feature selection for high-dimensional data is a fascinating and developing research topic [21]. Technology has facilitated and encouraged data collection, leading to massive, complex datasets with a wide variety of unique characteristics and dimensions. In general, data with a high number of dimensions may include characteristics that are unimportant or sparse. As a result, with high-dimensional data, many of the dimensions are insignificant, and so feature selection techniques are being effectively and successfully applied to maximize cluster quality.
In regression analysis, variable and feature selection is well established by penalizing the $L_1$ norm because it leads to a sparsity property. Tibshirani [22] first proposed a popular such method, called Lasso for the "least absolute shrinkage and selection operator". The coefficients of regression variables may be forced to shrink toward zero via the Lasso. Later, Witten and Tibshirani [23] produced the sparse k-means (S-KM) clustering framework as an approach for feature selection in k-means, in which the Lasso constraint is used in k-means as a mechanism for selecting features. We know that fuzzy c-means (FCM) clustering is a fuzzy extension of k-means and that possibilistic c-means (PCM) clustering is a possibilistic extension of FCM. Subsequently, Qiu et al. [24] expanded S-KM to sparse FCM (S-FCM) clustering by using the Lasso constraint as a mechanism for selecting features. Moreover, Yang and Benjamin [25] expanded S-KM and S-FCM clustering to sparse PCM (S-PCM) clustering, introducing two S-PCM algorithms, called S-PCM1 and S-PCM2, which are subject to the Lasso constraint and the Lasso penalty for feature weights, respectively.
Motivated by S-KM, S-FCM, and S-PCM, we propose a new type of S-FCM clustering, called S-FCM-Lasso. For S-FCM-Lasso, we use the FCM objective function with a Lasso (i.e., $L_1$) penalty on the feature weights. The S-FCM proposed by Qiu et al. [24] uses the Lasso constraint for selecting features, similarly to the S-KM algorithm proposed by Witten and Tibshirani [23], whereas our proposed S-FCM-Lasso is a new type of S-FCM under the Lasso penalty, making it different from the S-FCM algorithm. The performance of the proposed S-FCM-Lasso algorithm is compared to S-FCM and other existing sparse clustering algorithms on both simulated and real datasets. The results demonstrate that our proposed S-FCM-Lasso algorithm performs better than S-FCM and most existing sparse clustering algorithms with respect to several important evaluation measures. Furthermore, S-PCM2 gives excellent performance compared to S-PCM1. The proposed S-FCM-Lasso algorithm not only produces quick and efficient results, but also has a feature selection ability that works by shrinking irrelevant features toward zero. The rest of the paper is organized as follows. Section 2 contains the related literature and the structures of existing algorithms. The proposed S-FCM-Lasso algorithm is presented in Section 3. Section 4 includes the evaluation and comparison of the performance of the proposed S-FCM-Lasso and existing clustering algorithms. Section 5 sums up the paper and gives some future recommendations, especially in terms of considering a point symmetry-based distance rather than the Euclidean distance for presenting more cluster symmetry behaviors [26,27,28].

2. Related Work

In this section, we review the clustering algorithms proposed in the literature that are related to our work. In 1965, Zadeh [29] first proposed the idea of a fuzzy set with partial memberships. Later, Ruspini [30] successfully applied fuzzy sets in clustering. In fuzzy clustering, the fuzzy c-means (FCM) algorithm proposed by Dunn [31] and Bezdek [15] is the most well-known and used method. Fuzzy clustering has been widely studied and applied in a variety of substantive areas [32,33,34,35]. Let $X = \{x_1, x_2, \ldots, x_n\}$ be a dataset in the $d$-dimensional Euclidean space $\mathbb{R}^d$. Let $V = \{v_1, v_2, \ldots, v_c\}$ be the set of cluster centers for the dataset $X$, where $v_{kj}$ represents the $k$th cluster center in the $j$th feature component, $n$ is the number of data points in $X$, $c$ is the number of clusters, and $d$ is the number of feature components. Let the membership matrix be $U = [u_{ik}]_{n \times c}$, $i = 1, 2, \ldots, n$, $k = 1, 2, \ldots, c$, where $u_{ik}$ is the fuzzy membership of the $i$th data point in the $k$th cluster of the dataset $X$ with $\sum_{k=1}^{c} u_{ik} = 1$ and $u_{ik} \in [0, 1]$ for $i = 1, 2, \ldots, n$, $k = 1, 2, \ldots, c$. The FCM objective function [15] was defined as
$$J(U, V) = \sum_{k=1}^{c} \sum_{i=1}^{n} \sum_{j=1}^{d} u_{ik}^{m} (x_{ij} - v_{kj})^{2}$$
where the weighting exponent $m$, with $1 < m < +\infty$, presents the degree of fuzziness. The FCM algorithm iterates the necessary conditions for minimizing $J(U, V)$, with the updating equations for the membership function and cluster centers given, for $1 \le i \le n$ and $1 \le k \le c$, by $u_{ik} = \left( \sum_{j=1}^{d} (x_{ij} - v_{kj})^{2} \right)^{\frac{-1}{m-1}} \Big/ \sum_{t=1}^{c} \left( \sum_{j=1}^{d} (x_{ij} - v_{tj})^{2} \right)^{\frac{-1}{m-1}}$ and $v_{kj} = \sum_{i=1}^{n} u_{ik}^{m} x_{ij} \Big/ \sum_{i=1}^{n} u_{ik}^{m}$. The fuzziness coefficient, or fuzzifier, $m$ controls the level of fuzziness or softness in the cluster assignments.
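As a small illustration of these alternating updates, the following sketch (our own illustration in NumPy, not code from the cited works; the random center initialization and the tolerance on the memberships are assumptions) iterates the two necessary conditions until the memberships stabilize.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means: alternate the membership and center updates."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    V = X[rng.choice(n, c, replace=False)]            # initial centers drawn from the data
    U = np.full((n, c), 1.0 / c)
    for _ in range(max_iter):
        # squared Euclidean distances D[i, k] = sum_j (x_ij - v_kj)^2
        D = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
        U_new = D ** (-1.0 / (m - 1.0))
        U_new /= U_new.sum(axis=1, keepdims=True)     # each row sums to 1
        Um = U_new ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]      # membership-weighted means as centers
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return U, V
```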
In 1996, Tibshirani [22] first developed the Lasso (least absolute shrinkage and selection operator) regression technique for variable feature selection in regression analysis. The Lasso technique has the ability to shrink some redundant variables/features towards exactly zero so that only useful variables/features are retained. Based on the Lasso constraint in k-means, Witten and Tibshirani [23] proposed the sparse k-means (S-KM) clustering algorithm. The S-KM objective function was given by Witten and Tibshirani [23] as follows:
$$\max_{C_1, C_2, \ldots, C_c, \mathbf{w}} \left\{ \sum_{j=1}^{d} w_j \left( \frac{1}{n} \sum_{i=1}^{n} \sum_{i'=1}^{n} d_{i,i',j} - \sum_{k=1}^{c} \frac{1}{n_k} \sum_{i, i' \in C_k} d_{i,i',j} \right) \right\}$$
subject to $\|\mathbf{w}\|_2^2 = 1$, $\|\mathbf{w}\|_1 \le s$, and $w_j \ge 0\ \forall j$, where $d_{i,i',j} = (x_{ij} - x_{i'j})^2$. The weights $w_j$ contribute to the clustering and control the sparsity of feature $j$ through a suitable choice of the tuning parameter $s$, where the value of $s$ satisfies $1 \le s \le \sqrt{d}$ and is obtained with the guidance of the gap statistic, and the $d$ under the square root denotes the total number of features. In 2015, Qiu et al. [24] expanded S-KM to sparse FCM (S-FCM) clustering by using the Lasso constraint in FCM. Qiu et al. [24] considered the S-FCM objective function as:
$$\max_{U, V, \mathbf{w}} \left[ \sum_{j=1}^{d} w_j \left( \frac{1}{n} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^{2} - \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} d_{i,k,j} \right) \right]$$
subject to $\|\mathbf{w}\|_2^2 \le 1$, $\|\mathbf{w}\|_1 \le s$, and $w_j \ge 0\ \forall j$, where $w_j$ is the feature weight for the $j$th feature, $j = 1, \ldots, d$, $d_{i,k,j} = (x_{ij} - v_{kj})^2$ is the squared Euclidean distance, and $\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$. The tuning parameter $s$ lies in $1 \le s \le \sqrt{d}$. The membership matrix is $U = [u_{ik}]_{n \times c}$ and the $v_{kj}$ are the cluster centers. The choice of features is decided through the tuning parameter $s$ according to the Lasso constraint, and the gap statistic is used to select it. More feature weights become zero as the value of $s$ decreases. In the process of applying sparse weights for feature selection, S-FCM is able to sustain FCM's capability to demonstrate fuzzy behavior, resulting in optimal clustering. In this way, the data clusters can be presented in a more concise and meaningful manner while still maintaining accuracy and sparsity. S-FCM extends FCM by introducing Lasso constraints on the feature weights, resulting in a more interpretable and robust clustering solution.
Recently, Yang and Benjamin [25] proposed two sparse possibilistic c-means (PCM) methods for feature selection using the Lasso concept. The first proposed sparse PCM (S-PCM), called S-PCM1, consists of a framework for feature selection based on Witten and Tibshirani [23] and Qiu et al. [24] by considering the Lasso constraints. The second version of S-PCM, called S-PCM2, includes the (Lasso) $L_1$-regularization of the feature weights as a penalty term in the PCM objective function. The S-PCM1 objective function [25] is as follows:
$$\max_{U, V, \mathbf{w}} \left\{ \sum_{j=1}^{d} w_j \left( \frac{1}{n} \sum_{i=1}^{n} \sum_{i'=1}^{n} (x_{ij} - x_{i'j})^{2} - \left[ \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} + \lambda \sum_{i=1}^{n} \sum_{k=1}^{c} (1 - u_{ik})^{m} \right] \right) \right\}$$
subject to $\|\mathbf{w}\|_1 \le s$, $\|\mathbf{w}\|_2 \le 1$, and $w_j \ge 0\ \forall j$, where $U = [u_{ik}]$ is the possibilistic membership matrix, $w_j$ is the feature weight for the $j$th feature, and $s$ is the tuning parameter with $1 \le s \le \sqrt{d}$. The S-PCM2 objective function [25] is given below:
$$J_{S\text{-}PCM2}(U, V, W) = \sum_{i=1}^{n} \sum_{k=1}^{c} \sum_{j=1}^{d} w_j u_{ik}^{m} (x_{ij} - v_{kj})^{2} + \lambda \sum_{i=1}^{n} \sum_{k=1}^{c} (1 - u_{ik})^{m} + \alpha \|\mathbf{w}\|_1$$
where $\|\mathbf{w}\|_1$ is the $L_1$-regularization of the feature weights as a penalty term, and $V = [v_1, v_2, \ldots, v_c]$ and $U = [u_{ik}]_{n \times c}$ are the cluster centers and the membership matrix, respectively.

3. The Proposed Sparse FCM Clustering Algorithm

In Section 2, we reviewed the S-FCM proposed by Qiu et al. [24], which is an extension of S-KM based on the FCM objective function. We mention that the Lasso constraint term is difficult to handle in S-FCM. In particular, the selection of the tuning parameter by using the gap statistic needs to be performed separately from the S-FCM algorithm, which in general costs considerable computation time. To avoid this complex estimation of the tuning parameter in S-FCM, in this section we propose a new type of S-FCM based on the Lasso penalty, called S-FCM-Lasso. The proposed S-FCM-Lasso should be much simpler than the S-FCM of Qiu et al. [24]. We should mention that our proposal is motivated by the S-PCM2 of Yang and Benjamin [25]. The Lasso penalty term on the feature weights is added to the FCM objective function. Thus, the proposed S-FCM-Lasso objective function is as follows:
$$J_{S\text{-}FCM\text{-}Lasso}(U, V, W) = \sum_{i=1}^{n} \sum_{k=1}^{c} \sum_{j=1}^{d} u_{ik}^{m} w_j (x_{ij} - v_{kj})^{2} + \psi \sum_{j=1}^{d} |w_j|$$
subject to $\|\mathbf{w}\|_2^2 = 1$, where $u_{ik}$ is the fuzzy membership of the $i$th point in the $k$th cluster with $\sum_{k=1}^{c} u_{ik} = 1$ and $u_{ik} \in [0, 1]$ for $1 \le i \le n$ and $1 \le k \le c$. $W = [w_1, \ldots, w_d] \in \mathbb{R}^d$ represents the feature weight vector, $U = [u_{ik}]_{n \times c}$ is the fuzzy membership matrix, and $V = \{v_1, v_2, \ldots, v_c\}$ represents the $c$ cluster centers. The updating equations for minimizing the proposed S-FCM-Lasso objective function $J_{S\text{-}FCM\text{-}Lasso}(U, V, W)$ are given in Theorem 1.
Theorem 1.
The necessary conditions for minimizing the S-FCM-Lasso objective function $J_{S\text{-}FCM\text{-}Lasso}(U, V, W)$ are given below:
$$v_{kj} = \sum_{i=1}^{n} u_{ik}^{m} x_{ij} \Big/ \sum_{i=1}^{n} u_{ik}^{m}, \quad 1 \le k \le c,\ 1 \le j \le d \quad (7)$$
$$u_{ik} = \frac{\left[ \sum_{j=1}^{d} w_j (x_{ij} - v_{kj})^{2} \right]^{\frac{-1}{m-1}}}{\sum_{k'=1}^{c} \left[ \sum_{j=1}^{d} w_j (x_{ij} - v_{k'j})^{2} \right]^{\frac{-1}{m-1}}}, \quad 1 \le k \le c,\ 1 \le i \le n \quad (8)$$
$$w_j = \begin{cases} \dfrac{\sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} - \psi}{\left\| \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} - \psi \right\|_2}, & \text{when } \eta_j < -\psi, \\[2ex] 0, & \text{when } |\eta_j| \le \psi, \\[2ex] \dfrac{\sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} + \psi}{\left\| \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} + \psi \right\|_2}, & \text{when } \eta_j > \psi, \end{cases} \quad (9)$$
where $\eta_j = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2}$.
Proof. 
We first consider the necessary condition for minimizing the S-FCM-Lasso objective function $J_{S\text{-}FCM\text{-}Lasso}(U, V, W)$ with respect to $v_{kj}$ by fixing $u_{ik}$ and $w_j$. We take the partial derivative of $J_{S\text{-}FCM\text{-}Lasso}(U, V, W)$ with respect to $v_{kj}$ and set it to zero, i.e., $\frac{\partial J_{S\text{-}FCM\text{-}Lasso}}{\partial v_{kj}} = \frac{\partial}{\partial v_{kj}} \left[ \sum_{i=1}^{n} \sum_{k=1}^{c} \sum_{j=1}^{d} u_{ik}^{m} w_j (x_{ij} - v_{kj})^{2} + \psi \sum_{j=1}^{d} |w_j| \right] = 0$. We have $\sum_{i=1}^{n} u_{ik}^{m} x_{ij} - \sum_{i=1}^{n} u_{ik}^{m} v_{kj} = 0$, and then $v_{kj} = \sum_{i=1}^{n} u_{ik}^{m} x_{ij} \big/ \sum_{i=1}^{n} u_{ik}^{m}$, $1 \le k \le c$, $1 \le j \le d$, which is Equation (7). We next consider the necessary condition for minimizing $J_{S\text{-}FCM\text{-}Lasso}(U, V, W)$ with respect to $u_{ik}$ by fixing $v_{kj}$ and $w_j$. We use the Lagrange multiplier method with the Lagrangian $J^{L}_{S\text{-}FCM\text{-}Lasso}(U, V, W) = \sum_{i=1}^{n} \sum_{k=1}^{c} \sum_{j=1}^{d} u_{ik}^{m} w_j (x_{ij} - v_{kj})^{2} + \psi \sum_{j=1}^{d} |w_j| + \lambda_1 \left[ \sum_{k=1}^{c} u_{ik} - 1 \right]$. Taking the partial derivative of $J^{L}_{S\text{-}FCM\text{-}Lasso}(U, V, W)$ with respect to $u_{ik}$ and setting it to zero, we obtain $\frac{\partial J^{L}_{S\text{-}FCM\text{-}Lasso}}{\partial u_{ik}} = \sum_{j=1}^{d} m u_{ik}^{m-1} w_j (x_{ij} - v_{kj})^{2} + \lambda_1 = 0$ with $\sum_{k=1}^{c} u_{ik} = 1$. Thus, we obtain $u_{ik} = \left[ \sum_{j=1}^{d} w_j (x_{ij} - v_{kj})^{2} \right]^{\frac{-1}{m-1}} \Big/ \sum_{k'=1}^{c} \left[ \sum_{j=1}^{d} w_j (x_{ij} - v_{k'j})^{2} \right]^{\frac{-1}{m-1}}$, which is Equation (8). Similarly, we consider the necessary condition for minimizing $J_{S\text{-}FCM\text{-}Lasso}(U, V, W)$ with respect to $w_j$ by taking the partial derivative of the Lagrangian $J^{*}_{S\text{-}FCM\text{-}Lasso}(U, V, W) = \sum_{i=1}^{n} \sum_{k=1}^{c} \sum_{j=1}^{d} u_{ik}^{m} w_j (x_{ij} - v_{kj})^{2} + \psi \sum_{j=1}^{d} |w_j| + \lambda_2 \left[ \|\mathbf{w}\|_2^2 - 1 \right]$ with respect to $w_j$ and setting it to zero, i.e., $\frac{\partial J^{*}_{S\text{-}FCM\text{-}Lasso}(U, V, W)}{\partial w_j} = 0$. The expression can be divided into two parts, separating the penalty term $\psi \sum_{j=1}^{d} |w_j|$ from the other terms, so that we can deal with $\frac{\partial}{\partial w_j} \left[ \sum_{i=1}^{n} \sum_{k=1}^{c} \sum_{j=1}^{d} u_{ik}^{m} w_j (x_{ij} - v_{kj})^{2} + \lambda_2 \left( \|\mathbf{w}\|_2^2 - 1 \right) \right]$ and $\frac{\partial}{\partial w_j} \left[ \psi \sum_{j=1}^{d} |w_j| \right]$ separately. Let $Z = \sum_{i=1}^{n} \sum_{k=1}^{c} \sum_{j=1}^{d} u_{ik}^{m} w_j (x_{ij} - v_{kj})^{2} + \lambda_2 \left( \|\mathbf{w}\|_2^2 - 1 \right)$. Differentiating $Z$ partially with respect to $w_j$ and setting it to zero, we have $\frac{\partial Z}{\partial w_j} = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} + 2 \lambda_2 w_j = 0$ with $\|\mathbf{w}\|_2^2 = 1$. Thus, we obtain $w_j = -\sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} \big/ (2 \lambda_2)$, and so $\|\mathbf{w}\|_2^2 = \left\| \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} \big/ (2 \lambda_2) \right\|_2^2 = \left\| \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} \right\|_2^2 \big/ (2 \lambda_2)^2 = 1$, where the norm is taken over the $d$ components indexed by $j$. We obtain $|2 \lambda_2| = \left\| \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} \right\|_2$, and thus $w_j = \frac{\sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2}}{\left\| \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} \right\|_2}$. For the other term, $\frac{\partial}{\partial w_j} \left( \psi \sum_{j=1}^{d} |w_j| \right)$, we let $L_1 = \psi \sum_{j=1}^{d} |w_j|$, which is not differentiable with respect to $w_j$. However, we can use sub-differentiation for this penalty term $L_1 = \psi \sum_{j=1}^{d} |w_j|$. We know that a sub-differential is a slope value that is less than the actual slope value between two points; although the slope cannot be infinite, an infinite number of sub-differentials exists for any two points on a function curve. If we consider the function $f(w_j) = \gamma |w_j|$ with $\gamma > 0$, then the sub-differential of $f$ with respect to $w_j$ is $\gamma\, \mathrm{sign}(w_j)$.
Thus, sub-differentiating $L_1$ with respect to $w_j$ gives the term $\psi\, \mathrm{sign}(w_j)$, and so, using both $\frac{\partial Z}{\partial w_j}$ and $\psi\, \mathrm{sign}(w_j)$, we have
$$\frac{\partial J^{*}_{S\text{-}FCM\text{-}Lasso}}{\partial w_j} = \frac{\partial Z}{\partial w_j} + \begin{cases} -\psi, & \text{when } w_j < 0, \\ [-\psi, \psi], & \text{when } w_j = 0, \\ \psi, & \text{when } w_j > 0, \end{cases}$$
i.e.,
$$\frac{\partial J^{*}_{S\text{-}FCM\text{-}Lasso}}{\partial w_j} = \begin{cases} \frac{\partial Z}{\partial w_j} - \psi, & \text{when } w_j < 0, \\ \left( \frac{\partial Z}{\partial w_j} - \psi,\ \frac{\partial Z}{\partial w_j} + \psi \right), & \text{when } w_j = 0, \\ \frac{\partial Z}{\partial w_j} + \psi, & \text{when } w_j > 0. \end{cases}$$
For $\frac{\partial J^{*}_{S\text{-}FCM\text{-}Lasso}(U, V, W)}{\partial w_j} = \frac{\partial Z}{\partial w_j} + \frac{\partial}{\partial w_j} \left( \psi \sum_{j=1}^{d} |w_j| \right) = 0$, we have the following three possibilities to solve this expression:
• Case 1 for $w_j < 0$: Let $\eta_j = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2}$, and recall that $\frac{\partial J^{*}_{S\text{-}FCM\text{-}Lasso}(U, V, W)}{\partial w_j} = 0$. When $w_j < 0$, we have $\frac{\partial J^{*}_{S\text{-}FCM\text{-}Lasso}(U, V, W)}{\partial w_j} = \frac{\partial Z}{\partial w_j} - \psi = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} + 2 \lambda_2 w_j - \psi = \eta_j + 2 \lambda_2 w_j - \psi = 0$, so that $w_j = \frac{-\eta_j + \psi}{2 \lambda_2}$. And so, when $\eta_j > \psi$, we have $w_j = \frac{\sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} + \psi}{\left\| \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} + \psi \right\|_2} < 0$.
• Case 2 for $w_j > 0$: Let $\eta_j = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2}$, and recall that $\frac{\partial J^{*}_{S\text{-}FCM\text{-}Lasso}(U, V, W)}{\partial w_j} = 0$. When $w_j > 0$, we have $\frac{\partial J^{*}_{S\text{-}FCM\text{-}Lasso}(U, V, W)}{\partial w_j} = \frac{\partial Z}{\partial w_j} + \psi = \eta_j + 2 \lambda_2 w_j + \psi = 0$, so that $w_j = \frac{-\eta_j - \psi}{2 \lambda_2}$. And so, when $\eta_j < -\psi$, we have $w_j = \frac{\sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} - \psi}{\left\| \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} - \psi \right\|_2} > 0$.
• Case 3 for $w_j = 0$: It is necessary to have an interval containing zero, according to the sub-differentiation of $L_1 = \psi \sum_{j=1}^{d} |w_j|$, so that $w_j = 0$ can be an optimum. In this case, we have $\frac{\partial Z}{\partial w_j} - \psi = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} + 2 \lambda_2 w_j - \psi = \eta_j + 2 \lambda_2 w_j - \psi = 0$ with $w_j = \frac{-\eta_j + \psi}{2 \lambda_2}$, from which we obtain $\eta_j \le \psi$ for $w_j \le 0$. On the other hand, we have $\frac{\partial Z}{\partial w_j} + \psi = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2} + 2 \lambda_2 w_j + \psi = \eta_j + 2 \lambda_2 w_j + \psi = 0$ with $w_j = \frac{-\eta_j - \psi}{2 \lambda_2}$, from which we obtain $\eta_j \ge -\psi$ for $w_j \ge 0$. Since $\eta_j \le \psi$ and $\eta_j \ge -\psi$ together imply $|\eta_j| \le \psi$, we obtain that $w_j = 0$ when $|\eta_j| \le \psi$.
According to Cases 1, 2, and 3, the weight updating equation in the form of Equation (9) is obtained. □
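To make the weight update concrete, the following small sketch (our own illustration, not code from the paper) computes the per-feature dispersions $\eta_j$ and applies a simplified soft-thresholding reading of Equation (9): features with $\eta_j \le \psi$ receive exactly zero weight, and the surviving weights are rescaled to unit $L_2$ norm to respect $\|\mathbf{w}\|_2^2 = 1$. The exact numerator in Equation (9) differs slightly from this form, so the snippet is an illustrative approximation rather than a literal transcription.

```python
import numpy as np

def update_weights(X, V, U, m=2.0, psi=0.5):
    """Simplified soft-thresholding weight update in the spirit of Equation (9)."""
    diff2 = (X[:, None, :] - V[None, :, :]) ** 2        # (n, c, d) squared differences
    eta = np.einsum('ik,ikj->j', U ** m, diff2)         # eta_j = sum_i sum_k u_ik^m (x_ij - v_kj)^2
    w = np.maximum(eta - psi, 0.0)                      # zero weight whenever eta_j <= psi
    norm = np.linalg.norm(w)
    # fall back to equal weights on the unit sphere if everything was thresholded away
    return w / norm if norm > 0 else np.full_like(w, 1.0 / np.sqrt(len(w)))
```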
Thus, the S-FCM-Lasso algorithm (Algorithm 1) is as follows:
Algorithm 1. The S-FCM-Lasso Algorithm
Step 1: Initialize the feature weights as $W^{(0)} = [w_j]_{1 \times d}$ with $w_1 = w_2 = \cdots = w_d = 1/d$, and initialize the cluster centers $V^{(0)}$ randomly.
Step 2: Update the initial membership matrix $U^{(0)} = [u_{ik}]_{n \times c}$ by using $V^{(0)}$, $W^{(0)}$, and Equation (8). Fix $\varepsilon > 0$ with $\varepsilon = 1 \times 10^{-5}$ as the threshold value for the stopping condition in Step 8 of the algorithm, and let $c$ be the number of clusters with $2 \le c \le n$.
Step 3: Set $t = 1$ and update $V^{(t)}$ with the initial $U^{(0)}$ by using Equation (7).
Step 4: Update $U^{(t)}$ by using the $V^{(t)}$ of Step 3 and $W^{(0)}$ with the help of Equation (8).
Step 5: Using the updated $V^{(t)}$ and $U^{(t)}$ of Steps 3 and 4, respectively, compute $w^{*(t)}$ with the help of the following equation:
$$w_j^{*} = \frac{\sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} (x_{ij} - v_{kj})^{2}}{\sum_{i=1}^{n} \sum_{k=1}^{c} \sum_{j=1}^{d} u_{ik}^{m} (x_{ij} - v_{kj})^{2}}$$
Step 6: Select $\psi$ using a binary search technique between $\max[w^{*(t)}, 0]$ and the weight bound interval $[0, d]$.
Step 7: Update the weights $w_j^{(t)}$ with $\psi$ by using Equation (9).
Step 8: If $\sum_{j=1}^{d} |w_j^{(t)} - w_j^{(t-1)}| \Big/ \sum_{j=1}^{d} |w_j^{(t-1)}| < \varepsilon$, where $w_j^{(t)}$ is the $t$th update of $w_j$, then stop;
ELSE set $t = t + 1$ and return to Step 3.
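A compact sketch of Steps 1 through 8, written by us for illustration in NumPy rather than taken from the paper, is given below. The penalty $\psi$ is passed in as a fixed value here, whereas the algorithm selects it by the binary search described next, and the weight update follows the simplified soft-thresholding reading used in the sketch above.

```python
import numpy as np

def sfcm_lasso(X, c, psi, m=2.0, max_iter=100, eps=1e-5, seed=0):
    """Sketch of the S-FCM-Lasso iteration (Algorithm 1) with a fixed penalty psi."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(d, 1.0 / d)                               # Step 1: equal initial feature weights
    V = X[rng.choice(n, c, replace=False)]                # Step 1: random initial cluster centers

    def memberships(V, w):
        # Eq. (8): memberships from feature-weighted squared distances
        D = ((X[:, None, :] - V[None, :, :]) ** 2 * w).sum(axis=2) + 1e-12
        U = D ** (-1.0 / (m - 1.0))
        return U / U.sum(axis=1, keepdims=True)

    U = memberships(V, w)                                 # Step 2: initial memberships
    for _ in range(max_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]          # Step 3 / Eq. (7): cluster centers
        U = memberships(V, w)                             # Step 4 / Eq. (8): memberships
        eta = np.einsum('ik,ikj->j', U ** m, (X[:, None, :] - V[None, :, :]) ** 2)
        w_new = np.maximum(eta - psi, 0.0)                # Steps 5-7: threshold dispersions at psi
        norm = np.linalg.norm(w_new)
        w_new = w_new / norm if norm > 0 else np.full(d, 1.0 / np.sqrt(d))
        # Step 8: relative change of the weights as the stopping rule
        if np.abs(w_new - w).sum() / (np.abs(w).sum() + 1e-12) < eps:
            w = w_new
            break
        w = w_new
    return U, V, w
```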
We next examine how to pick the value of $\psi$. In S-FCM-Lasso, once the feature weights and cluster centers are initialized, the membership values can be calculated, and the weight equation, cluster centers, and membership function can then be updated in turn. To identify the parameter value that produces sparsity, a binary search technique is applied. To find the value of $\psi$, we use the updated weight equation after selecting $w^{*} = \max(w_j^{*}, 0)$. We then choose the optimal value of $\psi$ in the interval $[w^{*}, (a, d)]$ by the binary search method according to Yang and Benjamin [25], where $a$ is a positive number. The optimal value of $\psi$ can be obtained with this strategy, and the weights can then be updated using the cluster center and membership updating equations. The algorithm terminates when the condition specified in Step 8 is satisfied according to the threshold value $\varepsilon$, which is usually set at $1 \times 10^{-5}$. We mention that the binary search technique, also known as half-interval search, logarithmic search, or binary chop, is used to find the position of a target value within a sorted set; the middle element of the current interval is compared to the target value at each step. Here, we obtain $w^{*} = \max(w_j^{*}, 0)$ by using the updated weight equation, and then we pick the optimal value of $\psi$ within the interval defined by $[w^{*}, (a, d)]$. Thus, we present the optimal selection procedure for the value of $\psi$ with the binary search technique as follows:
1. 
Parameter range:
Let the parameter $\psi$ of S-FCM-Lasso have the lower and upper bounds $w_{\min}$ and $w_{\max}$.
2. 
Binary search initialization:
Set the initial search bounds $w_{\text{low}} = w_{\min}$ and $w_{\text{high}} = w_{\max}$.
3. 
Iterative binary search:
Repeat the following while $w_{\text{high}} - w_{\text{low}} > \varepsilon$:
i. 
Midpoint computation: $w_{\text{mid}} = \dfrac{w_{\text{low}} + w_{\text{high}}}{2}$
ii. 
Execute the S-FCM-Lasso method: Run the S-FCM-Lasso algorithm using the current $w_{\text{mid}}$. To assess clustering performance, use a suitable measure (a validation metric, etc.).
iii. 
Evaluation of results: Compare the clustering performance with that of previous iterations and check whether $w_{\text{mid}}$ produces a better result.
iv. 
Set the search parameters:
IF the performance indicates that a higher w can generate better results:
set $w_{\text{low}} = w_{\text{mid}}$; ELSE set $w_{\text{high}} = w_{\text{mid}}$.
4. 
Finally, convergence:
The search continues until the gap satisfies $w_{\text{high}} - w_{\text{low}} \le \varepsilon$, which denotes convergence.
5. 
The optimal result for the parameter  ψ :
The value of $w_{\text{mid}}$ at the stage of convergence is taken as the optimal $\psi$. A code sketch of this search procedure is given below.
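The following sketch is again our own illustration rather than the authors' code. It wraps an S-FCM-Lasso run in the bisection loop of steps 1 to 5; both callables (`run_sfcm_lasso` and `score`) are assumed interfaces introduced here, and the quality measure could be any internal validation index.

```python
import numpy as np

def binary_search_psi(run_sfcm_lasso, score, w_min, w_max, eps=1e-5):
    """Bisection over the penalty psi (steps 1-5 of the selection procedure).

    run_sfcm_lasso(psi) -> clustering result; score(result) -> scalar, larger is better.
    Both callables are assumptions of this sketch, not functions defined in the paper.
    """
    w_low, w_high = w_min, w_max                      # step 2: initial search bounds
    best_score, best_psi = -np.inf, None
    while w_high - w_low > eps:                       # step 3: iterate until the gap closes
        w_mid = (w_low + w_high) / 2.0                # step 3(i): midpoint
        result = run_sfcm_lasso(w_mid)                # step 3(ii): run S-FCM-Lasso at w_mid
        s = score(result)                             # step 3(ii): evaluate clustering quality
        if s > best_score:                            # step 3(iii)-(iv): a larger psi helped
            best_score, best_psi = s, w_mid
            w_low = w_mid
        else:
            w_high = w_mid
    return best_psi                                   # step 5: psi at convergence
```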
In the next section, we present some experimental results and comparisons to evaluate the performance of the proposed algorithm.

4. Clustering Measures, Results, and Comparisons

In this section, we demonstrate the capabilities and efficiency of the proposed S-FCM-Lasso algorithm via comprehensive experiments on both synthetic and real data. In our experiments, we use four synthetic datasets by Qiu et al. [24] and nine real datasets. We compare our proposed S-FCM-Lasso algorithm with FCM, S-KM, S-FCM, S-PCM1, and S-PCM2. In order to determine the efficiency of the proposed S-FCM-Lasso algorithm, a comparison between the true labels (the labels assigned to each data point) and the predicted labels (assigned by the algorithm) is conducted. As evaluation indexes to measure the performance and strength of the proposed algorithm, we use the accuracy rate (AR), the Rand index (RI) [36], normalized mutual information (NMI) [37], the Jaccard index (JI) [38], the Fowlkes–Mallows index (FMI) [39], and the running time (RT). The formal description of these measures is as follows.
Accuracy Rate (AR): Suppose that $C = \{C_1, C_2, \ldots, C_c\}$ is the set of $c$ clusters in the actual dataset and that $C' = \{C'_1, C'_2, \ldots, C'_c\}$ is the set of $c$ clusters produced by the relevant algorithm. The accuracy rate (AR) is defined as $AR = \sum_{k=1}^{c} h_k / n$, where $h_k$ denotes the number of data points in $C_k$ that are also clustered into $C'_k$, and $n$ is the total number of data points in the dataset. The accuracy rate measures the percentage of data points that the algorithm identifies correctly.
Rand Index (RI) [36]: The Rand index (RI) [36] is another evaluation criterion that measures the similarity between two distinct clusterings of a dataset. The RI is a metric used to evaluate the correct decisions made by an algorithm and to assess its overall effectiveness and efficiency. The Rand index evaluates the proportion of agreements and disagreements between pairs of data points in two distinct clusterings and generates a similarity score between 0 and 1, where 0 represents no agreement and 1 denotes complete agreement. Consider the data points in pairs $(X_i, X_j)$. Let $a$ be the number of pairs whose points are members of the same cluster in both $C$ and $C'$, and let $b$ be the number of pairs whose points belong to two distinct clusters in both $C$ and $C'$. If the two points are part of the same cluster in $C$ but different clusters in $C'$, then the number of such pairs is $c$, and if they are part of two different clusters in $C$ but the same cluster in $C'$, then the number of such pairs is $d$. The total number of possible point pairs in a given dataset is $M = n(n-1)/2$, where $n$ is the total number of data points, and the RI is computed by $RI = (a + b)/M$. The RI measures how accurate the algorithm is.
Normalized Mutual Information (NMI) [37]: In the fields of information theory and data clustering, normalized mutual information (NMI) [37] is a metric of similarity between two sets of labels, A and B. The NMI metric varies between 0 and 1, with 1 denoting complete agreement between the two sets and 0 representing no mutual information. The joint and marginal probabilities are used to measure the level of agreement between clusterings, evaluating how accurately the clustering technique is able to reconstruct a dataset's original labels. We calculate NMI as $NMI = I(A; B) \big/ \left( [H(A) + H(B)] / 2 \right)$, where $H(A)$ and $H(B)$ are the marginal entropies of A and B, respectively, and $I(A; B)$ is the mutual information between A and B.
Jaccard Index (JI) [38]: The Jaccard index (JI) [38], usually referred to as the Jaccard similarity coefficient, is a metric that evaluates how similar or dissimilar two sets are with respect to one another. The JI analyzes the members of the two sets in order to identify which members are shared and which are distinct [38]. It provides an easy approach to calculating the level of agreement or overlap between sets. The Jaccard index varies from 0 to 1. A score of 0 denotes that there are no similarities between the sets, demonstrating their full dissimilarity, and a score of 1 represents full resemblance, with the sets exactly equivalent. We calculate the JI score by $JI = J(A, B) = \frac{|A \cap B|}{|A \cup B|}$, where $|A \cap B|$ is the size of the intersection of A and B and $|A \cup B|$ is the size of their union. In terms of the pair counts $a$, $c$, and $d$ described for the RI, the JI is computed by $JI = a/(a + c + d)$.
Fowlkes–Mallows Index (FMI) [39]: To measure the extent of agreement between two clusterings, Fowlkes and Mallows [39] proposed the Fowlkes–Mallows index (FMI) in 1983. Three kinds of outcomes are taken into consideration by the FMI: true positives (TPs), false positives (FPs), and false negatives (FNs). TP is the count of pairs of points that are placed in the same cluster in both clusterings $C$ and $C'$. FP denotes the number of pairs of points that are placed in the same cluster in $C'$, but not in $C$. FN is the number of pairs of points that are placed in the same cluster in $C$, but not in $C'$. The FMI combines the TPs, FPs, and FNs as $FMI = \frac{TP}{\sqrt{(TP + FP)(TP + FN)}}$. The FMI has a range from 0 to 1, with a value of 1 indicating perfect agreement between the clusterings $C$ and $C'$ and a value close to 0 indicating little to no agreement. Moreover, a high value illustrates the existence of compact clusters.
Running Time (RT): The total average running time (RT), expressed in seconds, is how long it takes for an algorithm to achieve convergence. Based on the above measures of AR, RI, NMI, JI, FMI, and RT, we next evaluate the performance of our proposed S-FCM-Lasso algorithm in comparison with the existing algorithms.
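As a reference for how the pair-counting measures can be computed, the sketch below (our own illustration in NumPy, using the standard pair-counting forms, which may differ slightly from the a, b, c, d labeling above) returns RI, the pair-counting JI, and FMI from the pair counts.

```python
import numpy as np
from itertools import combinations

def pair_counts(labels_true, labels_pred):
    """Count pairs: same/same (TP), same/diff (FN), diff/same (FP), diff/diff (TN).

    O(n^2) over all pairs; adequate for the moderate dataset sizes used here.
    """
    tp = fn = fp = tn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        tp += same_true and same_pred
        fn += same_true and not same_pred
        fp += (not same_true) and same_pred
        tn += not (same_true or same_pred)
    return tp, fp, fn, tn

def rand_index(labels_true, labels_pred):
    tp, fp, fn, tn = pair_counts(labels_true, labels_pred)
    return (tp + tn) / (tp + fp + fn + tn)            # agreements over all M pairs

def jaccard_pairs(labels_true, labels_pred):
    tp, fp, fn, _ = pair_counts(labels_true, labels_pred)
    return tp / (tp + fp + fn)                        # pair-counting Jaccard index

def fowlkes_mallows(labels_true, labels_pred):
    tp, fp, fn, _ = pair_counts(labels_true, labels_pred)
    return tp / np.sqrt((tp + fp) * (tp + fn))        # FMI = TP / sqrt((TP+FP)(TP+FN))
```

For NMI, an off-the-shelf routine such as `sklearn.metrics.normalized_mutual_info_score` can be used.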

Evaluation and Comparison with Synthetic Data

In this section, we show that the proposed S-FCM-Lasso algorithm takes advantage of sparsity, selecting the relevant features with nonzero weights and shrinking the irrelevant features exactly to zero. This can overcome the drawbacks associated with traditional clustering techniques that have no sparsity behavior. The proposed S-FCM-Lasso algorithm first finds and picks the most relevant attributes, hence reducing the dimensions of the datasets, and then clusters the data. It benefits from sparsity to enhance computing power as well as interpretation, providing an efficient algorithm that allows both researchers and practitioners to quickly identify the most important characteristics in a shorter time. For the comparisons, we ran FCM, S-KM, S-FCM, S-PCM1, S-PCM2, and the proposed S-FCM-Lasso algorithm repeatedly with 100 simulations using various initial values to obtain their averages of AR, RI, NMI, JI, FMI, and RT. These findings are displayed in the tables below. We evaluated the performance and effectiveness of these algorithms using both synthetic and real datasets.
Example 1
(data generated from Qiu et al. [24]). Let the data matrix be $X_{n \times p}$ with dataset size $n \times p$, where $n$ represents the number of data objects and $p$ is the number of attributes. The dataset is generated from independent normal distributions, with an indicator function defining three groups, namely $C_1$, $C_2$, and $C_3$, following Qiu et al. [24]. The means of the respective generated distributions are $\mu$, $-\mu$, and $0$, which differ exclusively in the first $q$ ($q < p$) characteristics, and the irrelevant attributes have a zero mean for the remaining $(p - q)$ attributes. We have thus constructed three distinct normal distributions with a mean shift only in the first $q$ characteristics among the three clusters $C_1$, $C_2$, and $C_3$. Applying the above method, we constructed a dataset with $n = 60$ data items and $p = 25$ attributes. The parameter values are the fuzzifier $m = 2$ and $\mu = 2.5$, with $q = 5$ relevant attributes and the remaining $p - q = 20$ irrelevant attributes. The binary search approach within the proposed S-FCM-Lasso algorithm was applied to determine the value of the tuning parameter psi $(\psi)$ with respect to the number of iterations.
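A data-generation sketch in the spirit of this description is shown below; it is our own illustration, and details such as the unit variance, the equal cluster sizes, and the sign pattern of the mean shifts are assumptions rather than the exact generator of Qiu et al. [24].

```python
import numpy as np

def make_sparse_gaussian(n=60, p=25, q=5, mu=2.5, seed=0):
    """Three clusters differing only in the first q of p features; the rest have zero mean."""
    rng = np.random.default_rng(seed)
    sizes = [n // 3, n // 3, n - 2 * (n // 3)]
    shifts = [mu, -mu, 0.0]                          # cluster mean shifts on the relevant features
    X, y = [], []
    for label, (size, shift) in enumerate(zip(sizes, shifts)):
        block = rng.normal(0.0, 1.0, size=(size, p)) # irrelevant features keep mean zero
        block[:, :q] += shift                        # shift only the first q relevant features
        X.append(block)
        y.extend([label] * size)
    return np.vstack(X), np.array(y)
```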
We initialized the weights through the equation $w_j = 1/p$. The performance of our S-FCM-Lasso algorithm, with comparisons, is shown in Table 1. The accuracy rate (AR), Rand index (RI), and average running time (RT) were applied as evaluation indicators to investigate the performance and efficacy of our proposed S-FCM-Lasso method. As shown in Table 1, we apply our proposed S-FCM-Lasso algorithm to the datasets generated from a Gaussian distribution to illustrate its efficacy, effectiveness, and performance. In Figure 1, we show bar graphs comparing the S-FCM-Lasso algorithm with the existing algorithms, i.e., FCM, S-FCM, S-PCM1, and S-PCM2. The clustering structures of all algorithms are shown in Figure 2, and their performances are reported in Table 1.
The proposed S-FCM-Lasso algorithm has a small bar after minimization of the objective function (the algorithm converges with the irrelevant weights driven exactly to zero). The bar graphs show that S-FCM-Lasso requires a comparatively smaller number of iterations to converge; although the existing S-PCM2 also has a smaller bar than the other algorithms (as it has a smaller number of parameters), our proposed S-FCM-Lasso is faster and converges more efficiently with a smaller number of iterations. In Table 1, shown below, we applied the different algorithms, i.e., FCM, S-FCM, S-PCM1, S-PCM2, and our proposed S-FCM-Lasso, to show their performances and how efficiently they converge.
The results and comparisons in Table 1 show that the S-FCM-Lasso algorithm surpasses FCM and the existing sparse S-FCM, S-PCM1, and S-PCM2 clustering algorithms in terms of the average (avg) evaluation indices AR, RI, NMI, JI, and FMI. The outcomes in Table 1 demonstrate that S-FCM-Lasso shows excellent results compared to the other algorithms: the avg(AR) of S-FCM-Lasso exceeds those of FCM, S-FCM, S-PCM1, and S-PCM2, i.e., 1.0000 > 0.9833, 0.9841, 0.9650, and 0.9670. Furthermore, for the avg(RI), 1.0000 > 0.3333, 0.9894, 0.5480, and 0.8100. Similar behavior of the averages of NMI, JI, and FMI can be observed in Table 1. The avg(RT), that is, the average total running time, is also provided in Table 1; the avg(RT) of FCM (0.0510) is small, but the avg(RT) of the proposed S-FCM-Lasso algorithm (0.0401) is the lowest compared to S-FCM, S-PCM1, and S-PCM2, i.e., 0.0401 < 2.4758, 0.0690, and 0.0630. We can conclude that the performance of our proposed S-FCM-Lasso algorithm is much better than that of all the other algorithms. The clustering structure of FCM, S-FCM, S-PCM1, and S-PCM2 is shown in Figure 2a–d and that of S-FCM-Lasso is shown in Figure 2e,f with respect to the true labels and the predicted labels, respectively, similar to the results obtained for the real datasets. We can say that our proposed S-FCM-Lasso algorithm also has an excellent ability to produce good clustering structures.
Example 2
(dataset generated at varying dimensions p). The data points were generated from a Gaussian distribution and categorized into four different groups or clusters. Similarly to Example 1, the dataset for this experiment was generated with the number of dimensions increasing from 100 to 1000. There are four clusters in the generated dataset, namely $C_1$, $C_2$, $C_3$, and $C_4$; the means of the distributions are $\mu$, $-\mu$, $2\mu$, and $0$ ($\mu = 2.5$), respectively, for the first $q$ significant characteristics. The remaining $(p - q)$ characteristics with a mean of zero are all insignificant features. The numerical value of the fuzzifier is $m = 2$. We obtained the clustering results for our proposed S-FCM-Lasso algorithm in this experiment with an increasing number of dimensions from 100 to 1000, and its comparative analysis with the classic FCM, S-FCM, S-PCM1, and S-PCM2 is shown in Figure 3. Multiple bar graphs illustrating the averages of AR, RI, NMI, JI, FMI, and RT (total running time in seconds) are shown in Figure 3. In Figure 3, the multiple bar graphs of our proposed S-FCM-Lasso show its superiority over all of the other algorithms, i.e., the classic FCM and the competing S-FCM, S-PCM1, and S-PCM2 clustering algorithms. We conclude that our proposed S-FCM-Lasso algorithm outperforms all the other algorithms in Figure 3a–f.
Example 3
(based on simulated data for n = 100, 200, …, 1000). This example is based on a dataset of three clusters generated by using the method of Qiu et al. [24] for an increasing number of sample sizes, n = 100, 200, …, 1000, and a fixed number of dimensions p = 1000. The first 25 dimensions are considered relevant and the remaining 975 are irrelevant features, with parameter values $\mu = 3$ and $m = 2$. The resulting values of the evaluation indices AR, RI, NMI, JI, and FMI for the proposed S-FCM-Lasso clustering method, along with the existing S-FCM, S-PCM1, S-PCM2, and classic FCM, are shown in Figure 4 as multiple bar graphs based on the averages of these evaluation measures. In Figure 4, the multiple bar graphs represent the averages of the evaluation indices AR, RI, NMI, JI, FMI, and RT, respectively. We observe that our proposed S-FCM-Lasso algorithm surpasses all the other algorithms in Figure 4a–f. S-FCM-Lasso is also shown to have a low average running time (RT) in Figure 4f. Therefore, we conclude that our proposed S-FCM-Lasso algorithm has shown good clustering performance and excellent clustering results, and we can say that S-FCM-Lasso outperforms all the other competing algorithms.
Example 4
(dataset generated from a Gaussian mixture distribution). The dataset is generated from the multivariate Gaussian mixture distribution $\sum_{k=1}^{3} \alpha_k N(\mu_k, \Sigma_k)$ with mixing proportions $\alpha_k = 1/3$, $k = 1, 2, 3$, mean vectors $\mu_1 = [3, 3, 3]$, $\mu_2 = [6, 6, 6]$, and $\mu_3 = [9, 9, 9]$, and covariance matrices $\Sigma_k = 0.30\, I_3$. The generated dataset of 450 points is distributed into three clusters $C_1$, $C_2$, and $C_3$, each of size 150, with 15 features or dimensions. Among these features, three are considered relevant. The remaining twelve features are generated from a uniform distribution with zero mean. The value of the fuzzifier is $m = 2$. We generated the above dataset using the Monte Carlo simulation technique with 100 simulations to apply our proposed S-FCM-Lasso algorithm and compare it with the classic FCM and the existing S-FCM, S-PCM1, and S-PCM2 algorithms. The resulting values of the different evaluation measures used to compare these algorithms are provided in Table 2. The bar graph is shown in Figure 5 and the clustering patterns of the different algorithms are shown in Figure 6.
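The generation described here can be sketched as follows; this is our own illustration, and the range of the twelve uniform noise features is an assumption, since only their zero mean is stated.

```python
import numpy as np

def make_mixture(seed=0):
    """450 points, 3 clusters of 150; 3 relevant Gaussian features plus 12 uniform noise features."""
    rng = np.random.default_rng(seed)
    means = [np.full(3, 3.0), np.full(3, 6.0), np.full(3, 9.0)]
    cov = 0.30 * np.eye(3)
    X, y = [], []
    for label, mu in enumerate(means):
        relevant = rng.multivariate_normal(mu, cov, size=150)
        noise = rng.uniform(-1.0, 1.0, size=(150, 12))   # zero-mean uniform noise (range assumed)
        X.append(np.hstack([relevant, noise]))
        y.extend([label] * 150)
    return np.vstack(X), np.array(y)
```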
The results in Table 2 demonstrate that our proposed S-FCM-Lasso algorithm outperforms the classic FCM and the existing sparse S-FCM, S-PCM1, and S-PCM2 algorithms in terms of the averages (Avg.) of the evaluation measures AR, RI, NMI, JI, and FMI: the avg(AR) of S-FCM-Lasso exceeds those of FCM, S-FCM, S-PCM1, and S-PCM2, i.e., 1.0000 > 0.9547, 0.9987, 0.9990, and 0.9950. In addition, for RI, 1.0000 > 0.5639, 0.9991, 0.9720, and 0.8550; for NMI, 1.0000 > 0.9548, 0.9940, 0.9700, and 0.8210; for JI, 1.0000 > 0.1889, 0.9974, 0.9500, and 0.7410; and for FMI, 1.0000 > 0.3178, 0.9987, 0.9720, and 0.8470. The average total running time, avg(RT), of our proposed S-FCM-Lasso algorithm (0.0212) is lower than the avg(RT) of all the other competing algorithms, i.e., the classic FCM and the existing S-FCM, S-PCM1, and S-PCM2, with 0.0212 < 0.0246, 3.3712, 0.0900, and 0.01230. Therefore, we conclude that S-FCM-Lasso has superior results compared to all of the other competing algorithms mentioned above.
Moreover, the bar of S-FCM-Lasso in Figure 5 is the smallest due to its simplicity, with a smaller number of iterations required for the algorithm to converge. The clustering structures of the classic FCM and the competing S-FCM, S-PCM1, and S-PCM2 algorithms are shown in Figure 6a–d, and the clustering structure of our proposed sparse S-FCM-Lasso algorithm is shown in Figure 6e,f with respect to the true and predicted labels. The overall structure is consistent across the actual and predicted labeling for all the algorithms mentioned above. Thus, it is evident that the proposed S-FCM-Lasso clustering algorithm can produce superior clustering patterns. As a result, we conclude that our proposed S-FCM-Lasso performed well, with good clustering structures and very efficient results, leading us to claim that our proposed S-FCM-Lasso algorithm is superior to all the other algorithms.
Example 5
(iris dataset). In the fields of machine learning and statistics, the iris dataset [40,41] is a widely utilized dataset. Since the British statistician and biologist Sir Ronald Fisher first published it in 1936 [36], it has grown to be a standard dataset for exploring pattern recognition and classification approaches. The Setosa, Versicolor, and Virginica varieties of iris flowers are the three different species that collectively make up the iris dataset. Each flower sample has four characteristics: sepal length (SL), sepal width (SW), petal length (PL), and petal width (PW). In Figure 7, we illustrate that our proposed S-FCM-Lasso algorithm requires the fewest iterations compared to the other existing algorithms. The feature weights obtained by S-FCM, S-PCM1, S-PCM2, and S-FCM-Lasso for the iris dataset are shown in Table 3. Table 4 shows the performances of FCM, S-FCM, S-PCM1, S-PCM2, and S-FCM-Lasso, with the top-ranked values marked in bold and the second-ranked values underlined.
The results in Table 4 show that the S-FCM-Lasso algorithm outperforms the classic FCM algorithm and the existing S-FCM, S-PCM1, and S-PCM2 algorithms based on the average (Avg.) evaluation indicators AR, RI, NMI, JI, and FMI: the avg(AR) of S-FCM-Lasso exceeds those of FCM, S-FCM, S-PCM1, and S-PCM2, i.e., 0.9997 > 0.9580, 0.9700, 0.9690, and 0.9620. For the average (Avg.) Rand index (RI), 0.9997 > 0.950, 0.9600, 0.7630, 0.8100, and 0.9620. The other average evaluation measures, NMI, JI, and FMI, reveal similar behavioral trends, with the avg(NMI) of S-FCM-Lasso exceeding those of FCM, S-FCM, S-PCM1, and S-PCM2, i.e., 0.8705 > 0.8377, 0.8642, 0.7190, and 0.6950, and similarly for the JI and FMI measures, i.e., 0.9234 > 0.8867, 0.9223, 0.5780, and 0.6500, and 0.9600 > 0.9370, 0.9500, 0.7580, and 0.7930. The average total running time (RT) of S-FCM-Lasso (0.0742) is the lowest, although the avg(RT) of S-PCM2 (0.0780) is also low.
Consequently, we conclude that our proposed S-FCM-Lasso has shown better performance than the other algorithms, namely the classic FCM and the competing S-FCM, S-PCM1, and S-PCM2 algorithms. The proposed S-FCM-Lasso algorithm has the ability to retain relevant features by assigning nonzero weights and to discard irrelevant features by assigning them zero weights, as shown in Table 3. The bar of S-FCM-Lasso is also the smallest in Figure 7; the proposed S-FCM-Lasso algorithm requires only a small number of iterations to achieve minimization (algorithm convergence). The clustering structures of the classic FCM and the competing S-FCM, S-PCM1, and S-PCM2 algorithms are shown in Figure 8a–d, and the clustering structure of the proposed S-FCM-Lasso is shown in Figure 8e,f with respect to the true and predicted labels. In general, the structure is similar for the actual and predicted labeling. As a consequence, the proposed S-FCM-Lasso clustering algorithm has the potential to generate effective clustering structures, and we conclude that it performed well in terms of clustering structure and produced excellent, efficient results, leading us to claim that it is superior to all the other competing algorithms.
Example 6
(abalone dataset). Abalone is the common name for a group of small to very large sea snails found along coasts across the world; the meat is used as a delicacy in various cuisines, and the leftover shell is fashioned into jewelry due to its iridescent luster. Due to its demand and economic value, abalone is often harvested on farms. This dataset is openly available on Kaggle. The total number of observations in the abalone dataset is 4177 and the total number of features is eight, including sex, length, diameter, height, whole weight, shucked weight, viscera weight, and shell weight. We applied our proposed S-FCM-Lasso algorithm to this dataset and compared it with the other competing algorithms, i.e., the FCM, S-FCM, S-PCM1, and S-PCM2 clustering algorithms. Table 5 compares the S-FCM-Lasso algorithm with the other algorithms and shows the following performance results:
The results in Table 5 show that the S-FCM-Lasso algorithm demonstrated excellent performance compared to the classic FCM algorithm and the competing S-FCM, S-PCM1, and S-PCM2 algorithms based on the average (Avg.) evaluation indicators AR, RI, NMI, JI, and FMI: the avg(AR) of S-FCM-Lasso exceeds those of FCM, S-FCM, S-PCM1, and S-PCM2, i.e., 0.9997 > 0.7524, 0.9562, 0.9582, and 0.9600. For the average (Avg.) Rand index (RI), 0.8392 > 0.8242, 0.8390, 0.5560, and 0.5030. A similar behavioral pattern was observed for the other average evaluation measures NMI, JI, and FMI, i.e., for NMI, 0.4484 > 0.4452, 0.4389, 0.0420, and 0.0400, and similarly for the JI and FMI measures, i.e., 0.609 > 0.5543, 0.5546, 0.5440, and 0.0530, and 0.757 > 0.7524, 0.7535, 0.7410, and 0.6720. The average total running time (RT) of S-FCM-Lasso (0.1261) is the lowest, although the avg(RT) of S-PCM2 (0.1273) is also low.
Example 7
(diabetes dataset). The original source of this dataset is the National Institute of Diabetes and Digestive and Kidney Diseases. The objective is to determine whether a patient has diabetes using diagnostic parameters. These instances were selected from a larger database based on several constraints; all patients are females of Pima Indian heritage who are at least 21 years old. The dataset has one target variable, outcome, and several medical explanatory factors: a patient's age, BMI, insulin level, number of pregnancies, and other factors are all considered predictor variables. The data used in this experiment are publicly available on Kaggle. The total number of observations in the diabetes dataset is 768, and there are eight predictor features (pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function, and age) plus the outcome variable. We used this dataset to apply our proposed S-FCM-Lasso method and compare it to all the other competing clustering algorithms, i.e., FCM, S-FCM, S-PCM1, and S-PCM2. The performance comparison in Table 6 illustrates the effectiveness of the proposed S-FCM-Lasso algorithm:
The results are summarized in Table 6, which shows that the S-FCM-Lasso approach outperforms the existing S-FCM, S-PCM1, and S-PCM2 methods and the traditional FCM method when considering the average (Avg.) evaluation indicators AR, RI, NMI, JI, and FMI: the avg(AR) of S-FCM-Lasso exceeds those of FCM, S-FCM, S-PCM1, and S-PCM2, i.e., 0.9999 > 0.7520, 0.9200, 0.9560, and 0.9600. For the average (Avg.) Rand index (RI), 0.8381 > 0.8349, 0.8090, 0.5599, and 0.5533. The other average evaluation measures, NMI, JI, and FMI, reveal similar behavioral trends: for NMI, 0.4589 > 0.4376, 0.4389, 0.0510, and 0.0400, and similarly for the JI and FMI measures, i.e., 0.6092 > 0.4988, 0.5546, 0.5583, and 0.0530, and 0.7591 > 0.7524, 0.7135, 0.7500, and 0.6720. The average total running time (RT) of S-FCM-Lasso (0.1240) is the lowest, although the avg(RT) of S-PCM2 (0.1293) is also low.
Example 8
(real datasets). The experiment incorporates different real-world datasets, including breast cancer, seeds, vertebrae, sonar, diabetes, abalone, and lungs, taken from the UCI Machine Learning Repository [41] and the ODDS Library [42]. These real datasets were subjected to the existing S-FCM and our proposed S-FCM-Lasso to evaluate their algorithmic effectiveness as well as their potential for identifying the relevant features and discarding those that are irrelevant. The performance of both the existing S-FCM and the proposed S-FCM-Lasso clustering algorithms regarding feature selection (the number of relevant features with nonzero weights) is illustrated in Table 7. Moreover, Table 7 shows that both algorithms can select relevant features with nonzero weights and can shrink irrelevant features because the Lasso penalty on the feature weights is included.
The two sparse algorithms, S-FCM and our proposed S-FCM-Lasso, are compared and examined in Table 7. We demonstrate that both algorithms are capable of generating sparse feature weights by discarding unnecessary features, giving them exactly zero weights due to the inclusion of the Lasso penalty, and that both algorithms have the ability to select the most appropriate features.
Example 9
(other real datasets). In this experiment, we used a variety of other real datasets, such as breast cancer, seeds, vertebrae, cardiac activity, sonar, and lungs [41,42]. In this study, we reviewed several authentic real datasets from the health and medical sciences with a variety of dimensions. We subsequently applied our proposed method to these real datasets to analyze its usefulness, efficiency, and proficiency in selecting important features. These real datasets have been generated by scientists and researchers in a variety of domains, i.e., science, research, machine learning projects, and numerous other fields. We performed 100 simulations with different initial values to investigate the clustering results. In our investigation, we obtained the value of the tuning parameter s with fuzzifier $m = 2$ by considering the gap statistic in S-FCM. We applied our proposed S-FCM-Lasso algorithm to these real datasets and compared it with the classic FCM and the competing S-FCM, S-KM, S-PCM1, and S-PCM2 clustering algorithms in order to evaluate their performance as well as their feature selection capability. In Table 8, we display the performance of S-FCM-Lasso and the other competing algorithms based on different evaluation criteria: average accuracy rate (AR), RI, NMI, JI, and FMI. Table 8 summarizes the results, with bold results denoting the highest values and highlighted results denoting the second-highest values.
In Table 8, we applied the classic FCM, the proposed S-FCM-Lasso, and the competing S-FCM, S-KM, S-PCM1, and S-PCM2 algorithms to these real datasets. Moreover, we compared and analyzed our proposed S-FCM-Lasso algorithm against the other existing algorithms. Several evaluation measures (average AR, RI, FMI, JI, NMI, and RT) were assessed when the proposed S-FCM-Lasso algorithm achieved convergence on the different real datasets. We found that our proposed S-FCM-Lasso algorithm outperformed the other algorithms and displayed superior effectiveness and efficiency compared with the classic FCM and the existing S-FCM, S-KM, S-PCM1, and S-PCM2 algorithms; for example, for the cancer data, the avg(AR) of S-FCM-Lasso exceeds those of FCM, S-KM, S-FCM, S-PCM1, and S-PCM2, with 0.9999 > 0.8541, 0.9940, 0.9700, 0.9960, and 0.9970. In addition, our proposed S-FCM-Lasso outperforms the classic FCM and the existing S-KM, S-FCM, S-PCM1, and S-PCM2 algorithms based on the average (Avg.) RI, i.e., 0.9228 > 0.4306, 0.8950, 0.4389, 0.8090, and 0.5280.
Similarly, for the other evaluation measures NMI, JI, and FMI: the avg(NMI) of S-FCM-Lasso exceeds those of FCM, S-KM, S-FCM, S-PCM1, and S-PCM2, with 0.7372 > 0.4371, 0.6800, 0.8950, 0.4389, 0.0440, and 0.7160; for JI, 0.8454 > 0.0787, 0.8270, 0.5546, 0.5190, and 0.8030; and for FMI, 0.9541 > 0.1459, 0.9050, 0.7135, 0.7090, and 0.8910. We found similar behavior for the other datasets, namely arrhythmia, cardio, and lungs, based on the average (Avg.) AR, RI, NMI, JI, and FMI evaluation measures. Furthermore, we can observe the effectiveness of our proposed S-FCM-Lasso algorithm on the seeds and sonar datasets in Table 7. Therefore, we can say that the proposed S-FCM-Lasso method is faster than the other classic and competing algorithms mentioned above. Similarly, we analyzed the behavior of our proposed S-FCM-Lasso algorithm for the other datasets and compared it with the competing algorithms mentioned above. In conclusion, as shown in Table 8, our proposed S-FCM-Lasso algorithm performed better than the other algorithms across numerous real datasets and demonstrated highly efficient and successful results. For S-FCM, obtaining the tuning parameter value by using the gap statistic consumes more time than for S-FCM-Lasso. The S-FCM-Lasso method performed better than the FCM, S-KM, S-FCM, S-PCM1, and S-PCM2 algorithms.

5. Conclusions

In this article, we proposed our clustering algorithm S-FCM-Lasso, which not only efficiently and successfully seeks out significant features but is also capable of eliminating insignificant ones by adding sparsity to the feature weights. On both simulated and real datasets, the algorithm demonstrated superior clustering results. The S-FCM-Lasso algorithm has a good capacity to identify and discard redundant features, since it picks out key features by assigning nonzero weights to relevant features and zero weights to irrelevant features. Using the Lasso penalty, the proposed algorithm achieves sparsity in the feature weights by shrinking the weights of irrelevant features to exactly zero. In S-FCM, sparsity depends on the value of the tuning parameter ψ, which must be obtained by the gap statistic and is more complicated and time-consuming; the competing S-FCM algorithm therefore relies on this more complex gap-statistic procedure, while our proposed S-FCM-Lasso relies on a much simpler algorithmic convergence. Sparse feature weights cannot be produced with a high ψ value, so a suitably small tuning parameter is required to achieve the optimum results.
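To give a concrete picture of this mechanism, the short sketch below (our illustration under simplifying assumptions, not the exact weight update derived for S-FCM-Lasso) applies Lasso-style soft-thresholding to per-feature relevance scores so that scores below the threshold become exactly zero; the names scores, lam, and the L2 normalization are illustrative choices.

```python
# Illustrative only: Lasso-style soft-thresholding drives small feature
# scores to exactly zero, which is what produces sparse feature weights.
import numpy as np

def soft_threshold(a, lam):
    # S(a, lam) = sign(a) * max(|a| - lam, 0), applied elementwise
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_feature_weights(scores, lam):
    w = soft_threshold(np.asarray(scores, dtype=float), lam)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w          # normalize the surviving weights

# Features with small relevance scores receive weight exactly zero.
print(sparse_feature_weights([0.90, 0.05, 0.60, 0.01], lam=0.10))
```

Running the example prints a weight vector in which the second and fourth features receive exactly zero weight, while the remaining weights are rescaled.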
As a result, we can conclude that S-FCM-Lasso shows superior performance over the existing algorithms on both synthetic and real datasets, delivering efficient clustering results together with a feature selection ability. The proposed S-FCM-Lasso outperforms the traditional FCM and the existing S-KM, S-FCM, S-PCM1, and S-PCM2 algorithms, and it also recovers good clustering structures on both synthetic and real datasets. In fuzzy c-means, data points can belong to multiple clusters with varying degrees of membership, whereas k-means performs hard clustering and assigns each data point exclusively to one cluster (see the sketch below); for this reason, S-FCM-Lasso and S-PCM2 can accommodate the data more flexibly than S-KM. Furthermore, the proposed S-FCM-Lasso algorithm shows an excellent ability to pick out relevant features by assigning them nonzero weights while shrinking the weights of irrelevant features to exactly zero by taking advantage of the Lasso penalty. In summary, S-FCM-Lasso produces powerful clustering results and excellent cluster structures with a good capability for feature selection, and it performs more efficiently than S-KM, S-FCM, S-PCM1, and S-PCM2 on both synthetic and real datasets. Since a point symmetry-based distance [26] can represent cluster symmetry [27] better than the Euclidean distance, in further work we will extend S-FCM-Lasso by using a point symmetry-based distance so that it can better capture cluster symmetry.
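As a small illustration of the fuzzy-versus-hard distinction above, the sketch below (ours, not taken from the paper; the fixed-centers setting and function names are assumptions) computes standard FCM memberships for given cluster centers and derives the corresponding crisp k-means-style labels by taking the largest membership.

```python
# Illustrative contrast between fuzzy memberships (FCM) and hard labels
# (k-means style). X: n x p data, V: c x p cluster centers, m: fuzzifier.
import numpy as np

def fuzzy_memberships(X, V, m=2.0, eps=1e-12):
    d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + eps   # n x c distances
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))      # d_ik / d_jk
    return 1.0 / ratio.sum(axis=2)                                    # each row sums to 1

def hard_assignments(U):
    return U.argmax(axis=1)                                           # crisp labels

X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 4.0]])
V = np.array([[0.5, 0.0], [4.0, 4.0]])
U = fuzzy_memberships(X, V)
print(U)                    # graded memberships in [0, 1]
print(hard_assignments(U))  # hard labels: [0, 0, 1]
```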

Author Contributions

Conceptualization, S.P. and M.-S.Y.; methodology, S.P. and M.-S.Y.; validation, S.P.; formal analysis, S.P.; investigation, S.P. and M.-S.Y.; writing—original draft preparation, S.P.; writing—review and editing, M.-S.Y.; supervision, M.-S.Y.; funding acquisition, M.-S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by the National Science and Technology Council, Taiwan, under Grant NSTC 113-2118-M-033-001.

Data Availability Statement

All data generated or analyzed during this study are included in this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Banfield, J.D.; Raftery, A.E. Model-based Gaussian and non-Gaussian clustering. Biometrics 1993, 49, 803–821.
2. Yu, J.; Chaomurilige, C.; Yang, M.S. On convergence and parameter selection of the EM and DA-EM algorithms for Gaussian mixtures. Pattern Recognit. 2018, 77, 188–203.
3. Hines, P.; Pothos, E.M.; Chater, N. A non-parametric approach to simplicity clustering. Appl. Artif. Intell. 2007, 21, 729–752.
4. Efimov, K.; Adamyan, L.; Spokoiny, V. Adaptive nonparametric clustering. IEEE Trans. Inf. Theory 2019, 65, 4875–4892.
5. Roux, M. A comparative study of divisive and agglomerative hierarchical clustering algorithms. J. Classif. 2018, 35, 345–366.
6. Cohen-Addad, V.; Kanade, V.; Mallmann-Trenn, F.; Mathieu, C. Hierarchical clustering: Objective functions and algorithms. J. ACM 2019, 66, 26.
7. Chang-Chien, S.J.; Hung, W.L.; Yang, M.S. On mean shift-based clustering for circular data. Soft Comput. 2012, 16, 1043–1060.
8. Cariou, C.; Le Moan, S.; Chehdi, K. A novel mean-shift algorithm for data clustering. IEEE Access 2022, 10, 14575–14585.
9. von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 2007, 17, 395–416.
10. Huang, D.; Wang, C.D.; Wu, J.S.; Lai, J.H. Ultra-scalable spectral clustering and ensemble clustering. IEEE Trans. Knowl. Data Eng. 2020, 32, 1212–1226.
11. Al-sharoa, E.; Aviyente, S. A unified spectral clustering approach for detecting community structure in multilayer networks. Symmetry 2023, 15, 1368.
12. Jain, A.K. Data clustering: 50 years beyond k-means. Pattern Recognit. Lett. 2010, 31, 651–666.
13. Huang, S.; Kang, Z.; Xu, Z.; Liu, Q. Robust deep k-means: An effective and simple method for data clustering. Pattern Recognit. 2021, 117, 107996.
14. Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inf. Sci. 2023, 622, 178–210.
15. Bezdek, J.C. Pattern Recognition with Fuzzy Objective Function Algorithms; Kluwer Academic Publishers: London, MA, USA, 1981.
16. Chang, S.T.; Lu, K.P.; Yang, M.S. Fuzzy change-point algorithms for regression models. IEEE Trans. Fuzzy Syst. 2015, 23, 2343–2357.
17. Rout, R.; Parida, P.; Alotaibi, Y.; Alghamdi, S.; Khalaf, O.I. Skin lesion extraction using multiscale morphological local variance reconstruction based watershed transform and fast fuzzy c-means clustering. Symmetry 2021, 13, 2085.
18. Krishnapuram, R.; Keller, J.M. A possibilistic approach to clustering. IEEE Trans. Fuzzy Syst. 1993, 1, 98–110.
19. Yang, M.S.; Chang-Chien, S.J.; Nataliani, Y. A fully-unsupervised possibilistic c-means clustering method. IEEE Access 2018, 6, 78308–78320.
20. Zeng, W.; Liu, Y.; Cui, H.; Ma, R.; Xu, Z. Interval possibilistic c-means algorithm and its application in image segmentation. Inf. Sci. 2022, 612, 465–480.
21. Bouveyron, C.; Girard, S.; Schmid, C. High-dimensional data clustering. Comput. Stat. Data Anal. 2007, 52, 502.
22. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
23. Witten, D.M.; Tibshirani, R. A framework for feature selection in clustering. J. Am. Stat. Assoc. 2010, 105, 713–726.
24. Qiu, X.; Qiu, Y.; Feng, G.; Li, P. A sparse fuzzy c-means algorithm based on sparse clustering framework. Neurocomputing 2015, 157, 290–295.
25. Yang, M.S.; Benjamin, J.B.M. Sparse possibilistic c-means clustering with Lasso. Pattern Recognit. 2023, 138, 109348.
26. Bandyopadhyay, S.; Saha, S. A point symmetry-based clustering technique for automatic evolution of clusters. IEEE Trans. Knowl. Data Eng. 2008, 20, 1441–1457.
27. Su, M.C.; Chou, C.H. A modified version of the k-means algorithm with a distance based on cluster symmetry. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 674–680.
28. Fu, J.; Hsiao, C.A. Decoding intelligence via symmetry and asymmetry. Sci. Rep. 2024, 14, 12525.
29. Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353.
30. Ruspini, E.H. A new approach to clustering. Inf. Control 1969, 15, 22–32.
31. Dunn, J.C. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J. Cybern. 1973, 3, 32–57.
32. Baraldi, A.; Blonda, P. A survey of fuzzy clustering algorithms for pattern recognition—Part I and II. IEEE Trans. Syst. Man Cybern. 1999, 29, 778–801.
33. Fazendeiro, P.; Oliveira, J.V.D. Observer-biased fuzzy clustering. IEEE Trans. Fuzzy Syst. 2015, 23, 85–97.
34. Chaomurilige, C.; Yu, J.; Yang, M.S. Deterministic annealing Gustafson-Kessel fuzzy clustering algorithm. Inf. Sci. 2017, 417, 435–453.
35. Kalaycı, T.A.; Asan, U. Improving classification performance of fully connected layers by fuzzy clustering in transformed feature space. Symmetry 2022, 14, 658.
36. Rand, W.M. Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 1971, 66, 846–850.
37. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: New York, NY, USA, 1991.
38. Jaccard, P. Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines. Bull. Soc. Vaudoise Sci. Nat. 1901, 37, 241–272.
39. Fowlkes, E.B.; Mallows, C.L. A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 1983, 78, 553–569.
40. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188.
41. Blake, C.L.; Merz, C.J. UCI Repository of Machine Learning Databases, a Huge Collection of Artificial and Real-World Data Sets 1998. Available online: http://www.ics.uci.edu/~mlearn/MLRepository.html (accessed on 15 July 2024).
42. Rayana, S. ODDS Library; Department of Computer Science, Stony Brook University: Stony Brook, NY, USA, 2016. Available online: http://odds.cs.stonybrook.edu (accessed on 15 July 2024).
Figure 1. The graphic illustration of a comparison of different algorithms.
Figure 2. Three-dimensional graph of the first three features for FCM, S-FCM, S-PCM1, and S-PCM2 (a–d); performance of the S-FCM-Lasso algorithm with respect to true and predicted labels, respectively (e,f).
Figure 3. Performances of FCM, S-FCM, S-PCM1, S-PCM2, and S-FCM-Lasso in form of bar graphs for increasing number of dimensions (100–1000) based on average evaluation measures (a) AR, (b) RI, (c) NMI, (d) JI, (e) FMI, and (f) RT, respectively.
Figure 4. Performances of FCM, S-FCM, S-PCM1, S-PCM2, and S-FCM-Lasso in form of bar graphs for increasing number of sample sizes (100–1000) based on average evaluation measures (a) AR, (b) RI, (c) NMI, (d) JI, (e) FMI, and (f) RT, respectively.
Figure 5. The illustration is based on the comparison of different algorithms vs. the number of iterations.
Figure 6. Three-dimensional graph of the first three features for FCM, S-FCM, S-PCM1, and S-PCM2 (a–d); performances of S-FCM-Lasso with respect to true and predicted labels, respectively (e,f).
Figure 7. Illustration of number of iterations based on different algorithms for iris dataset.
Figure 8. Three-dimensional graphs of the first three features for FCM, S-FCM, S-PCM1, and S-PCM2 (a–d); performance of the S-FCM-Lasso algorithm with respect to true and predicted labels (e,f) based on the iris experiment.
Table 1. Performance comparisons between the classic FCM, S-FCM, S-PCM1, and S-PCM2 and S-FCM-Lasso based on average AR, RI, NMI, JI, FMI, and RT.

Algorithms     AR (Avg.)  RI (Avg.)  NMI (Avg.)  JI (Avg.)  FMI (Avg.)  RT (Avg.)
FCM            0.9167     0.9444     0.7218      0.8462     0.9000      0.0510
S-FCM          0.9841     0.9894     0.9423      0.9688     0.9841      2.4758
S-PCM1         0.9650     0.5480     0.0000      0.5480     0.7400      0.0690
S-PCM2         0.9670     0.8100     0.7230      0.6240     0.7730      0.0630
S-FCM-Lasso    1.0000     1.0000     0.9999      0.9999     0.9999      0.0401
Table 2. Comparisons of algorithms FCM, S-FCM, S-PCM1, S-PCM2, and S-FCM-Lasso through AR, RI, NMI, JI, FMI, and RT.

Algorithms     AR (Avg.)  RI (Avg.)  NMI (Avg.)  JI (Avg.)  FMI (Avg.)  RT (Avg.)
FCM            0.9547     0.5639     0.9548      0.1889     0.3178      0.0246
S-FCM          0.9987     0.9991     0.9940      0.9974     0.9987      3.3712
S-PCM1         0.9990     0.9720     0.9700      0.9500     0.9720      0.0900
S-PCM2         0.9950     0.8550     0.8210      0.7410     0.8470      0.1230
S-FCM-Lasso    1.0000     1.0000     1.0000      1.0000     1.0000      0.0212
Table 3. Results of feature weights based on S-FCM, S-PCM1, S-PCM2, and S-FCM-Lasso for iris dataset.

Algorithms     PL        PW        SL        SW
S-FCM          0.0000    0.0000    0.4113    0.9115
S-PCM1         0.1180    0.0000    0.9810    0.1560
S-PCM2         0.0150    0.0000    0.2400    0.0000
S-FCM-Lasso    0.0000    0.0000    0.1434    0.8566
Table 4. Performances of FCM, S-FCM, S-PCM1, S-PCM2, and S-FCM-Lasso with nonzero weights.

Algorithms     AR (Avg.)  RI (Avg.)  NMI (Avg.)  JI (Avg.)  FMI (Avg.)  RT (Avg.)
FCM            0.9580     0.9500     0.8377      0.8867     0.9370      0.1028
S-FCM          0.9700     0.9600     0.8642      0.9223     0.9500      3.3404
S-PCM1         0.9690     0.7630     0.7190      0.5780     0.7580      0.0850
S-PCM2         0.9620     0.8100     0.6950      0.6500     0.7930      0.0780
S-FCM-Lasso    0.9997     0.9733     0.8705      0.9234     0.9600      0.0742
Table 5. Comparisons of algorithms FCM, S-FCM, S-PCM1, S-PCM2, and S-FCM-Lasso based on average (Avg.) evaluation measures AR, RI, NMI, JI, FMI, and RT for the abalone dataset.

Algorithms     AR (Avg.)  RI (Avg.)  NMI (Avg.)  JI (Avg.)  FMI (Avg.)  RT (Avg.)
FCM            0.7524     0.8242     0.4452      0.5543     0.7524      0.1380
S-FCM          0.9562     0.8390     0.4389      0.5546     0.7535      2.0369
S-PCM1         0.9582     0.5560     0.0420      0.5440     0.7410      2.0272
S-PCM2         0.9600     0.5030     0.0400      0.0530     0.6720      0.1273
S-FCM-Lasso    0.9997     0.8392     0.4484      0.6092     0.7571      0.1261
Table 6. Comparisons of algorithms FCM, S-FCM, S-PCM1, S-PCM2, and S-FCM-Lasso based on average (Avg.) evaluation measures AR, RI, NMI, JI, FMI, and RT for the diabetes dataset.

Algorithms     AR (Avg.)  RI (Avg.)  NMI (Avg.)  JI (Avg.)  FMI (Avg.)  RT (Avg.)
FCM            0.7524     0.8349     0.4376      0.4988     0.7524      0.1482
S-FCM          0.9200     0.8090     0.4389      0.5546     0.7135      2.0369
S-PCM1         0.9560     0.5599     0.0510      0.5583     0.7500      2.0590
S-PCM2         0.9600     0.5533     0.0400      0.0530     0.6720      0.1293
S-FCM-Lasso    0.9999     0.8381     0.4589      0.6092     0.7591      0.1240
Table 7. Based on real datasets, the performance evaluation and potential of S-FCM and S-FCM-Lasso for the remaining features.

Real Datasets    Data Points (n)  Features (p)  Clusters (c)  Remaining Features (S-FCM)  Remaining Features (S-FCM-Lasso)
Breast cancer    569              30            2             10                          12
Vertebral        310              6             2             1                           1
Seeds            210              7             3             2                           2
Sonar            208              60            2             33                          35
Lung             309              13            4             11                          10
Diabetes         768              8             2             3                           3
Abalone          4177             8             3             4                           4
Table 8. A comparison of the performance of FCM, S-KM, S-FCM, S-PCM1, S-PCM2, and S-FCM-Lasso based on average AR, RI, NMI, JI, and FMI using different real datasets.

Datasets        Algorithms                 AR      RI      NMI     JI      FMI
Breast Cancer   fuzzy c-means (FCM)        0.8541  0.4306  0.4371  0.0787  0.1459
                sparse k-means (S-KM)      0.9940  0.8950  0.6800  0.8270  0.9050
                S-FCM                      0.9700  0.8090  0.4389  0.5546  0.7135
                S-PCM1                     0.9960  0.5280  0.0440  0.5190  0.7090
                S-PCM2                     0.9970  0.8840  0.7160  0.8030  0.8910
                S-FCM-Lasso                0.9999  0.9228  0.7372  0.8454  0.9541
Vertebral       fuzzy c-means (FCM)        0.8635  0.4402  0.4332  0.0726  0.1654
                sparse k-means (S-KM)      0.9920  0.5100  0.0990  0.4710  0.8830
                S-FCM                      0.9920  0.6882  0.2168  0.3626  0.5323
                S-PCM1                     0.9940  0.7800  0.0000  0.7800  0.8830
                S-PCM2                     1.0000  0.5540  0.1550  0.1310  0.6690
                S-FCM-Lasso                1.0000  0.8570  0.5461  0.6671  0.7852
Sonar           fuzzy c-means (FCM)        0.8667  0.4773  0.0513  0.0923  0.1660
                sparse k-means (S-KM)      0.9810  0.5200  0.0280  0.0500  0.6890
                S-FCM                      0.9820  0.7179  0.0357  0.4054  0.5769
                S-PCM1                     1.0000  0.8990  0.0000  0.0530  0.9480
                S-PCM2                     1.0000  0.5030  0.0400  0.0530  0.6720
                S-FCM-Lasso                1.0000  0.7201  0.0482  0.5833  0.6993
Arrhythmia      fuzzy c-means (FCM)        0.8533  0.4677  0.4302  0.0952  0.1703
                sparse k-means (S-KM)      1.0000  0.5340  0.0140  0.1190  0.6460
                S-FCM                      0.9922  0.8723  0.4460  0.4872  0.4502
                S-PCM1                     0.9920  0.4980  0.0000  0.4980  0.7060
                S-PCM2                     1.0000  0.6570  0.0110  0.1010  0.7750
                S-FCM-Lasso                1.0000  0.8813  0.4632  0.4896  0.9332
Seeds           fuzzy c-means (FCM)        0.7524  0.8312  0.4476  0.5031  0.7524
                sparse k-means (S-KM)      0.9650  0.8270  0.6420  0.5900  0.7420
                S-FCM                      0.9564  0.8123  0.6220  0.5762  0.7232
                S-PCM1                     0.9732  0.8317  0.4233  0.5972  0.7476
                S-PCM2                     0.9720  0.7410  0.5680  0.5220  0.7000
                S-FCM-Lasso                0.9753  0.8381  0.6400  0.6092  0.7571
Cardio          fuzzy c-means (FCM)        0.8224  0.8042  0.4375  0.4031  0.7424
                sparse k-means (S-KM)      0.9980  0.5040  0.0070  0.4570  0.6460
                S-FCM                      0.9980  0.8090  0.4389  0.4546  0.7135
                S-PCM1                     1.0000  0.8260  0.0000  0.0550  0.9090
                S-PCM2                     1.0000  0.8260  0.0000  0.0960  0.9090
                S-FCM-Lasso                1.0000  0.8421  0.4484  0.6092  0.9071
Lungs           fuzzy c-means (FCM)        0.8552  0.6117  0.4375  0.2638  0.4175
                sparse k-means (S-KM)      0.9610  0.7530  0.6070  0.5650  0.7210
                S-FCM                      0.9960  0.6780  0.0042  0.5172  0.4822
                S-PCM1                     0.9610  0.4950  0.0000  0.4950  0.7030
                S-PCM2                     0.9360  0.6150  0.3130  0.4070  0.5790
                S-FCM-Lasso                0.9986  0.6785  0.0288  0.5178  0.4822
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
