Article

Weighted z-Distance-Based Clustering and Its Application to Time-Series Data

1 Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung 804, Taiwan
2 Department of Electrical Engineering, Intelligent Electronic Commerce Research Center, National Sun Yat-Sen University, Kaohsiung 804, Taiwan
* Author to whom correspondence should be addressed.
Appl. Sci. 2019, 9(24), 5469; https://doi.org/10.3390/app9245469
Submission received: 6 October 2019 / Revised: 5 December 2019 / Accepted: 9 December 2019 / Published: 12 December 2019
(This article belongs to the Special Issue New Frontiers in Computational Intelligence)

Abstract

Clustering is the practice of dividing given data into similar groups and is one of the most widely used methods for unsupervised learning. Lee and Ouyang proposed a self-constructing clustering (SCC) method in which the similarity threshold, instead of the number of clusters, is specified in advance by the user. For a given set of instances, SCC performs only one training cycle on those instances. Once an instance has been assigned to a cluster, the assignment will not be changed afterwards. The clusters produced may depend on the order in which the instances are considered, and assignment errors are more likely to occur. Also, all dimensions are equally weighted, which may not be suitable in certain applications, e.g., time-series clustering. In this paper, improvements are proposed. Two or more training cycles on the instances are performed. An instance can be re-assigned to another cluster in each cycle. In this way, the clusters produced are less likely to be affected by the feeding order of the instances. Also, each dimension of the input can be weighted differently in the clustering process. The values of the weights are adaptively learned from the data. A number of experiments with real-world benchmark datasets are conducted and the results are shown to demonstrate the effectiveness of the proposed ideas.

1. Introduction

In the field of artificial intelligence, clustering techniques play a very important role [1,2]. Clustering is an unsupervised learning technique whose purpose is to form meaningful clusters from the unlabeled data instances under consideration. Intuitively, similar instances are grouped in the same cluster and dissimilar instances are placed in different clusters. Clustering has been widely utilized in a variety of applications, such as revealing the internal structure of the data [3], deriving segmentations of the data [4,5], preprocessing the data for other artificial intelligence (AI) techniques [6,7], business intelligence [1,8], and knowledge discovery in data [9,10]. For example, in electronic text processing [11,12,13], clustering is used to reduce the dimensionality in order to improve the efficiency of processing, or to ease the curse of dimensionality encountered in high-dimensional problems. In recommendation applications in e-commerce [14], the size of the information matrix is reduced by clustering to enhance the efficiency of making recommendations. In power systems, clustering helps predict future trends in electricity demand [15]. In stock market forecasting and social media data analysis [1,16], clustering is an indispensable core technique. Therefore, developing good clustering techniques is a critical issue.
Many types of clustering algorithms have been proposed [2,17]. Similarity or distance measures are core components used to place similar data into the same clusters, while dissimilar or distant data are placed into different clusters [18]. Centroid-based clustering [19,20,21,22,23,24,25,26] groups data instances in an exclusive way: once an instance belongs to a definite cluster, it cannot be included in another cluster. K-means is one such algorithm, well known in the AI community. To use it, the user has to provide the desired number of clusters, K, in advance. Each instance is assigned to the nearest cluster center, and then the K cluster centers are re-estimated. This process is repeated until the cluster centers are stable. The self-organizing map (SOM) employs a set of representatives. When a vector is presented, all representatives compete with each other, and the winner is updated so as to move toward the vector. Hierarchical clustering [27,28,29,30,31,32] creates a hierarchical decomposition of the set of data instances using some criteria. Two strategies, bottom-up and top-down, are adopted in hierarchical algorithms. The user usually has to decide how many and which clusters are most desirable from the offered hierarchy of clusters. Distribution-based clustering [33,34,35,36,37,38] is based on distribution models. Fuzzy C-means uses fuzzy sets to cluster instances, and data instances are bound to each cluster by means of a membership function; therefore, each instance may belong to several clusters with different degrees of membership. The Gaussian mixture model with expectation maximization (GMM-EM) uses a completely probabilistic approach: each cluster is mathematically represented by a parametric distribution, e.g., a Gaussian, and the entire data set is therefore modeled by a mixture of these distributions. Density-based algorithms [39,40,41], e.g., density-based spatial clustering of applications with noise (DBSCAN), consider clusters as dense regions in the instance space. Most of them do not impose any shape restrictions on the resulting clusters; however, they differ in the way the density is defined. Subspace clustering [42,43,44,45,46,47], e.g., clustering in quest (CLIQUE), searches for clusters within the various subspaces of the feature space and reveals the clusters as well as the subspaces where they reside. Also, message passing [48,49], a recent development in computer science and statistical physics, has led to the creation of new types of clustering algorithms.
A special type of clustering is time-series clustering. A sequence composed of continuous, real-valued temporal data is known as a time series. Time-series data are of interest in various areas, e.g., engineering, business, finance, economics, and healthcare. Good surveys and reviews have been published on time-series clustering [50,51,52]. In [50], time-series clustering methods are grouped into three major categories: raw-data-based, feature-based, and model-based approaches. Raw-data-based clustering methods work with raw data, either in the time or frequency domain; the two time series being compared are normally sampled at the same interval. Clustering based on raw data implies working with a high-dimensional space, and it is also not desirable to work directly with raw data that are highly noisy. Feature-based clustering methods have been proposed to address these concerns. Although most feature extraction methods are generic in nature, the extracted features are usually application dependent. Model-based clustering methods consider each time series to be generated by some kind of model or by a mixture of underlying probability distributions; time series are considered similar when the models characterizing the individual series are similar. However, model-based approaches usually have scalability problems, and their performance degrades when the clusters are close to each other. In [51], time-series clustering approaches are classified into six groups: partitioning, hierarchical, grid-based, model-based, density-based, and multi-step clustering algorithms. A partitioning clustering method, e.g., K-means, makes k groups from unlabeled objects in such a way that each group contains at least one object. Hierarchical clustering is an approach of cluster analysis that builds a hierarchy of clusters using agglomerative or divisive algorithms. Model-based clustering, e.g., SOM, assumes a model for each cluster and finds the best fit of the data to that model. In density-based clustering [41], clusters are subspaces of dense objects that are separated by subspaces in which objects have low density. Grid-based methods quantize the space into a finite number of cells that form a grid, and then perform clustering on the grid's cells [53]. A multi-step approach presented in [54] uses a three-phase method, (1) pre-clustering of time series, (2) purifying and summarization, and (3) merging, to construct the clusters based on similarity in shape.
Lee and Ouyang proposed a self-constructing clustering (SCC) algorithm [20] which has been applied in various applications [4,6,13,14,16,55]. SCC is an exclusive clustering method. For a given training set, SCC performs only one training cycle on the training instances. Initially, no clusters exist. Training instances are considered one by one. If an instance is close enough to an existing cluster, the instance is assigned to the most suitable cluster; otherwise, a new cluster is created and the instance is assigned to it. SCC offers several advantages. First, the algorithm runs through the training instances only one time, so it is fast. Second, the distributions of the data are statistically characterized. Third, the similarity threshold, instead of the number of clusters, is specified in advance by the user. However, once an instance is assigned to a cluster, the assignment will not be changed afterwards. The clusters produced may depend on the order in which the instances are considered, and assignment errors are more likely to occur. As a result, the accuracy of the result can be low. Also, all dimensions are equally weighted in the clustering process, which may not be suitable in certain applications, e.g., time-series clustering [56,57,58].
In this paper, we propose improvements to the SCC method to overcome its shortcomings. Two or more training cycles on the instances are performed. In each cycle, training instances are considered one by one. An instance can be added into or removed from a cluster, and thus it is allowed to be re-assigned to another cluster. A desired number of clusters is obtained when all the assignments are stable, i.e., no assignment has been changed, in the current cycle. In this way, the clusters produced are less likely to be affected by the feeding order of the instances. Furthermore, each dimension can be weighted differently in the clustering process. The values of the weights are adaptively learned from the data. Having different weights is useful when certain relevance exists between different dimensions in many applications, e.g., clustering of time series data [56,57]. The effectiveness of the proposed ideas is demonstrated by a number of experiments conducted with real-world benchmark datasets.
The rest of this paper is organized as follows. SCC is briefly reviewed in Section 2. The proposed methods are described in Section 3.1 and Section 3.2, respectively. Experimental results are presented in Section 4. Finally, a conclusion is given in Section 5.

2. Self-Constructing Clustering (SCC)

Suppose $X = \{x_i \mid 1 \le i \le N\}$ is a finite set of $N$ unlabeled training instances, where $x_i = [x_{i,1} \; \cdots \; x_{i,n}]^T \in \mathbb{R}^n$ is the $i$th instance. Each instance is a vector with $n$ features. SCC [20,55] does clustering in a progressive way. Only one training cycle on the instances is performed. Each cluster is described by a Gaussian-like membership function characterized by the center and deviation induced from the data assigned to the cluster. The instances are considered one by one sequentially, and clusters are created incrementally.
Let $J$ denote the number of currently existing clusters. Initially, no clusters exist and thus $J = 0$. When instance 1 comes in, the first cluster, $C_1$, is created, instance 1 is assigned to it, and $J$ becomes 1. Then, for instance $i$, which is $x_i$, $i \ge 2$, the z-distance between instance $i$ and every existing cluster $C_j$ is calculated by
$$Z(i,j) = \sum_{k=1}^{n} \left( \frac{x_{i,k} - c_{j,k}}{\sigma_{j,k}} \right)^2 \qquad (1)$$
for $j = 1, 2, \ldots, J$. Note that $c_j = [c_{j,1} \; \cdots \; c_{j,n}]^T$ and $\sigma_j = [\sigma_{j,1} \; \cdots \; \sigma_{j,n}]^T$ denote the mean and deviation, respectively, induced from the instances assigned to cluster $C_j$. The membership degree (MD) of instance $i$ belonging to cluster $C_j$ is defined as:
$$\mu_j(x_i) = \exp\{-Z(i,j)\} \qquad (2)$$
with the value lying in the range $(0, 1]$. Let $S_j$ denote the number of instances assigned to cluster $C_j$. There are two cases:
  • If all the MDs are less than $\rho^n$, where $\rho$ is a pre-specified similarity threshold in one dimension, i.e.,
    $$\exp\{-Z(i,j)\} < \rho^n \qquad (3)$$
    for all existing clusters $C_j$, $1 \le j \le J$, a new cluster, $C_{J+1}$, is created and instance $i$ is assigned to it by
    $$c_{J+1} = x_i, \quad \sigma_{J+1} = [\sigma_0 \; \cdots \; \sigma_0]^T, \quad S_{J+1} = 1, \quad J \leftarrow J + 1 \qquad (4)$$
    where $\sigma_0$ is a pre-specified constant.
  • Otherwise, instance $i$ is assigned to the cluster with the largest MD, say cluster $C_a$; i.e., the center $c_a = [c_{a,1} \; \cdots \; c_{a,n}]^T$, deviation $\sigma_a = [\sigma_{a,1} \; \cdots \; \sigma_{a,n}]^T$, and size $S_a$ are updated by
    $$\sigma_{a,k} \leftarrow \left\{ \frac{(S_a - 1)(\sigma_{a,k} - \sigma_0)^2 + S_a c_{a,k}^2 + x_{i,k}^2}{S_a} - \frac{S_a + 1}{S_a} \left[ \frac{S_a c_{a,k} + x_{i,k}}{S_a + 1} \right]^2 \right\}^{\frac{1}{2}} + \sigma_0, \quad 1 \le k \le n,$$
    $$c_a \leftarrow \frac{S_a c_a + x_i}{S_a + 1}, \qquad S_a \leftarrow S_a + 1. \qquad (5)$$
Note that J is not changed in the latter case.
When all the instances have been considered, SCC stops with $J$ clusters. The procedure, Algorithm 1, is summarized below.
Algorithm 1. SCC
   for each instance, instance i, 1 ≤ i ≤ N
     Compute Z(i, j), 1 ≤ j ≤ J;
     if Equation (3) holds
       Create a new cluster according to Equation (4);
     else
       Instance i is assigned to the cluster with largest MD, according to Equation (5);
     end if
   end for
  end SCC
Note that SCC takes the training set $X$ as input and outputs $J$ clusters $C_1, \ldots, C_J$. SCC runs fast, since it runs through the training instances only once. Unlike K-means, the similarity threshold, instead of the number of clusters, is specified in advance by the user.
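To make the above concrete, the following is a minimal sketch of the SCC pass in Python with NumPy. The function names, the dictionary-based cluster representation, and the default values of rho and sigma0 are our own illustrative choices; they are not taken from the authors' Matlab implementation.

import numpy as np

def z_distance(x, center, dev):
    """Equation (1): z-distance between instance x and a cluster."""
    return np.sum(((x - center) / dev) ** 2)

def new_cluster(x, sigma0):
    """Equation (4): a fresh cluster seeded by a single instance."""
    return {'c': x.copy(), 'sigma': np.full(len(x), sigma0), 'S': 1}

def add_to_cluster(cl, x, sigma0):
    """Equation (5): update center, deviation, and size after adding x."""
    S, c, sg = cl['S'], cl['c'], cl['sigma']
    cl['sigma'] = np.sqrt(
        ((S - 1) * (sg - sigma0) ** 2 + S * c ** 2 + x ** 2) / S
        - (S + 1) / S * ((S * c + x) / (S + 1)) ** 2) + sigma0
    cl['c'] = (S * c + x) / (S + 1)
    cl['S'] = S + 1

def scc(X, rho=0.5, sigma0=0.2):
    """One pass over the rows of X; returns the clusters and the assignment list."""
    n = X.shape[1]
    clusters, assign = [], []
    for x in X:
        mds = [np.exp(-z_distance(x, cl['c'], cl['sigma'])) for cl in clusters]
        if not mds or max(mds) < rho ** n:   # Equation (3): no cluster is similar enough
            clusters.append(new_cluster(x, sigma0))
            assign.append(len(clusters) - 1)
        else:                                # assign to the cluster with the largest MD
            j = int(np.argmax(mds))
            add_to_cluster(clusters[j], x, sigma0)
            assign.append(j)
    return clusters, assign

For instance, it can be run on the 12 two-dimensional instances of Appendix A with rho = 0.55 and sigma0 = 0.2.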

3. Proposed Methods

The clusters produced by SCC may depend on the feeding order of the instances and the accuracy of the result can be low. Furthermore, all dimensions are equally weighted in the clustering process, which may not be appropriate in certain applications, e.g., time-series clustering.

3.1. Iterative SCC (SCC-I)

For convenience, the proposed approach is abbreviated as SCC-I, standing for the iterative version of SCC. SCC-I consists of multiple iterations. An instance is allowed to be re-assigned to another cluster. The clustering work stops when all the assignments are stable, i.e., no assignment will be changed. A training cycle on the instances is performed in each iteration.
In the first iteration, SCC is applied. Consider the $r$th iteration, $r \ge 2$. For any instance $x_i$, $1 \le i \le N$, we first remove it from the cluster, say $C_t$, to which $x_i$ is currently assigned. Three cases may arise:
  • If no instance remains assigned to $C_t$, $C_t$ is deleted and the remaining clusters are re-named $C_1$, ..., $C_{J-1}$. Then $J$ is decreased by 1, i.e., $J \leftarrow J - 1$.
  • If only one instance remains assigned to $C_t$, then set $c_t$ to be this instance, set $\sigma_t = [\sigma_0 \; \cdots \; \sigma_0]^T$, and set $S_t = 1$.
  • Otherwise, the characteristics of $C_t$ are updated, to reflect the removal of $x_i$, by
    $$\sigma_{t,k} \leftarrow \left\{ \frac{(S_t - 1)(\sigma_{t,k} - \sigma_0)^2 + S_t c_{t,k}^2 - x_{i,k}^2}{S_t - 2} - \frac{S_t - 1}{S_t - 2} \left[ \frac{S_t c_{t,k} - x_{i,k}}{S_t - 1} \right]^2 \right\}^{\frac{1}{2}} + \sigma_0, \quad 1 \le k \le n,$$
    $$c_t \leftarrow \frac{S_t c_t - x_i}{S_t - 1}, \qquad S_t \leftarrow S_t - 1. \qquad (6)$$
Then, we calculate the z-distance and MD between $x_i$ and each existing cluster by Equations (1) and (2). A new cluster may be created, with $x_i$ assigned to it following Equation (4), or $x_i$ is assigned to the cluster with the largest MD following Equation (5).
The current $r$th iteration ends when all the instances have been run through. If any cluster assignment has been changed in this iteration, the next iteration, i.e., the $(r+1)$th iteration, proceeds. Otherwise, the assignments are stable and SCC-I stops with $J$ clusters. The SCC-I procedure, Algorithm 2, is summarized below.
Algorithm 2. SCC-I
 Perform SCC on X;
repeat
  for each instance, instance i, 1 ≤ iN
   Remove instance i from its cluster and update the existing clusters;
   Compute Z(i, j), 1 ≤ jJ;
   Create a new cluster or assign instance i to the cluster with largest MD;
  end for
until assignments are stable;
end SCC-I
Note that SCC-I takes the training set $X$ as input and outputs $J$ clusters $C_1, \ldots, C_J$. Since re-assignments are allowed, SCC-I can produce clusters that are less likely to be affected by the feeding order of the instances. As a result, more stable clusters can be obtained by SCC-I. By "more stable" we mean that both the number of clusters and the clusters produced are less likely to be affected by the feeding order of the instances. However, SCC runs faster than SCC-I, since SCC-I performs more than one iteration. To illustrate the advantage of SCC-I over SCC, a simple example is given in Appendix A.
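The sketch below extends the one given in Section 2 to SCC-I; it reuses the z_distance, new_cluster, add_to_cluster, and scc helpers from that sketch. The removal downdate follows the three cases above; the max_iter bound and the assignment bookkeeping are our own simplifications and are not part of the original algorithm statement.

import numpy as np

def remove_from_cluster(cl, x, sigma0):
    """Downdate a cluster's center, deviation, and size after removing instance x."""
    S, c, sg = cl['S'], cl['c'], cl['sigma']
    if S == 2:                              # only one instance remains in the cluster
        cl['c'] = S * c - x                 # the remaining instance itself
        cl['sigma'] = np.full(len(c), sigma0)
        cl['S'] = 1
    else:                                   # Equation (6)
        var = (((S - 1) * (sg - sigma0) ** 2 + S * c ** 2 - x ** 2) / (S - 2)
               - (S - 1) / (S - 2) * ((S * c - x) / (S - 1)) ** 2)
        cl['sigma'] = np.sqrt(np.maximum(var, 0.0)) + sigma0   # guard tiny negatives
        cl['c'] = (S * c - x) / (S - 1)
        cl['S'] = S - 1

def scc_i(X, rho=0.5, sigma0=0.2, max_iter=50):
    """Repeat SCC-style re-assignment cycles until no assignment changes."""
    n = X.shape[1]
    clusters, assign = scc(X, rho, sigma0)  # first iteration: plain SCC
    for _ in range(max_iter):
        changed = False
        for i, x in enumerate(X):
            j_old = assign[i]
            if clusters[j_old]['S'] == 1:   # case 1: the cluster becomes empty
                del clusters[j_old]
                assign = [a - 1 if a > j_old else a for a in assign]
                changed = True
            else:                           # cases 2 and 3: downdate the cluster
                remove_from_cluster(clusters[j_old], x, sigma0)
            mds = [np.exp(-z_distance(x, cl['c'], cl['sigma'])) for cl in clusters]
            if not mds or max(mds) < rho ** n:
                clusters.append(new_cluster(x, sigma0))
                j_new = len(clusters) - 1
            else:
                j_new = int(np.argmax(mds))
                add_to_cluster(clusters[j_new], x, sigma0)
            changed = changed or (j_new != assign[i])
            assign[i] = j_new
        if not changed:                     # assignments are stable
            break
    return clusters, assign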

3.2. Weighted SCC-I (SCC-IW)

In SCC and SCC-I, each dimension is equally weighted in the calculation of z-distances, as shown in Equation (1). For some applications, e.g., time-series clustering, where certain relevance exists between different dimensions, allowing different dimensions to be weighted differently could be a useful idea [56]. This motivates the development of the weighted SCC-I, abbreviated as SCC-IW.
Let the weighted z-distance between instance $i$ and cluster $C_j$, $1 \le i \le N$, $1 \le j \le J$, be defined as
$$Z_w(i,j) = \sum_{k=1}^{n} w_{j,k} \left( \frac{x_{i,k} - c_{j,k}}{\sigma_{j,k}} \right)^2 \qquad (7)$$
where $w_j = [w_{j,1} \; \cdots \; w_{j,n}]^T$ is the weight vector associated with $C_j$, $w_{j,k} \ge 0$ for $1 \le k \le n$, and $w_{j,1} + \cdots + w_{j,n} = 1$. Accordingly, the MD of instance $i$ belonging to $C_j$ is defined as:
$$\mu_j(x_i) = \exp\{-Z_w(i,j)\}. \qquad (8)$$
Clearly, we have $0 < \mu_j(x_i) \le 1$. If
$$\exp\{-Z_w(i,j)\} < \rho \qquad (9)$$
for all existing clusters $C_j$, $1 \le j \le J$, a new cluster is created; otherwise, instance $i$ is assigned to the cluster with the largest MD, as described before.
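A minimal sketch of the weighted z-distance and the creation test of Equation (9), assuming that the cluster dictionaries of the earlier sketches are extended with a weight vector stored under the key 'w'; the function names are ours.

import numpy as np

def weighted_z_distance(x, center, dev, w):
    """Equation (7): weighted z-distance; w is the cluster's weight vector (summing to 1)."""
    return np.sum(w * ((x - center) / dev) ** 2)

def too_far_from_all(x, clusters, rho):
    """Equation (9): a new cluster is created only if every weighted MD falls below rho."""
    return all(np.exp(-weighted_z_distance(x, cl['c'], cl['sigma'], cl['w'])) < rho
               for cl in clusters)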
Remark 1.
In Equation (3), the test is $\exp\{-Z(i,j)\} < \rho^n$, where $\rho$ is the pre-specified similarity threshold. Note that for $\exp\{-Z(i,j)\}$, we have
$$\exp\{-Z(i,j)\} = \exp\left\{-\sum_{k=1}^{n} \left( \frac{x_{i,k} - c_{j,k}}{\sigma_{j,k}} \right)^2\right\} = \exp\left\{-\left( \frac{x_{i,1} - c_{j,1}}{\sigma_{j,1}} \right)^2\right\} \cdots \exp\left\{-\left( \frac{x_{i,n} - c_{j,n}}{\sigma_{j,n}} \right)^2\right\} < \rho \cdots \rho = \rho^n.$$
For $\exp\{-Z_w(i,j)\}$, we have
$$\exp\{-Z_w(i,j)\} = \exp\left\{-\sum_{k=1}^{n} w_{j,k} \left( \frac{x_{i,k} - c_{j,k}}{\sigma_{j,k}} \right)^2\right\} = \left\{\exp\left\{-\left( \frac{x_{i,1} - c_{j,1}}{\sigma_{j,1}} \right)^2\right\}\right\}^{w_{j,1}} \cdots \left\{\exp\left\{-\left( \frac{x_{i,n} - c_{j,n}}{\sigma_{j,n}} \right)^2\right\}\right\}^{w_{j,n}} < \rho^{w_{j,1}} \cdots \rho^{w_{j,n}} = \rho^{w_{j,1} + \cdots + w_{j,n}} = \rho.$$
This justifies the test in Equation (9).
Now the remaining problem is how the weights should be determined to optimize the clustering under consideration. Here, we consider time-series clustering as an exemplar application [56]. Firstly, it is required that the instances assigned to each cluster be as close together as possible. In other words, we want to maximize:
$$\sum_{j=1}^{J} \sum_{i=1}^{N} u_{i,j} \, \mu_j(x_i) \qquad (10)$$
where $u_{i,j} = 1$ if instance $i$ is assigned to cluster $C_j$ and $u_{i,j} = 0$ otherwise, for $1 \le i \le N$ and $1 \le j \le J$, and $\mu_j(x_i)$ is the MD defined in Equation (8). Note that $\mu_j(x_i)$ is an exponential function which is non-linear in $w_{j,1}, \ldots, w_{j,n}$, so maximizing Equation (10) is a hard non-linear optimization problem. However, maximizing $\mu_j(x_i)$ is identical to minimizing $Z_w(i,j)$. Therefore, instead of maximizing Equation (10), we minimize:
$$\sum_{j=1}^{J} \sum_{i=1}^{N} u_{i,j} \, Z_w(i,j). \qquad (11)$$
Since $Z_w(i,j)$ is linear in $w_{j,1}, \ldots, w_{j,n}$, minimizing Equation (11) is a linear optimization problem, which is much easier. Secondly, since neighboring dimensions are next to each other on the time line, the weights of neighboring dimensions should be close to each other. Therefore, we also want to minimize:
$$\sum_{j=1}^{J} \sum_{k=1}^{n-1} \left( w_{j,k} - w_{j,k+1} \right)^2. \qquad (12)$$
Combining Equations (11) and (12), together with the constraints on the weights, we would like the weights to minimize
$$\sum_{j=1}^{J} \sum_{i=1}^{N} u_{i,j} \, Z_w(i,j) + \alpha \sum_{j=1}^{J} \sum_{k=1}^{n-1} \left( w_{j,k} - w_{j,k+1} \right)^2 \qquad (13)$$
$$\text{subject to } \sum_{k=1}^{n} w_{j,k} = 1, \quad w_{j,k} \ge 0, \; k = 1, \ldots, n, \; 1 \le j \le J,$$
which, by Equation (7), is equivalent to minimizing
$$\sum_{j=1}^{J} \sum_{i=1}^{N} u_{i,j} \sum_{k=1}^{n} w_{j,k} \left( \frac{x_{i,k} - c_{j,k}}{\sigma_{j,k}} \right)^2 + \alpha \sum_{j=1}^{J} \sum_{k=1}^{n-1} \left( w_{j,k} - w_{j,k+1} \right)^2 \qquad (14)$$
$$\text{subject to } \sum_{k=1}^{n} w_{j,k} = 1, \quad w_{j,k} \ge 0, \; k = 1, \ldots, n, \; 1 \le j \le J,$$
where α is a positive real constant. Through quadratic programming, optimal values for the weights can be derived from Equation (14).
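Since Equation (14) decomposes over the clusters, each weight vector $w_j$ can be obtained from a small quadratic program of its own. The sketch below uses scipy.optimize.minimize with the SLSQP solver as one possible off-the-shelf choice; the function name and its arguments are ours, not the authors' implementation.

import numpy as np
from scipy.optimize import minimize

def optimal_weights(members, center, dev, alpha):
    """Solve the per-cluster part of Equation (14); members has shape (S_j, n)."""
    n = members.shape[1]
    # Linear coefficients: sum over the cluster's instances of ((x-c)/sigma)^2, per dimension.
    d = np.sum(((members - center) / dev) ** 2, axis=0)

    def objective(w):
        smooth = np.sum(np.diff(w) ** 2)    # the (w_k - w_{k+1})^2 penalty terms
        return float(np.dot(d, w) + alpha * smooth)

    w0 = np.full(n, 1.0 / n)                # start from equal weights
    res = minimize(objective, w0, method='SLSQP',
                   bounds=[(0.0, 1.0)] * n,
                   constraints=[{'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0}])
    return res.x

A dedicated quadratic programming routine, such as Matlab's quadprog, would serve the same purpose; the quadratic structure of Equation (14) is the same either way.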
Now, we are ready to present the SCC-IW algorithm. We adopt $Z_w(i,j)$, instead of $Z(i,j)$, in SCC-IW. Also, whenever a new cluster is created, its weights are each initialized to $1/n$. At the end of the current iteration, we minimize Equation (14) to find the optimal weights, which are used in the next iteration. The SCC-IW procedure, Algorithm 3, is summarized below.
Algorithm 3. SCC-IW
 Perform SCC on X with weighted z-distance, and initialize the weights for each newly
  created cluster;
repeat
for each instance, instance i, 1 ≤ i ≤ N
   Remove instance i from its cluster and update the existing clusters;
   Compute Z_w(i, j), 1 ≤ j ≤ J;
   Create a new cluster or assign instance i to the cluster with largest MD, and initialize the
     weights for each newly created cluster;
   end for
   Derive optimal weights by solving Equation (14) through quadratic programming;
  until assignments are stable;
end SCC-IW
Note that SCC-IW takes the training set $X$ as input and outputs $J$ clusters $C_1, \ldots, C_J$.
It is not surprising that SCC-IW can work well. In each iteration, optimal weights are derived. A dimension which is more useful for clustering is more important and is, therefore, given a larger weight. To illustrate how SCC-IW works, a simple example is given in Appendix B.

4. Experimental Results

In this section, we demonstrate empirically the superiority of our proposed methods. The proposed methods and others are applied to do clustering on benchmark datasets. Three external measures of evaluation, Fscore, Rand Index (RI), and Normalized Mutual Information (NMI) [59], and another three internal measures, Dunn index (DI), Davies–Bouldin index (DBI), and Silhouette index (SI) [60], are adopted.
$$\mathrm{Fscore} = \sum_{k=1}^{K} \frac{N_k}{N} \max_{1 \le j \le J} \left\{ \frac{2 \cdot \frac{N_{kj}}{N_k} \cdot \frac{N_{kj}}{N_j}}{\frac{N_{kj}}{N_k} + \frac{N_{kj}}{N_j}} \right\}$$
where $K$ is the number of classes, $J$ is the number of clusters, $N$ is the size of the entire data set, $N_{kj}$ is the number of data instances belonging to class $k$ in cluster $j$, $N_j$ is the size of cluster $j$, and $N_k$ is the size of class $k$. A higher Fscore is better.
$$\mathrm{RI} = \frac{a + b}{N(N-1)/2}$$
where $a$ is the number of pairs of data instances that have different class labels and are assigned to different clusters, $b$ is the number of pairs of data instances that have the same class label and are assigned to the same cluster, and $N$ is the size of the entire data set. A higher RI is better.
$$\mathrm{NMI} = \frac{\sum_{k=1}^{K} \sum_{j=1}^{J} N_{kj} \log\left( \frac{N \cdot N_{kj}}{N_k \cdot N_j} \right)}{\sqrt{\left( \sum_{k=1}^{K} N_k \log \frac{N_k}{N} \right)\left( \sum_{j=1}^{J} N_j \log \frac{N_j}{N} \right)}}$$
where $K$ is the number of classes, $J$ is the number of clusters, $N$ is the size of the entire data set, $N_{kj}$ is the number of data instances belonging to class $k$ in cluster $j$, $N_j$ is the size of cluster $j$, and $N_k$ is the size of class $k$. A higher NMI is better.
$$\mathrm{DI} = \min_{1 \le i \le J} \left\{ \min_{j \ne i} \left\{ \frac{d_{\min}(C_i, C_j)}{\max_{1 \le l \le J} \mathrm{diam}(C_l)} \right\} \right\}$$
where $d_{\min}(C_i, C_j)$ is the minimum distance between clusters $C_i$ and $C_j$, and $\mathrm{diam}(C_l)$ is the largest distance between the instances contained in cluster $C_l$. A higher DI is better.
$$\mathrm{DBI} = \frac{1}{J} \sum_{i=1}^{J} \max_{j \ne i} \left\{ \frac{\mathrm{avg}(C_i) + \mathrm{avg}(C_j)}{d_{\mathrm{cen}}(C_i, C_j)} \right\}$$
where $\mathrm{avg}(C)$ is the average distance between the instances contained in cluster $C$ and the center of $C$, and $d_{\mathrm{cen}}(C_i, C_j)$ is the distance between the centers of clusters $C_i$ and $C_j$. A lower DBI is better.
$$\mathrm{SI} = \frac{1}{N} \sum_{i=1}^{N} \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$
where $a(i)$ is the average of the distances between instance $i$ and all other instances within the same cluster, $b(i)$ is the lowest average distance of instance $i$ to all instances in any other cluster of which instance $i$ is not a member, and $N$ is the size of the entire data set. A higher SI is better.
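As a concrete illustration, the two external measures Fscore and RI can be computed directly from the class and cluster labels exactly as defined above; the following Python functions are a sketch (the argument names labels and preds are ours).

import numpy as np
from itertools import combinations

def rand_index(labels, preds):
    """RI: fraction of instance pairs on which the class labels and the clustering agree."""
    N = len(labels)
    a = b = 0
    for i, j in combinations(range(N), 2):
        if labels[i] != labels[j] and preds[i] != preds[j]:
            a += 1
        elif labels[i] == labels[j] and preds[i] == preds[j]:
            b += 1
    return (a + b) / (N * (N - 1) / 2)

def fscore(labels, preds):
    """Fscore: class-size-weighted best F1 over clusters, as in the formula above."""
    labels, preds = np.asarray(labels), np.asarray(preds)
    N = len(labels)
    total = 0.0
    for k in np.unique(labels):
        in_class = labels == k
        best = 0.0
        for j in np.unique(preds):
            in_cluster = preds == j
            nkj = np.sum(in_class & in_cluster)
            if nkj == 0:
                continue
            recall = nkj / np.sum(in_class)       # N_kj / N_k
            precision = nkj / np.sum(in_cluster)  # N_kj / N_j
            best = max(best, 2 * recall * precision / (recall + precision))
        total += np.sum(in_class) / N * best
    return total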

4.1. Non Time-Series Datasets

To illustrate the effectiveness of SCC-I, fourteen benchmark non-time-series datasets are selected from the UCI repository [61] for the experiments. The characteristics of these datasets are shown in Table 1. For example, there are 569 instances in the Breast dataset. Each instance has 30 features or dimensions, and belongs to one of 2 classes. For each dataset, an instance belongs to one and only one class. We compare SCC-I with K-means [62], DSKmeans [59], Fuzzy C-means (FCM) [63], Gaussian mixture model (Gmm) [64], DBSCAN [65], and SCC [20]. The codes for K-means, DBSCAN, and FCM are adopted from Matlab [66], and the code for Gmm is adopted from [67]. We wrote the codes for DSKmeans, SCC, and SCC-I in Matlab.
Table 2 shows comparisons of Fscore, RI, and NMI among the different methods for each dataset. To have fair comparisons among the methods, the number of clusters is tuned to be identical to the number of classes for each dataset. For K-means, the number of clusters, k, is set to be equal to the number of classes. For DSKmeans, the parameter γ is set to a value between 0.01 and 5, and η is between 0.01 and 0.3. For SCC and SCC-I, $\sigma_0$ and $\rho$ are set to values between 0.1 and 0.95. Also, each method performed 25 runs on a dataset and the averaged result is shown. For K-means and DSKmeans, each run started with a different set of initial seeds. For SCC and SCC-I, each run was given a different feeding order of the training instances. For FCM, the maximum number of iterations is set to 50. For DBSCAN, ε is set to a value between 0.1 and 3, while the minimum number of neighbors, minpts, required for a core point, is set to be between 1 and 10. In addition to the values of the measures, the performance ranking is also indicated at the right side of '/' for each dataset in Table 2. For example, consider the Breast dataset. FCM has the best Fscore value, 0.9274, so it ranks first, indicated by 1 at the right side of '/'; K-means has the second best value, 0.9270, so it ranks second; and so on. From this table, we can see that (1) SCC-I outperforms SCC significantly, and (2) SCC-I is no less effective than the other methods. Table 3 shows the averaged ranking over all 14 datasets for each method. As can be seen, SCC-I is the best in Fscore and NMI, indicated by boldfaced numbers, and is the second best in RI.
Although the overall ranking of SCC-I is better than that of the others, looking at the individual results for each dataset there are some variations. For example, K-means outperforms SCC-I for Heart, Ionosphere, and Seeds. Compared with K-means, SCC-I has two advantages: (1) SCC-I considers deviation in the computation of distance; (2) SCC-I allows an ellipsoidal shape of clusters. Note that SCC-I is less affected by the feeding order of instances, and thus can give a more stable and accurate clustering than SCC. However, given the number of clusters, the clusters obtained by K-means are not affected by the feeding order of instances. For datasets with an ellipsoidal shape of clusters, SCC-I is more likely to perform better. By contrast, for datasets with spherical clusters, SCC-I may be inferior to K-means. Table 2 also shows the execution time, in seconds, of each method on each dataset. The computer used for running the codes is equipped with an Intel(R) Core(TM) i7-4770 CPU at 3.40 GHz, 16 GB RAM, and Matlab R2011b. The times shown in the table only provide an idea of how efficiently these methods can run. Note that SCC-I runs slower than SCC. SCC only performs one training cycle on the instances, while SCC-I requires two or more training cycles. SCC-I takes more training time than the other baselines due to several factors: (1) the codes of these baselines, e.g., K-means and FCM, were adopted from established websites, while the code for SCC-I was written by graduate students; (2) SCC-I has to compute z-distances and Gaussian values, which is more computationally expensive; (3) in order to do re-assignment, the operation of removing instances from clusters is done during the clustering process. Comparisons of DBI, DI, and SI among the different methods for each dataset are shown in Table 4. As can be seen from the table, SCC-I is better than the other methods. SCC-I gets the lowest DBI for 9 out of 14 datasets, the highest DI for 9 out of 14 datasets, and the highest SI for 7 out of 14 datasets.
Now we use the paired t-test [68] to test whether the differences between SCC-I and the other methods are statistically significant. Table 5 shows the t-values based on the values of Fscore, NMI, and DBI, respectively, at the 90% confidence level. Note that we have 14 datasets involved, so the number of degrees of freedom is 13 and the corresponding threshold is 1.771. From Table 5, we can see that all the t-values are greater than 1.771, except for Gmm with NMI. Therefore, by statistical test we observe that SCC-I shows significantly better performance than the other methods. A multiple comparison test may be used since there are multiple algorithms involved. Analysis of variance (ANOVA) [69] provides such tests. We have tried ANOVA in two ways. Firstly, we used ANOVA to return a structure that can be used to determine which pairs of algorithms are significantly different; results similar to those shown in Table 5 were obtained. Secondly, we used ANOVA to test the null hypothesis that all algorithms perform equally well against the alternative hypothesis that at least one algorithm is different from the others. A p-value smaller than 0.05 indicates that at least one of the algorithms is significantly different from the others; however, it is not certain which ones are significantly different from each other.
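For reference, the paired t-test used here is available off the shelf; the snippet below is a sketch using scipy.stats.ttest_rel on hypothetical per-dataset Fscore values (the numbers are invented for illustration and are not taken from Table 2 or Table 5).

from scipy import stats

# Hypothetical per-dataset Fscore values of SCC-I and of one baseline method.
scc_i_scores = [0.92, 0.88, 0.75, 0.81, 0.90]
baseline_scores = [0.90, 0.84, 0.74, 0.78, 0.89]

t, p = stats.ttest_rel(scc_i_scores, baseline_scores)
# With 14 datasets (13 degrees of freedom), the paper compares t against the
# threshold 1.771 at the 90% confidence level.
print(t, p)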

4.2. Time-Series Datasets

Next, we show the effectiveness of SCC-IW in clustering time series data. Ten benchmark time series datasets are taken from the UCR repository [70] for the experiments. The characteristics of these datasets are shown in Table 6. In addition to the previous methods, we also compare with TSKmeans [56] here. We wrote the code for TSKmeans in Matlab.
Table 7 and Table 8 show the Fscore, RI, and NMI obtained by the different methods for each dataset. Again, each method performed 25 runs on a dataset and the averaged result is shown. From these two tables, we can see that (1) SCC-IW outperforms both SCC and SCC-I significantly, and (2) SCC-IW is no less effective than the other methods. Table 8 shows the averaged ranking over all 10 time series datasets for each method. As can be seen, SCC-IW is the best in Fscore, RI, and NMI. However, SCC-IW runs slower than SCC and SCC-I, since weights are involved in SCC-IW and they have to be optimally updated in each training cycle. Comparisons of DBI, DI, and SI among the different methods for each dataset are shown in Table 9. As can be seen from the table, SCC-I is better than the other methods. SCC-I gets the lowest DBI for 8 out of 10 datasets, the highest DI for 5 out of 10 datasets, and the highest SI for 4 out of 10 datasets. The main reason that SCC-IW outperforms all the other competing classifiers in Table 10 is the consideration of both weights and deviations in SCC-IW. Gmm, SCC, and SCC-I do not use weights. K-means, DSKmeans, and FCM use neither deviations nor weights in the clustering process. TSKmeans considers weights, but deviations are not involved.
Now we use the paired t-test [68] to test whether the differences between SCC-IW and the other methods are statistically significant. Table 11 shows the t-values based on the values of Fscore, RI, and DBI, respectively, at the 90% confidence level. Note that we have 10 datasets involved, so the number of degrees of freedom is 9 and the corresponding threshold is 1.833. From the table, we can see that all the t-values are greater than 1.833, except for DSKmeans and FCM with RI. Therefore, by statistical test we observe that SCC-IW shows significantly better performance than the other methods.

4.3. Comparisons with Other Methods

Recently, evolutionary algorithms, e.g., simulated annealing and differential evolution, have been proposed to perform clustering [71]. The evolutionary algorithms can perform clustering using either a fixed or variable number of clusters and find the clustering that is optimal with respect to a certain validity index. Either a population of solutions or only one solution can be used. The single-solution-based evolutionary algorithms have smaller evaluation count but their solution quality is usually not as good as those that are population-based.
Siddiqi and Sait [72] propose a heuristic for data-clustering problems, referred to here as HDC. It comprises two parts, a greedy algorithm and a single-solution-based heuristic. The first part selects the data points that can act as the centroids of well-separated clusters. The second part performs clustering with the objective of optimizing a cluster validity index. The proposed heuristic consists of five main components: (1) genes; (2) fitness of genes; (3) selection; (4) mutation operation; and (5) diversification. The objective functions used in the proposed heuristic are the Calinski–Harabasz index and the Dunn index. Zhang et al. [73] propose a clustering algorithm, called ICFSKM, for clustering large dynamic data in industrial IoT. Two cluster operations, cluster creating and cluster merging, are defined to integrate the current pattern into the previous one for the final clustering result. Also, k-medoids is used for modifying the clustering centers according to the newly arriving objects. Table 12 shows a DI comparison between SCC-I and HDC, and Table 13 shows an NMI comparison between SCC-I and ICFSKM, for some datasets. In these tables, the datasets Balance scale, Banknote authentication, Landsat satellite, Pen-based digits, Waveform-5000, and Wine are also selected from the UCI repository [61]. The results for HDC and ICFSKM are copied directly from [72,73]. We can see that SCC-I is comparable to HDC and ICFSKM. Note that HDC and ICFSKM are goal-oriented. For example, the objective functions used in HDC include the Dunn index. Therefore, DI values can be optimized by HDC intentionally. Furthermore, the objective function is usually computationally intensive and the evolutionary algorithms are considered to be slow.

4.4. Setting of α

In Equation (14), the difference between the weights of neighboring dimensions is controlled by $\alpha$. For $\alpha = 0$, no constraint is imposed on the weight differences. As $\alpha$ increases, neighboring dimensions are forced to be more and more equally weighted and SCC-IW behaves more and more like SCC-I. Therefore, the setting of $\alpha$ can affect the performance of SCC-IW. In [56], a constant $g$ is defined as
$$g = \sum_{i=1}^{N} \sum_{k=1}^{n} (x_{i,k} - m_k)^2$$
where
$$m_k = \frac{1}{N} \sum_{i=1}^{N} x_{i,k}$$
and it is shown empirically that the performance of TSKmeans varies with the value of $\alpha/g$ [56]. Figure 1 shows how SCC-IW performs on four datasets with different values of $\alpha/g$. Note that the horizontal axis is scaled in $\log(\alpha/g)$. When $\alpha/g$ gets larger, SCC-IW performs increasingly like SCC-I.
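A short sketch of how the scaling constant g can be computed from the data and used to set α; the ratio 0.1 below is an arbitrary illustrative choice, not a value recommended in the paper.

import numpy as np

def scaling_constant(X):
    """g: total squared deviation of the data from the per-dimension means m_k."""
    m = X.mean(axis=0)                  # m_k for each dimension k
    return float(np.sum((X - m) ** 2))

# Example: choose alpha as a fixed fraction of g, i.e., alpha / g = 0.1.
# X = np.loadtxt('some_time_series.csv', delimiter=',')   # hypothetical data file
# alpha = 0.1 * scaling_constant(X)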

5. Conclusions

SCC is an exclusive clustering method, performing only one training cycle on the training instances. Clusters are created incrementally. However, the clusters produced may depend on the feeding order in which the instances are considered, and assignment errors are more likely to occur. Also, all dimensions are equally weighted in the clustering process, which may not be suitable in certain applications, e.g., time-series clustering. We have presented two improvements, SCC-I and SCC-IW. SCC-I performs two or more training cycles iteratively, and allows instances to be re-assigned afterwards. In this way, the clusters produced are less likely to be affected by the feeding order of the instances. On the other hand, SCC-IW allows each dimension to be weighted differently in the clustering process. The values of the weights are adaptively learned from the data. Experiments have shown that SCC-IW performs effectively in clustering time-series data.
SCC-I and SCC-IW take more training time because (1) they have to compute z-distances and Gaussian values, which is computationally expensive, and (2) in order to do re-assignment, the operation of removing instances from clusters is done during the clustering process. We will investigate these issues to reduce the training time in the future. Spectral clustering [74] and multidimensional scaling [75] deal with the extraction of new features from the original ones. SCC-IW probably has some relationship with these methods, since SCC-IW is also, in a sense, able to extract new "axes or principal components" through the adaptation of the weights. It will be interesting to explore such a relationship in the future.

Author Contributions

Conceptualization, Z.-Y.W. and S.-J.L.; methodology, Z.-Y.W. and S.-J.L; software, Z.-Y.W., C.-Y.W., and Y.-T.L.; validation, Z.-Y.W., C.-Y.W. and Y.-T.L.; formal analysis, Z.-Y.W. and S.-J.L.; investigation, S.-J.L.; resources, S.-J.L.; data curation, Z.-Y.W., C.-Y.W., and Y.-T.L.; writing—original draft preparation, Z.-Y.W., C.-Y.W., and Y.-T.L.; writing—review and editing, S.-J.L.; visualization, C.-Y.W. and Y.-T.L.; supervision, S.-J.L.; project administration, Y.-T.L.; funding acquisition, S.-J.L.

Funding

This research was funded by the Ministry of Science and Technology under grants MOST-103-2221-E-110-047-MY2, MOST-104-2622-E-110-014-CC3, and MOST-106-2221-E-110-080. The APC was funded by National Sun Yat-Sen University.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

A simple example is given here to illustrate how SCC-I works. Suppose X has the following 12 training instances:
$x_1 = [0.30 \;\; 0.60]^T$; $x_2 = [0.70 \;\; 0.35]^T$; $x_3 = [0.50 \;\; 0.52]^T$;
$x_4 = [0.35 \;\; 0.38]^T$; $x_5 = [0.19 \;\; 0.89]^T$; $x_6 = [0.78 \;\; 0.20]^T$;
$x_7 = [0.62 \;\; 0.25]^T$; $x_8 = [0.24 \;\; 0.81]^T$; $x_9 = [0.29 \;\; 0.89]^T$;
$x_{10} = [0.40 \;\; 0.65]^T$; $x_{11} = [0.28 \;\; 0.48]^T$; $x_{12} = [0.24 \;\; 0.89]^T$.
Note that $N = 12$ and $n = 2$. These instances are shown in Figure A1a, marked as circles. Let $\rho = 0.55$ and $\sigma_0 = 0.2$. Below, we consider two feeding orders of the instances.
• The feeding order of x 1   , ...,   x 12   .
After performing SCC in the first iteration, there are six clusters: C 1   , C 2   , C 3   , C 4   , C 5   , and C 6   , as shown in Figure A1b, with
$c_1 = [0.350 \;\; 0.625]^T$, $\sigma_1 = [0.271 \;\; 0.235]^T$;
$c_2 = [0.660 \;\; 0.300]^T$, $\sigma_2 = [0.257 \;\; 0.271]^T$;
$c_3 = [0.500 \;\; 0.520]^T$, $\sigma_3 = [0.200 \;\; 0.200]^T$;
$c_4 = [0.315 \;\; 0.430]^T$, $\sigma_4 = [0.250 \;\; 0.271]^T$;
$c_5 = [0.240 \;\; 0.870]^T$, $\sigma_5 = [0.241 \;\; 0.240]^T$;
$c_6 = [0.780 \;\; 0.200]^T$, $\sigma_6 = [0.200 \;\; 0.200]^T$.
The clusters are numbered and wrapped in dashed contours, with their centers marked with crosses. Instances x 1 and x 10 are assigned to C 1   , x 2 and x 7 are assigned to C 2   , x 3 is assigned to C 3   , x 4 and x 11 are assigned to C 4   , x 5   , x 8   , x 9   , and x 12 are assigned to C 5   , and x 6 is assigned to C 6   . After the second iteration, we have 4 clusters: C 1   , C 2   , C 3   , and C 4   , as shown in Figure A1c, with:
$c_1 = [0.370 \;\; 0.563]^T$, $\sigma_1 = [0.301 \;\; 0.277]^T$;
$c_2 = [0.700 \;\; 0.267]^T$, $\sigma_2 = [0.280 \;\; 0.276]^T$;
$c_3 = [0.350 \;\; 0.380]^T$, $\sigma_3 = [0.200 \;\; 0.200]^T$;
$c_4 = [0.240 \;\; 0.870]^T$, $\sigma_4 = [0.241 \;\; 0.240]^T$.
Instances x 1   , x 3   , x 10   , and x 11 are assigned to C 1   , x 2   , x 6   , and x 7 are assigned to C 2   , x 4 is assigned to C 3   , and x 5   , x 8   , x 9   , and x 12 are assigned to C 4   . After the third iteration, we have 3 clusters: C 1   , C 2   , and C 3   , as shown in Figure A1d, with:
Figure A1. Clusters produced with the first feeding order.
$c_1 = [0.366 \;\; 0.526]^T$, $\sigma_1 = [0.288 \;\; 0.305]^T$;
$c_2 = [0.700 \;\; 0.267]^T$, $\sigma_2 = [0.280 \;\; 0.276]^T$;
$c_3 = [0.240 \;\; 0.870]^T$, $\sigma_3 = [0.241 \;\; 0.240]^T$.
Instances x 1   , x 3   , x 4   , x 10   , and x 11 are assigned to C 1   , x 2   , x 6   , and x 7 are assigned to C 2   , and x 5   , x 8   , x 9   , and x 12 are assigned to C 3   . In the fourth iteration, no assignment has been changed. Therefore, SCC-I stops with three clusters C 1   , C 2   , and C 3 as shown above, with 5, 3, and 4 instances assigned to them, respectively.
• The feeding order of x 9   , x 10   , x 4   , x 7   , x 5   , x 11   , x 2   , x 12   , x 3   , x 1   , x 6   , x 8   . After performing SCC in the first iteration, there are 5 clusters: C 1   , C 2   , C 3   , C 4   , and C 5   , as shown in Figure A2a, with:
Figure A2. Clusters produced with the second feeding order.
$c_1 = [0.240 \;\; 0.870]^T$, $\sigma_1 = [0.240 \;\; 0.240]^T$;
$c_2 = [0.350 \;\; 0.625]^T$, $\sigma_2 = [0.271 \;\; 0.235]^T$;
$c_3 = [0.315 \;\; 0.430]^T$, $\sigma_3 = [0.250 \;\; 0.271]^T$;
$c_4 = [0.700 \;\; 0.267]^T$, $\sigma_4 = [0.280 \;\; 0.276]^T$;
$c_5 = [0.500 \;\; 0.520]^T$, $\sigma_5 = [0.200 \;\; 0.200]^T$.
Instances x 1   , x 5   , x 8   , and x 12 are assigned to C 1   , x 2 and x 10 are assigned to C 2   , x 3 and x 6 are assigned to C 3   , x 4   , x 7   , and x 11 are assigned to C 4   , and x 9 is assigned to C 5   . Iterations 2 and 3 are performed subsequently. After the fourth iteration, we have 3 clusters: C 1   , C 2   , and C 3   , as shown in Figure A2b, with:
$c_1 = [0.240 \;\; 0.870]^T$, $\sigma_1 = [0.241 \;\; 0.240]^T$;
$c_2 = [0.366 \;\; 0.526]^T$, $\sigma_2 = [0.288 \;\; 0.305]^T$;
$c_3 = [0.700 \;\; 0.267]^T$, $\sigma_3 = [0.280 \;\; 0.276]^T$.
Instances x 1   , x 5   , x 8   , and x 12 are assigned to C 1   , x 2   , x 3   , x 6   , x 9   , and x 10 are assigned to C 2   , and x 4   , x 7   , and x 11 are assigned to C 3   . Then the cluster assignments are stable and so SCC-I stops with three clusters C 1   , C 2   , and C 3 as shown above, with 4, 5, and 3 instances assigned to them, respectively.
Note that SCC produces 6 clusters, Figure A1b, with the first feeding order and 5 clusters, Figure A2a, with the second feeding order, and the two sets of clusters are different. However, SCC-I produces 3 clusters, Figure A1d and Figure A2b, with both feeding orders and the two sets of clusters are essentially the same. Clearly, the clusters obtained by SCC-I are more stable and reasonable.

Appendix B

Another simple example is given here to illustrate how SCC-IW works. Suppose X has the following 15 training instances:
$x_1 = [0.450 \;\; 0.739 \;\; 0.044 \;\; 0.865 \;\; 0.641 \;\; 0.036]^T$;
$x_2 = [0.240 \;\; 0.985 \;\; 0.957 \;\; 0.808 \;\; 0.601 \;\; 0.053]^T$;
$x_4 = [0.125 \;\; 0.411 \;\; 0.769 \;\; 0.165 \;\; 0.769 \;\; 0.340]^T$;
$x_5 = [0.613 \;\; 0.842 \;\; 0.262 \;\; 0.807 \;\; 0.438 \;\; 0.953]^T$;
$x_6 = [0.520 \;\; 0.574 \;\; 0.394 \;\; 0.879 \;\; 0.656 \;\; 0.022]^T$;
$x_7 = [0.753 \;\; 0.812 \;\; 0.258 \;\; 0.897 \;\; 0.050 \;\; 0.482]^T$;
$x_8 = [0.114 \;\; 0.493 \;\; 0.783 \;\; 0.674 \;\; 0.588 \;\; 0.499]^T$;
$x_9 = [0.176 \;\; 0.491 \;\; 0.787 \;\; 0.760 \;\; 0.601 \;\; 0.326]^T$;
$x_{10} = [0.172 \;\; 0.620 \;\; 0.714 \;\; 0.802 \;\; 0.692 \;\; 0.089]^T$;
$x_{11} = [0.997 \;\; 0.813 \;\; 0.213 \;\; 0.884 \;\; 0.166 \;\; 0.613]^T$;
$x_{12} = [0.147 \;\; 0.498 \;\; 0.738 \;\; 0.889 \;\; 0.111 \;\; 0.100]^T$;
$x_{13} = [0.287 \;\; 0.052 \;\; 0.193 \;\; 0.821 \;\; 0.625 \;\; 0.090]^T$;
$x_{15} = [0.009 \;\; 0.831 \;\; 0.282 \;\; 0.842 \;\; 0.885 \;\; 0.561]^T$.
Note that $N = 15$ and $n = 6$. Each instance is a short time series comprising 6 consecutive time samplings. Let $\rho = 0.125$ and $\sigma_0 = 0.125$. Below, we consider the feeding order $x_1$, ..., $x_{15}$.
  • By SCC, 7 clusters, $C_1$, ..., $C_7$, are obtained, with sizes 3, 1, 5, 1, 1, 3, and 1, respectively. Instances $x_1$, $x_6$, and $x_{13}$ are assigned to $C_1$; $x_3$, $x_8$, $x_9$, $x_{10}$, and $x_{12}$ are assigned to $C_3$; and $x_7$, $x_{11}$, and $x_{14}$ are assigned to $C_6$. $C_2$, $C_4$, $C_5$, and $C_7$ are singletons, containing $x_2$, $x_4$, $x_5$, and $x_{15}$, respectively.
  • By SCC-I, convergence is achieved in the 3rd iteration with 4 clusters, $C_1$, ..., $C_4$, with sizes 5, 5, 4, and 1, respectively. Instances $x_1$, $x_2$, $x_6$, $x_{10}$, and $x_{13}$ are assigned to $C_1$; $x_3$, $x_4$, $x_8$, $x_9$, and $x_{12}$ are assigned to $C_2$; $x_5$, $x_7$, $x_{11}$, and $x_{14}$ are assigned to $C_3$; and $x_{15}$ is assigned to $C_4$.
  • By SCC-IW, convergence is also achieved in the 3rd iteration, but with only 3 clusters, $C_1$, $C_2$, $C_3$, with sizes 5, 5, and 5, respectively. Instances $x_1$, $x_2$, $x_6$, $x_{10}$, and $x_{13}$ are assigned to $C_1$; $x_3$, $x_4$, $x_8$, $x_9$, and $x_{12}$ are assigned to $C_2$; and $x_5$, $x_7$, $x_{11}$, $x_{14}$, and $x_{15}$ are assigned to $C_3$. The instances assigned to the clusters are shown in Figure A3a–c. The weights associated with the clusters are:
$w_1 = [0.134 \;\; 0.232 \;\; 0.259 \;\; 0.211 \;\; 0.095 \;\; 0.069]^T$;
$w_2 = [0.000 \;\; 0.000 \;\; 0.055 \;\; 0.221 \;\; 0.333 \;\; 0.391]^T$;
$w_3 = [0.385 \;\; 0.328 \;\; 0.225 \;\; 0.062 \;\; 0.000 \;\; 0.000]^T$,
which are depicted in Figure A3d. Intuitively, the clustering done by SCC-IW seems to be the most suitable. All the instances assigned to cluster $C_1$ have similar time samplings at indices 2, 3, and 4, which is manifested by large weights at those indices. Similarly, all the instances assigned to cluster $C_2$ have similar time samplings at indices 4, 5, and 6, and all the instances assigned to cluster $C_3$ have similar time samplings at indices 1, 2, and 3; in both cases this is manifested by large weights at the corresponding indices.
Figure A3. Clusters produced by SCC-IW.

References

  1. Olson, D.L.; Shi, Y. Introduction to Business Data Mining; McGraw-Hill/Irwin Englewood Cliffs: Boston, MA, USA, 2007. [Google Scholar]
  2. Theodoridis, S.; Koutroumbas, K. Pattern Recognition; Elsevier: Amsterdam, The Netherlands, 2008. [Google Scholar]
  3. Li, W.; Jaroszewski, L.; Godzik, A. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 2001, 17, 282–283. [Google Scholar] [CrossRef]
  4. Lee, S.-J.; Ouyang, C.-S.; Du, S.-H. A neuro-fuzzy approach for segmentation of human objects in image sequences. IEEE Trans. Cybern. 2003, 33, 420–437. [Google Scholar]
  5. Filipovych, R.; Resnick, S.M.; Davatzikos, C. Semi-supervised cluster analysis of imaging data. NeuroImage 2011, 54, 2185–2197. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  6. Jiang, J.Y.; Liou, R.J.; Lee, S.J. A fuzzy self-constructing feature clustering algorithm for text classification. IEEE Trans. Knowl. Data Eng. 2011, 23, 335–349. [Google Scholar] [CrossRef]
  7. Xu, R.-F.; Lee, S.-J. Dimensionality reduction by feature clustering for regression problems. Inf. Sci. 2015, 299, 42–57. [Google Scholar] [CrossRef]
  8. Wang, M.; Yu, Y.; Lin, W. Adaptive neural-based fuzzy inference system approach applied to steering control. In Proceedings of the 6th International Symposium on Neural Networks: Advances in Neural Networks—Part II, Wuhan, China, 26–29 May 2009; pp. 1189–1196. [Google Scholar]
  9. Xu, Y.; Olman, V.; Xu, D. Clustering gene expression data using a graph-theoretic approach: An application of minimum spanning trees. Bioinformatics 2002, 18, 536–545. [Google Scholar] [CrossRef] [Green Version]
  10. Wei, C.-C.; Chen, T.-T.; Lee, S.-J. K-NN based neuro-fuzzy system for time series prediction. In Proceedings of the 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), Honolulu, HI, USA, 1–3 July 2013; pp. 569–574. [Google Scholar]
  11. Can, F.; Ozkarahan, E.A. Concepts and effectiveness of the cover-coefficient based clustering methodology for text databases. ACM Trans. Database Syst. 1990, 15, 483–517. [Google Scholar] [CrossRef]
  12. Feldman, R.; Sanger, J. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data; Cambridge University Press: Cambridge, UK, 2007. [Google Scholar]
  13. Lee, S.-J.; Jiang, J.-Y. Multilabel text categorization based on fuzzy relevance clustering. IEEE Trans. Fuzzy Syst. 2014, 22, 1457–1471. [Google Scholar] [CrossRef]
  14. Liao, C.-L.; Lee, S.-J. A clustering based approach to improving the efficiency of collaborative filtering recommendation. Electron. Commer. Res. Appl. 2016, 18, 1–9. [Google Scholar] [CrossRef]
  15. Alvarez, F.M.; Troncoso, A.; Riquelme, J.C.; Ruiz, J.S.A. Energy time series forecasting based on pattern sequence similarity. IEEE Trans. Knowl. Data Eng. 2011, 23, 1230–1243. [Google Scholar] [CrossRef]
  16. Wang, Z.-Y.; Lee, S.-J. A neuro-fuzzy based method for TAIEX forecasting. In Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC), Lanzhou, China, 13–16 July 2014; Volume 1, pp. 579–584. [Google Scholar]
  17. Everitt, B. Cluster Analysis; Wiley: West Sussex, Chichester, UK, 2011. [Google Scholar]
  18. Aghabozorgi, S.; Shirkhorshidi, A.S.; Wah, T.Y. A comparison study on similarity and dissimilarity measures in clustering continuous data. PLoS ONE 2015, 10, e0144059. [Google Scholar]
  19. Kohonen, T. Self-Organizing Maps; Springer-Verlag: Berlin, Germany, 1995. [Google Scholar]
  20. Lee, S.-J.; Ouyang, C.-S. A neuro-fuzzy system modeling with self-constructing rule generation and hybrid SVD-based learning. IEEE Trans. Fuzzy Syst. 2003, 11, 341–353. [Google Scholar]
  21. Park, H.-S.; Jun, C.-H. A simple and fast algorithm for k-medoids clustering. Expert Syst. Appl. 2009, 36, 3336–3341. [Google Scholar] [CrossRef]
  22. Sculley, D. Web-scale k-means clustering. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 1177–1178. [Google Scholar]
  23. Capo, M.; Pérez, A.; Lozano, J.A. An efficient k-means clustering algorithm for massive data. arXiv 2018, arXiv:1801.02949. [Google Scholar]
  24. Abdalgader, K. Centroid-based lexical clustering. IntechOpen 2018. [Google Scholar] [CrossRef]
  25. Rezaei, M. Improving a centroid-based clustering by using suitable centroids from another clustering. J. Classif. 2019, 1–14. [Google Scholar] [CrossRef]
  26. Sarmiento, A.; Fondon, I.; Durán-Díaz, I.; Cruces, S. Centroid-based clustering with αβ-divergences. Entropy 2019, 21, 196. [Google Scholar] [CrossRef] [Green Version]
  27. Kraskov, A.; Stogbauer, H.; Andrzejak, R.G.; Grassberger, P. Hierarchical clustering based on mutual information. arXiv 2003, arXiv:q-bio/0311039v2. [Google Scholar]
  28. Szekely, G.J.; Rizzo, M.L. Hierarchical clustering via joint between-within distances: Extending Ward’s minimum variance method. J. Classif. 2005, 22, 151–183. [Google Scholar] [CrossRef]
  29. Achtert, E.; Bohm, C.; Kroger, P. DeLi-Clu: Boosting robustness, completeness, usability, and efficiency of hierarchical clustering by a closest pair ranking. Lect. Notes Comput. Sci. 2006, 3918, 119–128. [Google Scholar]
  30. Achtert, E.; Bohm, C.; Kroger, P.; Zimek, A. Mining hierarchies of correlation clusters. In Proceedings of the 18th International Conference on Scientific and Statistical Database Management (SSDBM), Vienna, Austria, 3–5 July 2006; pp. 119–128. [Google Scholar]
  31. Zhang, W.; Zhao, D.; Wang, X. Agglomerative clustering via maximum incremental path integral. Pattern Recognit. 2013, 46, 3056–3065. [Google Scholar] [CrossRef]
  32. Gagolewski, M.; Bartoszuk, M.; Cena, A. Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Inf. Sci. 2016, 363, 8–23. [Google Scholar] [CrossRef]
33. Figueiredo, M.A.T.; Jain, A.K. Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 381–396.
34. Pal, K.; Keller, J.; Bezdek, J. A possibilistic fuzzy c-means clustering algorithm. IEEE Trans. Fuzzy Syst. 2005, 13, 517–530.
35. Fellows, M.R.; Guo, J.; Komusiewicz, C.; Niedermeier, R.; Uhlmann, J. Graph-based data clustering with overlaps. Discret. Optim. 2011, 8, 2–17.
36. Pérez-Suárez, A.; Martinez-Trinidad, J.F.; Carrasco-Ochoa, J.A.; Medina-Pagola, J.E. OClustR: A new graph-based algorithm for overlapping clustering. Neurocomputing 2013, 121, 234–247.
37. Baadel, S.; Thabtah, F.; Lu, J. MCOKE: Multi-cluster overlapping k-means extension algorithm. Int. J. Comput. Control Quantum Inf. Eng. 2015, 9, 374–377.
38. Améndola, C.; Faugère, J.-C.; Sturmfels, B. Moment varieties of Gaussian mixtures. J. Algebraic Stat. 2016, 7, 14–28.
39. Kriegel, H.-P.; Kroger, P.; Sander, J.; Zimek, A. Density-based clustering. WIREs Data Min. Knowl. Discov. 2011, 1, 231–240.
40. Heredia, L.C.; Mor, A.R. Density-based clustering methods for unsupervised separation of partial discharge sources. Int. J. Electr. Power Energy Syst. 2019, 107, 224–230.
41. Wang, T.; Ren, C.; Luo, Y.; Tian, J. NS-DBSCAN: A density-based clustering algorithm in network space. Int. J. Geo-Inf. 2018.
42. Cheng, C.-H.; Fu, A.W.; Zhang, Y. Entropy-based subspace clustering for mining numerical data. In Proceedings of the 5th ACM International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 15–18 August 1999; pp. 84–93.
43. Kailing, K.; Kriegel, H.-P.; Kroger, P. Density-connected subspace clustering for high-dimensional data. In Proceedings of the SIAM International Conference on Data Mining (SDM'04), Lake Buena Vista, FL, USA, 22–24 April 2004; pp. 246–257.
44. Agrawal, R.; Gehrke, J.; Gunopulos, D.; Raghavan, P. Automatic subspace clustering of high dimensional data. Data Min. Knowl. Discov. 2005, 11, 5–33.
45. Kriegel, H.-P.; Kroger, P.; Zimek, A. Subspace clustering. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2012, 2, 351–364.
46. Luo, S.; Zhang, C.; Zhang, W.; Cao, X. Consistent and specific multi-view subspace clustering. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, LA, USA, 2–3 February 2018; pp. 3730–3737.
47. Zhang, T.; Ji, P.; Harandi, M.; Huang, W.; Li, H. Neural collaborative subspace clustering. arXiv 2019, arXiv:1904.10596v1.
48. Frey, B.J.; Dueck, D. Clustering by passing messages between data points. Science 2007, 315, 972–976.
49. Shi, C.; Liu, Y.; Zhang, P. Weighted community detection and data clustering using message passing. arXiv 2018, arXiv:1801.09829v1.
50. Liao, T.W. Clustering of time series data: A survey. Pattern Recognit. 2005, 38, 1857–1874.
51. Aghabozorgi, S.; Shirkhorshidi, A.S.; Wah, T.Y. Time-series clustering: A decade review. Inf. Syst. 2015, 53, 16–38.
52. Maharaj, E.A.; D'Urso, P.; Caiado, J. Time Series Clustering and Classification; Chapman and Hall/CRC: Boca Raton, FL, USA, 2019.
53. Wang, W.; Yang, J.; Muntz, R. STING: A statistical information grid approach to spatial data mining. In Proceedings of the International Symposium on Very Large Data Bases, Athens, Greece, 25–29 August 1997; pp. 186–195.
54. Aghabozorgi, S.; Wah, T.Y. Stock market co-movement assessment using a three-phase clustering method. Expert Syst. Appl. 2014, 41, 1301–1314.
55. Ouyang, C.-S.; Lee, W.-J.; Lee, S.-J. A TSK-type neuro-fuzzy network approach to system modeling problems. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2005, 35, 751–767.
56. Huang, X.; Ye, Y.; Xiong, L.; Lau, R.; Jiang, N.; Wang, S. Time series k-means: A new k-means type smooth subspace clustering for time series data. Inf. Sci. 2016, 367, 1–13.
57. Wang, Z.-Y. Some Variants of Self-Constructing Clustering. Master's Thesis, National Sun Yat-Sen University, Kaohsiung, Taiwan, 2017.
58. Roelofsen, P. Time Series Clustering. Master's Thesis, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands, 2018.
59. Huang, X.; Ye, Y.; Guo, H.; Cai, Y.; Zhang, H.; Li, Y. DSKmeans: A new kmeans-type approach to discriminative subspace clustering. Knowl. Based Syst. 2014, 70, 293–300.
60. Model Evaluation: Quantifying the Quality of Predictions. Available online: https://scikit-learn.org/stable/modules/model_evaluation.html#model-evaluation (accessed on 1 September 2018).
61. The UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/ml/about.html (accessed on 5 December 2019).
62. K-Means. Available online: https://www.mathworks.com/help/stats/kmeans.html (accessed on 1 September 2018).
63. Fuzzy C-Means. Available online: https://www.mathworks.com/help/fuzzy/fcm.html (accessed on 1 September 2018).
64. Gaussian Mixture Model. Available online: https://en.wikipedia.org/wiki/Mixture_model (accessed on 1 September 2018).
65. DBSCAN. Available online: https://www.mathworks.com/help/fuzzy/dbscan.html (accessed on 1 November 2019).
66. Matlab. Available online: https://www.mathworks.com/products/matlab.html (accessed on 1 September 2018).
67. GMM Source Code. Available online: http://blog.pluskid.org/?p=39 (accessed on 1 September 2018).
68. Carrasco, J.; del Mar Rueda, S.M.; Herrera, F. rNPBST: An R package covering non-parametric and Bayesian statistical tests. In Proceedings of the International Conference on Hybrid Artificial Intelligence Systems, La Rioja, Spain, 21–23 June 2017; Volume 1, pp. 281–292.
69. ANOVA. Available online: https://www.mathworks.com/help/stats/anova1.html (accessed on 1 December 2019).
70. Dau, H.A.; Bagnall, A.J.; Kamgar, K.; Yeh, C.-C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.; Keogh, E.J. The UCR time series archive. arXiv 2018, arXiv:1810.07758.
71. Cuevas, E.; Santuario, E.; Zaldivar, D.; Perez-Cisneros, M. An improved evolutionary algorithm for reducing the number of function evaluations. Intell. Autom. Soft Comput. 2016, 22, 177–192.
72. Siddiqi, U.F.; Sait, S.M. A new heuristic for the data clustering problem. IEEE Access 2017, 5, 6801–6812.
73. Zhang, Q.; Zhu, C.; Yang, L.T.; Chen, Z.; Zhao, L.; Li, P. An incremental CFS algorithm for clustering large data in industrial internet of things. IEEE Trans. Ind. Inform. 2017, 13, 1193–1201.
74. Arias-Castro, E.; Chen, G.; Lerman, G. Spectral clustering based on local linear approximations. Electron. J. Stat. 2011, 5, 1537–1587.
75. Borg, I.; Groenen, P. Modern Multidimensional Scaling: Theory and Applications, 2nd ed.; Springer: New York, NY, USA, 2005.
Figure 1. Performance of SCC-IW vs. α.
Table 1. Characteristics of 14 non-time-series datasets.
Dataset | #Instances | #Features | #Classes
Breast | 569 | 30 | 2
Ecoli | 336 | 7 | 8
Ionosphere | 351 | 33 | 2
Breast_Tissue | 106 | 9 | 6
SPECT Heart | 267 | 44 | 2
Seeds | 210 | 7 | 3
Sonar | 208 | 60 | 2
User_Knowledge | 403 | 5 | 4
Musk | 476 | 166 | 2
Vehicle | 846 | 18 | 4
Glass | 214 | 9 | 6
Heart | 270 | 12 | 2
Iris | 150 | 4 | 3
Yeast | 1484 | 8 | 10
Table 2. Performance comparisons of Fscore, Rand Index (RI), and Normalized Mutual Information (NMI) on 14 non-time-series datasets (each entry is value/rank).
Dataset | Metric | K-Means | DSKmeans | FCM | Gmm | DBSCAN | SCC | SCC-I
Breast | Fscore | 0.9270/2 | 0.8710/4 | 0.9274/1 | 0.7470/7 | 0.8663/5 | 0.7983/6 | 0.9264/3
Breast | RI | 0.8660/1 | 0.7940/4 | 0.8660/1 | 0.6515/7 | 0.7316/5 | 0.6953/6 | 0.8630/3
Breast | NMI | 0.6232/1 | 0.5214/5 | 0.6152/2 | 0.2449/7 | 0.5215/4 | 0.3333/6 | 0.6075/3
Breast | Time(s) | 0.004 | 0.200 | 0.009 | 0.060 | 0.005 | 0.020 | 0.270
Heart | Fscore | 0.7442/2 | 0.6749/6 | 0.7932/1 | 0.6880/4 | 0.6339/7 | 0.6875/5 | 0.6885/3
Heart | RI | 0.6318/2 | 0.5678/6 | 0.6700/1 | 0.5746/4 | 0.5107/7 | 0.5708/5 | 0.5780/3
Heart | NMI | 0.2096/2 | 0.1102/7 | 0.2647/1 | 0.1205/6 | 0.1844/3 | 0.1564/5 | 0.1580/4
Heart | Time(s) | 0.003 | 0.020 | 0.010 | 0.010 | 0.002 | 0.001 | 0.054
Ionosphere | Fscore | 0.7101/3 | 0.7067/5 | 0.7072/4 | 0.7966/1 | 0.7224/2 | 0.6966/6 | 0.6892/7
Ionosphere | RI | 0.5818/3 | 0.5816/4 | 0.5795/5 | 0.6948/1 | 0.5957/2 | 0.5529/7 | 0.5610/6
Ionosphere | NMI | 0.1243/3 | 0.1224/4 | 0.1194/5 | 0.3016/1 | 0.2570/2 | 0.0532/7 | 0.1035/6
Ionosphere | Time(s) | 0.004 | 0.100 | 0.005 | 0.020 | 0.004 | 0.010 | 0.140
Musk | Fscore | 0.5718/5 | 0.6258/4 | 0.5539/7 | 0.5589/6 | 0.6675/2 | 0.6565/3 | 0.6690/1
Musk | RI | 0.5037/4 | 0.5027/5 | 0.5015/7 | 0.5020/6 | 0.5074/1 | 0.5065/2 | 0.5061/3
Musk | NMI | 0.0164/5 | 0.0220/2 | 0.0086/6 | 0.0086/6 | 0.0393/1 | 0.0170/4 | 0.0214/3
Musk | Time(s) | 0.009 | 0.360 | 0.020 | 0.140 | 0.012 | 0.020 | 0.090
Sonar | Fscore | 0.5530/6 | 0.6668/1 | 0.5519/7 | 0.5920/4 | 0.5630/5 | 0.6553/3 | 0.6645/2
Sonar | RI | 0.5032/2 | 0.4993/7 | 0.5030/3 | 0.5053/1 | 0.4999/6 | 0.5010/5 | 0.5011/4
Sonar | NMI | 0.0088/6 | 0.0215/4 | 0.0085/7 | 0.0117/5 | 0.0272/3 | 0.0290/2 | 0.0475/1
Sonar | Time(s) | 0.005 | 0.040 | 0.020 | 0.020 | 0.002 | 0.008 | 0.070
SPECT | Fscore | 0.6944/5 | 0.7255/4 | 0.6136/7 | 0.6916/6 | 0.7732/2 | 0.7648/3 | 0.7901/1
SPECT | RI | 0.5313/6 | 0.5773/4 | 0.4986/7 | 0.5395/5 | 0.6418/3 | 0.6540/1 | 0.6503/2
SPECT | NMI | 0.0885/4 | 0.0601/6 | 0.1560/2 | 0.0634/5 | 0.0939/3 | 0.0153/7 | 0.1798/1
SPECT | Time(s) | 0.004 | 0.090 | 0.010 | 0.020 | 0.003 | 0.010 | 0.180
Ecoli | Fscore | 0.6306/6 | 0.7527/3 | 0.5977/7 | 0.7362/4 | 0.7080/5 | 0.7647/2 | 0.7843/1
Ecoli | RI | 0.7973/5 | 0.8673/2 | 0.7895/6 | 0.8500/4 | 0.7701/7 | 0.8547/3 | 0.8697/1
Ecoli | NMI | 0.5925/6 | 0.6612/2 | 0.5543/7 | 0.6273/4 | 0.6057/5 | 0.6466/3 | 0.6787/1
Ecoli | Time(s) | 0.008 | 1.020 | 0.030 | 0.200 | 0.002 | 0.020 | 0.130
Glass | Fscore | 0.4808/4 | 0.5358/3 | 0.4525/6 | 0.4783/5 | 0.4018/7 | 0.5378/2 | 0.5740/1
Glass | RI | 0.6698/3 | 0.6143/5 | 0.7023/1 | 0.6785/2 | 0.5598/7 | 0.5658/6 | 0.6267/4
Glass | NMI | 0.3274/4 | 0.3757/3 | 0.2967/6 | 0.3201/5 | 0.2816/7 | 0.3782/2 | 0.4804/1
Glass | Time(s) | 0.005 | 0.270 | 0.016 | 0.020 | 0.002 | 0.008 | 0.100
Iris | Fscore | 0.8479/6 | 0.9603/1 | 0.8926/3 | 0.8774/4 | 0.7061/7 | 0.8579/5 | 0.9029/2
Iris | RI | 0.8429/6 | 0.9498/1 | 0.8797/4 | 0.8818/3 | 0.7820/7 | 0.8580/5 | 0.8898/2
Iris | NMI | 0.7116/6 | 0.8648/1 | 0.7433/5 | 0.7940/2 | 0.5797/7 | 0.7516/4 | 0.7727/3
Iris | Time(s) | 0.003 | 0.060 | 0.002 | 0.036 | 0.001 | 0.006 | 0.040
Yeast | Fscore | 0.4398/3 | 0.4227/5 | 0.3840/6 | 0.4597/1 | 0.3525/7 | 0.4252/4 | 0.4550/2
Yeast | RI | 0.7490/1 | 0.6345/6 | 0.7192/4 | 0.7099/5 | 0.7227/3 | 0.5556/7 | 0.7405/2
Yeast | NMI | 0.2769/2 | 0.2318/4 | 0.1785/7 | 0.2481/3 | 0.2045/6 | 0.2060/5 | 0.2879/1
Yeast | Time(s) | 0.050 | 15.900 | 0.170 | 3.500 | 0.015 | 0.070 | 4.700
Breast_Tis | Fscore | 0.5572/6 | 0.5683/3 | 0.5639/4 | 0.5605/5 | 0.5898/2 | 0.5276/7 | 0.6122/1
Breast_Tis | RI | 0.7869/2 | 0.7711/4 | 0.7887/1 | 0.7849/3 | 0.6635/6 | 0.6535/7 | 0.7339/5
Breast_Tis | NMI | 0.5203/5 | 0.5429/3 | 0.5213/4 | 0.5189/6 | 0.5745/1 | 0.4959/7 | 0.5688/2
Breast_Tis | Time(s) | 0.005 | 0.100 | 0.005 | 0.040 | 0.001 | 0.005 | 0.030
Seeds | Fscore | 0.8905/2 | 0.8861/4 | 0.9002/1 | 0.8714/6 | 0.8754/5 | 0.8607/7 | 0.8885/3
Seeds | RI | 0.8693/3 | 0.8668/4 | 0.8789/1 | 0.8658/5 | 0.8404/7 | 0.8449/6 | 0.8698/2
Seeds | NMI | 0.6743/5 | 0.6841/4 | 0.6911/3 | 0.7232/1 | 0.6694/6 | 0.6556/7 | 0.6943/2
Seeds | Time(s) | 0.003 | 0.080 | 0.003 | 0.040 | 0.002 | 0.008 | 0.080
User_Know | Fscore | 0.5199/4 | 0.4919/6 | 0.5402/2 | 0.5060/5 | 0.5256/3 | 0.4879/7 | 0.6064/1
User_Know | RI | 0.6916/2 | 0.6459/6 | 0.6763/3 | 0.6752/4 | 0.6718/5 | 0.5865/7 | 0.7357/1
User_Know | NMI | 0.3062/2 | 0.2451/6 | 0.2888/3 | 0.2628/5 | 0.2718/4 | 0.2295/7 | 0.4593/1
User_Know | Time(s) | 0.007 | 0.360 | 0.040 | 0.400 | 0.004 | 0.016 | 0.320
Vehicle | Fscore | 0.4264/5 | 0.4563/4 | 0.4191/6 | 0.4587/3 | 0.4003/7 | 0.4673/2 | 0.4788/1
Vehicle | RI | 0.6539/1 | 0.5830/4 | 0.6521/2 | 0.6428/3 | 0.5494/6 | 0.5466/7 | 0.5631/5
Vehicle | NMI | 0.1283/6 | 0.1590/4 | 0.0986/7 | 0.1733/2 | 0.1302/5 | 0.1686/3 | 0.1986/1
Vehicle | Time(s) | 0.010 | 0.950 | 0.030 | 0.440 | 0.007 | 0.030 | 0.270
Table 3. Averaged ranking of 14 non-time-series datasets.
Metric | K-Means | DSKmeans | FCM | Gmm | DBSCAN | SCC | SCC-I
Fscore | 4.2 | 3.8 | 4.4 | 4.4 | 4.7 | 4.4 | 2.1
RI | 2.9 | 4.4 | 3.2 | 3.8 | 5.1 | 5.3 | 3.1
NMI | 4.1 | 3.9 | 4.6 | 4.1 | 4.1 | 4.9 | 2.1
Table 4. Performance comparisons of Davies–Bouldin index (DBI), Dunn index (DI), and Silhouette index (SI) on 14 non-time-series datasets.
Dataset | Index | K-Means | DSKmeans | FCM | Gmm | DBSCAN | SCC-I
Breast | DBI | 1.2336 | 0.9218 | 1.2415 | 1.1938 | 1.0814 | 0.8836
Breast | DI | 0.0838 | 0.1457 | 0.0838 | 0.0853 | 0.1452 | 0.1452
Breast | SI | 0.5765 | 0.6613 | 0.5683 | 0.6028 | 0.5459 | 0.5459
Heart | DBI | 1.8844 | 1.9028 | 1.8983 | 1.8894 | 1.3821 | 1.4799
Heart | DI | 0.3644 | 0.3715 | 0.2499 | 0.2551 | 0.1787 | 0.3604
Heart | SI | 0.3484 | 0.3314 | 0.3444 | 0.3467 | 0.3458 | 0.2785
Ionosphere | DBI | 0.5498 | 0.5813 | 1.7182 | 1.8497 | 2.0965 | 0.4926
Ionosphere | DI | 0.4841 | 0.4480 | 0.0703 | 0.3837 | 0.1012 | 0.5706
Ionosphere | SI | 0.5706 | 0.4944 | 0.4097 | 0.5162 | 0.5376 | 0.6050
Musk | DBI | 1.3069 | 1.3336 | 1.3594 | 1.2995 | 1.1038 | 0.7281
Musk | DI | 0.1894 | 0.2056 | 0.1963 | 0.2508 | 0.1559 | 0.5793
Musk | SI | 0.5321 | 0.5215 | 0.5182 | 0.5319 | 0.5278 | 0.5319
Sonar | DBI | 1.8199 | 1.7128 | 1.8483 | 1.6240 | 1.3181 | 0.5805
Sonar | DI | 0.1337 | 0.2312 | 0.1786 | 0.2376 | 0.1458 | 0.3789
Sonar | SI | 0.3319 | 0.3754 | 0.3318 | 0.3872 | 0.1541 | 0.4163
SPECT | DBI | 1.4525 | 1.2578 | 1.8714 | 1.2676 | 1.1655 | 0.3567
SPECT | DI | 0.2585 | 0.3853 | 0.1566 | 0.3268 | 0.1379 | 0.1287
SPECT | SI | 0.6739 | 0.7257 | 0.4610 | 0.7308 | 0.3021 | 0.7467
Ecoli | DBI | 1.1673 | 1.3571 | 1.8659 | 0.9335 | 0.8678 | 0.7825
Ecoli | DI | 0.0765 | 0.0956 | 0.0398 | 0.0661 | 0.0703 | 0.0947
Ecoli | SI | 0.4940 | 0.5685 | 0.2872 | 0.3981 | 0.5390 | 0.6269
Glass | DBI | 0.9956 | 1.1442 | 1.7902 | 1.1721 | 2.1930 | 0.6346
Glass | DI | 0.1489 | 0.0633 | 0.0442 | 0.1392 | 0.2051 | 0.2060
Glass | SI | 0.7042 | 0.4512 | 0.4421 | 0.5138 | 0.6672 | 0.6831
Iris | DBI | 0.8281 | 0.6113 | 0.8436 | 0.6172 | 0.8012 | 0.8157
Iris | DI | 0.0694 | 0.1448 | 0.0701 | 0.1056 | 0.0619 | 0.0899
Iris | SI | 0.6959 | 0.6656 | 0.6891 | 0.6461 | 0.6949 | 0.6959
Yeast | DBI | 1.3526 | 1.6716 | 2.7041 | 1.8856 | 1.0550 | 0.8318
Yeast | DI | 0.0399 | 0.0327 | 0.0208 | 0.0321 | 0.0167 | 0.0400
Yeast | SI | 0.3339 | 0.2959 | 0.0263 | 0.2911 | 0.8213 | 0.3321
Breast_Tis | DBI | 0.8691 | 1.0705 | 1.0286 | 1.0715 | 0.7800 | 0.8583
Breast_Tis | DI | 0.1146 | 0.1096 | 0.0449 | 0.1041 | 0.1093 | 0.1174
Breast_Tis | SI | 0.5926 | 0.5330 | 0.4629 | 0.4960 | 0.5610 | 0.6109
Seeds | DBI | 0.9436 | 0.9612 | 0.9448 | 0.6905 | 0.9313 | 0.9451
Seeds | DI | 0.1259 | 0.1125 | 0.0800 | 0.0988 | 0.0493 | 0.1259
Seeds | SI | 0.6209 | 0.5972 | 0.6196 | 0.6196 | 0.6063 | 0.6208
User_Know | DBI | 1.6013 | 1.6626 | 1.6933 | 1.6118 | 1.6979 | 1.7487
User_Know | DI | 0.0957 | 0.0989 | 0.0842 | 0.0776 | 0.0758 | 0.1053
User_Know | SI | 0.3097 | 0.2949 | 0.2473 | 0.2676 | 0.2593 | 0.3018
Vehicle | DBI | 1.1933 | 1.2974 | 1.6017 | 1.1883 | 1.0709 | 0.9569
Vehicle | DI | 0.0846 | 0.0884 | 0.0668 | 0.0951 | 0.0573 | 0.0969
Vehicle | SI | 0.4793 | 0.4372 | 0.3986 | 0.3872 | 0.4864 | 0.6170
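The internal indices in Table 4 do not require ground-truth labels: a lower DBI and higher DI and SI indicate more compact, better-separated clusters. As a purely illustrative sketch (not the original evaluation code), they can be computed with scikit-learn and SciPy as follows; the data matrix X and the assignment vector labels below are hypothetical.

    import numpy as np
    from scipy.spatial.distance import cdist, pdist
    from sklearn.metrics import davies_bouldin_score, silhouette_score

    def dunn_index(X, labels):
        # Dunn index = (minimum inter-cluster distance) / (maximum cluster diameter).
        clusters = [X[labels == c] for c in np.unique(labels)]
        min_sep = min(cdist(a, b).min()
                      for i, a in enumerate(clusters)
                      for b in clusters[i + 1:])
        max_diam = max(pdist(c).max() for c in clusters if len(c) > 1)
        return min_sep / max_diam

    # Hypothetical data and a two-cluster assignment.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))
    labels = (X[:, 0] > 0).astype(int)
    print(davies_bouldin_score(X, labels),
          dunn_index(X, labels),
          silhouette_score(X, labels))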
Table 5. Pairwise comparisons by t-test for non-time-series datasets.
SCC-I vs. | K-Means | DSKmeans | FCM | Gmm | DBSCAN
Fscore | 3.3559 | 2.5383 | 2.7673 | 2.7686 | 3.9622
NMI | 2.6634 | 2.3637 | 2.4157 | 1.7677 | 2.2601
DBI | 3.3703 | 3.6033 | 4.4584 | 3.3139 | 2.5186
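Each entry in Table 5 is the statistic of a paired t-test over the per-dataset scores of SCC-I and the compared method. As a sketch only, the test can be reproduced with SciPy's paired t-test (rather than the R tooling of [68]); the arrays below copy the SCC-I and K-Means Fscore columns of Table 2, and the resulting statistic should closely reproduce the 3.3559 entry above.

    import numpy as np
    from scipy.stats import ttest_rel

    # Fscore values of SCC-I and K-Means on the 14 non-time-series datasets (Table 2).
    fscore_scc_i = np.array([0.9264, 0.6885, 0.6892, 0.6690, 0.6645, 0.7901, 0.7843,
                             0.5740, 0.9029, 0.4550, 0.6122, 0.8885, 0.6064, 0.4788])
    fscore_kmeans = np.array([0.9270, 0.7442, 0.7101, 0.5718, 0.5530, 0.6944, 0.6306,
                              0.4808, 0.8479, 0.4398, 0.5572, 0.8905, 0.5199, 0.4264])
    t_stat, p_value = ttest_rel(fscore_scc_i, fscore_kmeans)
    print(t_stat, p_value)

A positive statistic favors SCC-I when larger values of the measure are better (Fscore, NMI); for DBI, where smaller is better, the sign convention is reversed accordingly.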
Table 6. Characteristics of 10 time-series datasets.
Dataset | #Instances | #Features | #Classes
SynControl | 600 | 60 | 6
Coffee | 56 | 286 | 2
Light7 | 143 | 319 | 7
OSU_Leaf | 442 | 427 | 6
Sony_Surface | 621 | 70 | 2
Trace | 200 | 275 | 4
CBF | 930 | 128 | 3
ECGFiveDays | 884 | 136 | 2
FaceFour | 350 | 112 | 4
OliveOil | 60 | 570 | 4
Table 7. Performance comparisons of Fscore, RI, and NMI on 10 time-series datasets—Part I (each entry is value/rank).
Dataset | Metric | K-Means | TSKmeans | DSKmeans | FCM
CBF | Fscore | 0.6359/5 | 0.7028/2 | 0.6890/3 | 0.6316/6
CBF | RI | 0.7071/4 | 0.7331/2 | 0.7028/5 | 0.6973/6
CBF | NMI | 0.3616/5 | 0.4716/2 | 0.4701/3 | 0.3364/7
CBF | Time(s) | 0.030 | 1.900 | 1.600 | 0.140
Coffee | Fscore | 0.7578/5 | 0.7425/6 | 0.8441/2 | 0.8912/1
Coffee | RI | 0.6667/3 | 0.6572/4 | 0.7662/2 | 0.8052/1
Coffee | NMI | 0.3230/4 | 0.3051/6 | 0.5067/2 | 0.6001/1
Coffee | Time(s) | 0.040 | 0.100 | 0.010 | 0.010
ECG5 | Fscore | 0.5157/7 | 0.7091/2 | 0.5943/3 | 0.5147/8
ECG5 | RI | 0.4999/7 | 0.5780/2 | 0.5033/3 | 0.4999/7
ECG5 | NMI | 0.0007/7 | 0.1462/2 | 0.0327/3 | 0.0006/8
ECG5 | Time(s) | 0.010 | 1.500 | 1.050 | 0.030
Face4 | Fscore | 0.6468/5 | 0.7394/2 | 0.6631/4 | 0.5765/8
Face4 | RI | 0.7443/5 | 0.7977/2 | 0.7448/4 | 0.6881/7
Face4 | NMI | 0.4585/4 | 0.6140/2 | 0.4493/5 | 0.3777/8
Face4 | Time(s) | 0.009 | 0.780 | 0.160 | 0.070
Light7 | Fscore | 0.5779/3 | 0.5677/4 | 0.5652/5 | 0.3664/8
Light7 | RI | 0.8181/1 | 0.7937/4 | 0.8142/2 | 0.6317/8
Light7 | NMI | 0.4990/2 | 0.4796/5 | 0.4890/4 | 0.2588/8
Light7 | Time(s) | 0.020 | 1.500 | 0.500 | 0.020
Oil | Fscore | 0.8212/3 | 0.8175/4 | 0.8148/5 | 0.8226/2
Oil | RI | 0.8558/3 | 0.8524/5 | 0.8480/6 | 0.8757/2
Oil | NMI | 0.6906/2 | 0.6603/5 | 0.6688/4 | 0.6809/3
Oil | Time(s) | 0.009 | 1.100 | 0.050 | 0.060
OSU_leaf | Fscore | 0.4154/3 | 0.4068/5 | 0.4070/4 | 0.3411/8
OSU_leaf | RI | 0.7456/1 | 0.7447/2 | 0.7391/3 | 0.5895/8
OSU_leaf | NMI | 0.2233/3 | 0.2091/5 | 0.2159/4 | 0.1030/8
OSU_leaf | Time(s) | 0.090 | 5.300 | 2.300 | 0.350
Sony_Surf | Fscore | 0.8022/4 | 0.7883/5 | 0.7445/6 | 0.8610/1
Sony_Surf | RI | 0.6947/3 | 0.6863/4 | 0.6415/6 | 0.7710/1
Sony_Surf | NMI | 0.3828/3 | 0.3674/4 | 0.2727/6 | 0.4907/1
Sony_Surf | Time(s) | 0.009 | 0.420 | 0.400 | 0.009
Synthetic | Fscore | 0.7256/3 | 0.7543/1 | 0.7284/2 | 0.6393/6
Synthetic | RI | 0.8763/2 | 0.8908/1 | 0.8722/3 | 0.8386/6
Synthetic | NMI | 0.7859/3 | 0.8143/1 | 0.7756/4 | 0.6946/6
Synthetic | Time(s) | 0.010 | 0.560 | 1.200 | 0.040
Trace | Fscore | 0.5491/8 | 0.5820/5 | 0.6143/2 | 0.5643/6
Trace | RI | 0.7498/7 | 0.7401/8 | 0.7493/7 | 0.7521/2
Trace | NMI | 0.5160/6 | 0.5142/8 | 0.5698/2 | 0.5204/5
Trace | Time(s) | 0.008 | 0.600 | 0.500 | 0.040
Table 8. Performance comparisons of Fscore, RI, and NMI on 10 time-series datasets—Part II (each entry is value/rank).
Dataset | Metric | Gmm | SCC | SCC-I | SCC-IW
CBF | Fscore | 0.6101/7 | 0.5822/8 | 0.6694/4 | 0.8032/1
CBF | RI | 0.6770/7 | 0.5925/8 | 0.7192/3 | 0.7516/1
CBF | NMI | 0.3559/6 | 0.2837/8 | 0.4302/4 | 0.4860/1
CBF | Time(s) | 0.470 | 0.040 | 1.040 | 1.200
Coffee | Fscore | 0.7366/7 | 0.6849/8 | 0.7619/4 | 0.7745/3
Coffee | RI | 0.6358/6 | 0.5331/8 | 0.6295/7 | 0.6451/5
Coffee | NMI | 0.2649/7 | 0.1329/8 | 0.3109/5 | 0.3318/3
Coffee | Time(s) | 0.070 | 0.003 | 0.020 | 0.120
ECG5 | Fscore | 0.5438/4 | 0.5321/5 | 0.5178/6 | 0.7395/1
ECG5 | RI | 0.5013/4 | 0.5003/5 | 0.5001/6 | 0.6147/1
ECG5 | NMI | 0.0029/4 | 0.0014/5 | 0.0009/6 | 0.1750/1
ECG5 | Time(s) | 0.040 | 0.040 | 0.430 | 1.600
Face4 | Fscore | 0.6136/6 | 0.6107/7 | 0.7008/3 | 0.8185/1
Face4 | RI | 0.7065/6 | 0.6339/8 | 0.7470/3 | 0.8655/1
Face4 | NMI | 0.4220/6 | 0.3846/7 | 0.5163/3 | 0.7540/1
Face4 | Time(s) | 0.090 | 0.007 | 0.060 | 0.540
Light7 | Fscore | 0.4967/7 | 0.5082/6 | 0.5819/2 | 0.6036/1
Light7 | RI | 0.7783/6 | 0.6920/7 | 0.8033/3 | 0.7809/5
Light7 | NMI | 0.4110/6 | 0.3985/7 | 0.4985/3 | 0.5048/1
Light7 | Time(s) | 0.130 | 0.010 | 0.100 | 0.680
Oil | Fscore | 0.8103/6 | 0.6979/8 | 0.7728/7 | 0.8671/1
Oil | RI | 0.8548/4 | 0.6984/8 | 0.8042/7 | 0.8823/1
Oil | NMI | 0.6579/6 | 0.5155/8 | 0.6250/7 | 0.7422/1
Oil | Time(s) | 2.400 | 0.004 | 0.030 | 0.650
OSU_leaf | Fscore | 0.3606/7 | 0.3612/6 | 0.4295/1 | 0.4252/2
OSU_leaf | RI | 0.7086/6 | 0.5978/7 | 0.7265/5 | 0.7350/4
OSU_leaf | NMI | 0.1663/6 | 0.1636/7 | 0.2562/1 | 0.2398/2
OSU_leaf | Time(s) | 0.360 | 0.030 | 0.700 | 2.200
Sony_Surf | Fscore | 0.6604/8 | 0.6869/7 | 0.8033/3 | 0.8361/2
Sony_Surf | RI | 0.5627/7 | 0.5335/8 | 0.6838/5 | 0.7275/2
Sony_Surf | NMI | 0.1569/7 | 0.0814/8 | 0.3432/5 | 0.4245/2
Sony_Surf | Time(s) | 0.020 | 0.030 | 0.320 | 0.410
Synthetic | Fscore | 0.5924/7 | 0.6789/5 | 0.5524/8 | 0.7164/4
Synthetic | RI | 0.8128/7 | 0.8588/5 | 0.7175/8 | 0.8694/4
Synthetic | NMI | 0.6378/7 | 0.7703/5 | 0.6161/8 | 0.7932/2
Synthetic | Time(s) | 1.000 | 0.290 | 0.030 | 0.450
Trace | Fscore | 0.5500/7 | 0.6128/3 | 0.5896/4 | 0.7622/1
Trace | RI | 0.7500/5 | 0.7506/4 | 0.7514/3 | 0.8313/1
Trace | NMI | 0.5188/6 | 0.5696/3 | 0.5452/4 | 0.7649/1
Trace | Time(s) | 0.300 | 0.010 | 0.100 | 0.620
Table 9. Performance comparisons of DBI, DI, and SI on 10 time-series datasets.
Dataset | Index | K-Means | TSKmeans | DSKmeans | FCM | Gmm | SCC-IW
CBF | DBI | 1.9847 | 2.2397 | 2.2115 | 1.5132 | 1.6953 | 1.2029
CBF | DI | 0.2980 | 0.2968 | 0.2695 | 0.2882 | 0.3198 | 0.3452
CBF | SI | 0.2862 | 0.2236 | 0.2294 | 0.2071 | 0.3135 | 0.3981
Coffee | DBI | 1.5204 | 1.4311 | 1.4311 | 1.7645 | 1.4311 | 1.1966
Coffee | DI | 0.3008 | 0.2586 | 0.2586 | 0.2633 | 0.3045 | 0.5019
Coffee | SI | 0.4353 | 0.4447 | 0.4447 | 0.2749 | 0.4447 | 0.6344
ECG5 | DBI | 1.2276 | 1.2320 | 1.2318 | 1.2287 | 1.2168 | 1.0752
ECG5 | DI | 0.0433 | 0.0384 | 0.0386 | 0.0274 | 0.0574 | 0.0239
ECG5 | SI | 0.5446 | 0.5424 | 0.5161 | 0.5445 | 0.5445 | 0.5452
Face4 | DBI | 1.4030 | 1.9185 | 1.8185 | 1.8166 | 1.3659 | 1.3381
Face4 | DI | 0.3399 | 0.2176 | 0.2469 | 0.2742 | 0.3284 | 0.2735
Face4 | SI | 0.4301 | 0.2547 | 0.2279 | 0.2475 | 0.4036 | 0.2523
Light7 | DBI | 1.4136 | 1.8542 | 1.6855 | 1.4740 | 1.6682 | 1.2476
Light7 | DI | 0.3324 | 0.2275 | 0.2871 | 0.2811 | 0.3151 | 0.2181
Light7 | SI | 0.3493 | 0.1827 | 0.2921 | 0.2494 | 0.2914 | 0.2025
Oil | DBI | 1.0658 | 1.0559 | 1.5853 | 1.6679 | 1.2671 | 0.6984
Oil | DI | 0.3941 | 0.3000 | 0.1660 | 0.2279 | 0.3062 | 0.4959
Oil | SI | 0.5317 | 0.5261 | 0.2448 | 0.1893 | 0.4837 | 0.4494
OSU_leaf | DBI | 2.0969 | 2.1381 | 2.3428 | 2.0592 | 2.2647 | 1.3514
OSU_leaf | DI | 0.2569 | 0.2712 | 0.2561 | 0.1837 | 0.2507 | 0.1365
OSU_leaf | SI | 0.2661 | 0.2529 | 0.2256 | 0.1241 | 0.2347 | 0.2755
Sony_Surf | DBI | 2.0878 | 1.8717 | 1.9832 | 2.6212 | 2.0551 | 0.8246
Sony_Surf | DI | 0.2673 | 0.2246 | 0.2112 | 0.2314 | 0.2449 | 0.2196
Sony_Surf | SI | 0.2737 | 0.2612 | 0.2021 | 0.2296 | 0.2579 | 0.2267
Synthetic | DBI | 2.0156 | 2.2560 | 3.0803 | 3.3852 | 1.2028 | 1.2416
Synthetic | DI | 0.3088 | 0.2972 | 0.2694 | 0.2571 | 0.2518 | 0.6108
Synthetic | SI | 0.4830 | 0.4575 | 0.2430 | 0.1936 | 0.3557 | 0.4368
Trace | DBI | 0.7461 | 0.9415 | 1.0117 | 1.0132 | 0.6862 | 1.3145
Trace | DI | 0.1466 | 0.1029 | 0.0462 | 0.0662 | 0.1434 | 0.2315
Trace | SI | 0.7013 | 0.4461 | 0.5390 | 0.6779 | 0.6926 | 0.6023
Table 10. Averaged ranking of 10 time-series datasets.
Metric | K-Means | TSKmeans | DSKmeans | FCM | Gmm | SCC | SCC-I | SCC-IW
Fscore | 4.6 | 3.6 | 3.6 | 5.4 | 6.6 | 6.3 | 4.2 | 1.7
RI | 3.6 | 3.4 | 4.1 | 4.8 | 5.8 | 6.8 | 5.0 | 2.5
NMI | 3.9 | 4.0 | 3.7 | 5.5 | 6.1 | 6.6 | 4.6 | 1.5
Table 11. Pairwise comparisons by t-test for time-series datasets.
SCC-IW vs. | K-Means | TSKmeans | DSKmeans | FCM | Gmm
Fscore | 4.3834 | 4.3753 | 3.2550 | 2.2606 | 6.4330
RI | 1.9429 | 1.9228 | 1.3644 | 1.7049 | 3.8533
DBI | 2.5339 | 3.7467 | 3.5688 | 2.9619 | 2.0451
Table 12. DI comparison between SCC-I and HDC.
Method | Glass | Ionosphere | Sonar | Vehicle | Heart
SCC-I | 0.2060 | 0.5706 | 0.3789 | 0.0989 | 0.3604
HDC | 0.2450 | 0.1924 | 0.3698 | 0.1054 | 0.1165
Method | Balance scale | Banknote authentication | Landsat satellite | Pen-based digits | Waveform-5000
SCC-I | 0.4472 | 0.1746 | 0.2468 | 0.0743 | 0.4962
HDC | 0.1579 | 0.0969 | 0.0650 | 0.0408 | 0.3384
Table 13. NMI comparison between SCC-I and ICFSKM.
Method | Iris | Wine | Yeast
SCC-I | 0.7727 | 0.8334 | 0.2879
ICFSKM | 0.8030 | 0.7810 | 0.3930
