Article

Weighted z-Distance-Based Clustering and Its Application to Time-Series Data

1 Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung 804, Taiwan
2 Department of Electrical Engineering, Intelligent Electronic Commerce Research Center, National Sun Yat-Sen University, Kaohsiung 804, Taiwan
* Author to whom correspondence should be addressed.
Appl. Sci. 2019, 9(24), 5469; https://doi.org/10.3390/app9245469
Submission received: 6 October 2019 / Revised: 5 December 2019 / Accepted: 9 December 2019 / Published: 12 December 2019
(This article belongs to the Special Issue New Frontiers in Computational Intelligence)

Abstract

Clustering is the practice of dividing given data into similar groups and is one of the most widely used methods for unsupervised learning. Lee and Ouyang proposed a self-constructing clustering (SCC) method in which the similarity threshold, instead of the number of clusters, is specified in advance by the user. For a given set of instances, SCC performs only one training cycle on those instances. Once an instance has been assigned to a cluster, the assignment will not be changed afterwards. The clusters produced may depend on the order in which the instances are considered, and assignment errors are more likely to occur. Also, all dimensions are equally weighted, which may not be suitable in certain applications, e.g., time-series clustering. In this paper, improvements are proposed. Two or more training cycles on the instances are performed. An instance can be re-assigned to another cluster in each cycle. In this way, the clusters produced are less likely to be affected by the feeding order of the instances. Also, each dimension of the input can be weighted differently in the clustering process. The values of the weights are adaptively learned from the data. A number of experiments with real-world benchmark datasets are conducted and the results are shown to demonstrate the effectiveness of the proposed ideas.

1. Introduction

In the field of artificial intelligence, clustering techniques play a very important role [1,2]. Clustering is an unsupervised learning technique whose purpose is to form meaningful clusters from the unlabeled data instances under consideration. Intuitively, similar instances are grouped in the same cluster and dissimilar instances are placed in different clusters. Clustering has been widely utilized in a variety of applications, such as revealing the internal structure of the data [3], deriving segmentations of the data [4,5], preprocessing the data for other artificial intelligence (AI) techniques [6,7], business intelligence [1,8], and knowledge discovery in data [9,10]. For example, in electronic text processing [11,12,13], clustering is used to reduce the dimensionality in order to improve the efficiency of processing, or to ease the curse of dimensionality encountered in high-dimensional problems. In recommendation applications in e-commerce [14], the size of the information matrix is reduced by clustering to enhance the efficiency of making recommendations. In power systems, clustering helps predict future trends in electricity demand [15]. In stock market forecasting and social media data analysis [1,16], clustering is an indispensable core technique. Therefore, developing good clustering techniques is a critical issue.
Many types of clustering algorithms have been proposed [2,17]. Similarity or distance measures are core components used to place similar data into the same clusters, while dissimilar or distant data are placed into different clusters [18]. Centroid-based clustering [19,20,21,22,23,24,25,26] groups data instances in an exclusive way: once an instance belongs to a definite cluster, it cannot be included in another cluster. K-means is one such algorithm, well known in the AI community. To use it, the user has to provide the desired number of clusters, K, in advance. Each instance is assigned to the nearest cluster center, and then the K cluster centers are re-estimated. This process is repeated until the cluster centers are stable. The self-organizing map (SOM) employs a set of representatives. When a vector is presented, all representatives compete with each other, and the winner is updated so as to move toward the vector. Hierarchical clustering [27,28,29,30,31,32] creates a hierarchical decomposition of the set of data instances using some criteria. Two strategies, bottom-up and top-down, are adopted in hierarchical algorithms. The user usually has to decide how many and which clusters are most desirable from the offered hierarchy of clusters. Distribution-based clustering [33,34,35,36,37,38] is based on distribution models. Fuzzy C-means uses fuzzy sets to cluster instances, and data instances are bound to each cluster by means of a membership function; therefore, each instance may belong to several clusters with different degrees of membership. The Gaussian mixture model with expectation maximization (GMM-EM) uses a completely probabilistic approach: each cluster is mathematically represented by a parametric distribution, e.g., a Gaussian, and the entire data set is therefore modeled by a mixture of these distributions. Density-based algorithms [39,40,41], e.g., density-based spatial clustering of applications with noise (DBSCAN), consider clusters as dense regions in the instance space. Most of them do not impose any shape restrictions on the resulting clusters; however, they differ in the way the density is defined. Subspace clustering [42,43,44,45,46,47], e.g., clustering in quest (CLIQUE), searches for clusters within the various subspaces of the feature space and reveals the clusters as well as the subspaces where they reside. Also, message passing [48,49], a recent development in computer science and statistical physics, has led to the creation of new types of clustering algorithms.
A special type of clustering is time-series clustering. A sequence composed of continuous, real-valued temporal data is known as a time series. Time-series data are of interest in various areas, e.g., engineering, business, finance, economics, and healthcare. Good surveys and reviews have been published on time-series clustering [50,51,52]. In [50], time-series clustering methods are grouped into three major categories: raw-data-based, feature-based, and model-based approaches. Raw-data-based clustering methods work with raw data, either in the time or frequency domain; the two time series being compared are normally sampled at the same interval. Clustering based on raw data implies working with a high-dimensional space, and it is also not desirable to work directly with raw data that are highly noisy. Feature-based clustering methods have been proposed to address these concerns. Although most feature extraction methods are generic in nature, the extracted features are usually application dependent. Model-based clustering methods consider each time series to be generated by some kind of model or by a mixture of underlying probability distributions; time series are considered similar when the models characterizing the individual series are similar. However, model-based approaches usually have scalability problems, and their performance degrades when the clusters are close to each other. In [51], time-series clustering approaches are classified into six groups: partitioning, hierarchical, grid-based, model-based, density-based, and multi-step clustering algorithms. A partitioning clustering method, e.g., K-means, makes k groups from unlabeled objects in such a way that each group contains at least one object. Hierarchical clustering is an approach of cluster analysis that builds a hierarchy of clusters using agglomerative or divisive algorithms. Model-based clustering, e.g., SOM, assumes a model for each cluster and finds the best fit of the data to that model. In density-based clustering [41], clusters are subspaces of dense objects that are separated by subspaces in which objects have low density. Grid-based methods quantize the space into a finite number of cells that form a grid, and then perform clustering on the grid's cells [53]. A multi-step approach presented in [54] uses a three-phase method, (1) pre-clustering of time series, (2) purifying and summarization, and (3) merging, to construct the clusters based on similarity in shape.
Lee and Ouyang proposed a self-constructing clustering (SCC) algorithm [20] which has been applied in various applications [4,6,13,14,16,55]. SCC is an exclusive clustering method. For a given training set, SCC performs only one training cycle on the training instances. Initially, no clusters exist. Training instances are considered one by one. If an instance is close enough to an existing cluster, the instance is assigned to the most suitable cluster; otherwise, a new cluster is created and the instance is assigned to it. SCC offers several advantages. First, the algorithm runs through the training instances only one time, so it is fast. Second, the distributions of the data are statistically characterized. Third, the similarity threshold, instead of the number of clusters, is specified in advance by the user. However, once an instance is assigned to a cluster, the assignment will not be changed afterwards. The clusters produced may depend on the order in which the instances are considered, and assignment errors are more likely to occur. As a result, the accuracy of the result can be low. Also, all dimensions are equally weighted in the clustering process, which may not be suitable in certain applications, e.g., time-series clustering [56,57,58].
In this paper, we propose improvements to the SCC method to overcome its shortcomings. Two or more training cycles on the instances are performed. In each cycle, training instances are considered one by one. An instance can be added into or removed from a cluster, and thus it is allowed to be re-assigned to another cluster. A desired number of clusters is obtained when all the assignments are stable, i.e., no assignment has been changed, in the current cycle. In this way, the clusters produced are less likely to be affected by the feeding order of the instances. Furthermore, each dimension can be weighted differently in the clustering process. The values of the weights are adaptively learned from the data. Having different weights is useful when certain relevance exists between different dimensions in many applications, e.g., clustering of time series data [56,57]. The effectiveness of the proposed ideas is demonstrated by a number of experiments conducted with real-world benchmark datasets.
The rest of this paper is organized as follows. SCC is briefly reviewed in Section 2. The proposed methods are described in Section 3.1 and Section 3.2, respectively. Experimental results are presented in Section 4. Finally, a conclusion is given in Section 5.

2. Self-Constructing Clustering (SCC)

Suppose $X = \{x_i \mid 1 \le i \le N\}$ is a finite set of $N$ unlabeled training instances, where $x_i = [x_{i,1} \; \cdots \; x_{i,n}]^T \in \mathbb{R}^n$ is the $i$th instance. Each instance is a vector with $n$ features. SCC [20,55] does clustering in a progressive way. Only one training cycle on the instances is performed. Each cluster is described by a Gaussian-like membership function characterized by the center and deviation induced from the data assigned to the cluster. The instances are considered one by one sequentially, and clusters are created incrementally.
Let $J$ denote the number of currently existing clusters. Initially, no clusters exist and thus $J = 0$. When instance 1 comes in, the first cluster, $C_1$, is created, instance 1 is assigned to it, and $J$ becomes 1. Then, for instance $i$, which is $x_i$, $i \ge 2$, the z-distance between instance $i$ and every existing cluster $C_j$ is calculated by
$$Z(i,j) = \sum_{k=1}^{n} \left( \frac{x_{i,k} - c_{j,k}}{\sigma_{j,k}} \right)^2 \qquad (1)$$
for $j = 1, 2, \ldots, J$. Note that $c_j = [c_{j,1} \; \cdots \; c_{j,n}]^T$ and $\sigma_j = [\sigma_{j,1} \; \cdots \; \sigma_{j,n}]^T$ denote the mean and deviation, respectively, induced from the instances assigned to cluster $C_j$. The membership degree (MD) of instance $i$ belonging to cluster $C_j$ is defined as:
$$\mu_j(x_i) = \exp\{-Z(i,j)\} \qquad (2)$$
with the value lying in the range $(0, 1]$. Let $S_j$ denote the number of instances assigned to cluster $C_j$. There are two cases:
  • If all the MDs are less than $\rho^n$, where $\rho$ is a pre-specified similarity threshold in one dimension, i.e.,
    $$\exp\{-Z(i,j)\} < \rho^n \qquad (3)$$
    for all existing clusters $C_j$, $1 \le j \le J$, a new cluster, $C_{J+1}$, is created and instance $i$ is assigned to it by
    $$c_{J+1} = x_i, \quad \sigma_{J+1} = [\sigma_0 \; \cdots \; \sigma_0]^T, \quad S_{J+1} = 1, \quad J \leftarrow J + 1 \qquad (4)$$
    where $\sigma_0$ is a pre-specified constant.
  • Otherwise, instance $i$ is assigned to the cluster with the largest MD, say cluster $C_a$; i.e., the center $c_a = [c_{a,1} \; \cdots \; c_{a,n}]^T$, deviation $\sigma_a = [\sigma_{a,1} \; \cdots \; \sigma_{a,n}]^T$, and size $S_a$ are updated by
    $$\sigma_{a,k} \leftarrow \left\{ \frac{(S_a - 1)(\sigma_{a,k} - \sigma_0)^2 + S_a c_{a,k}^2 + x_{i,k}^2}{S_a} - \frac{S_a + 1}{S_a} \left[ \frac{S_a c_{a,k} + x_{i,k}}{S_a + 1} \right]^2 \right\}^{\frac{1}{2}} + \sigma_0, \quad 1 \le k \le n,$$
    $$c_a \leftarrow \frac{S_a c_a + x_i}{S_a + 1}, \qquad S_a \leftarrow S_a + 1. \qquad (5)$$
Note that J is not changed in the latter case.
When all the instances have been considered, SCC stops with $J$ clusters. The procedure, Algorithm 1, is summarized below.
Algorithm 1. SCC
   for each instance, instance i, 1 ≤ i ≤ N
     Compute Z(i, j), 1 ≤ j ≤ J;
     if Equation (3) holds
       Create a new cluster according to Equation (4);
     else
       Instance i is assigned to the cluster with largest MD, according to Equation (5);
     end if
   end for
  end SCC
Note that SCC takes the training set $X$ as input and outputs $J$ clusters $C_1, \ldots, C_J$. SCC runs fast, since it runs through the training instances only once. Unlike K-means, the similarity threshold, instead of the number of clusters, is specified in advance by the user.
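To make the above concrete, the following is a minimal sketch of the SCC pass in Python with NumPy. The function names, the dictionary-based cluster representation, and the default values of rho and sigma0 are our own illustrative choices; they are not taken from the authors' Matlab implementation.

import numpy as np

def z_distance(x, center, dev):
    """Equation (1): z-distance between instance x and a cluster."""
    return np.sum(((x - center) / dev) ** 2)

def new_cluster(x, sigma0):
    """Equation (4): a fresh cluster seeded by a single instance."""
    return {'c': x.copy(), 'sigma': np.full(len(x), sigma0), 'S': 1}

def add_to_cluster(cl, x, sigma0):
    """Equation (5): update center, deviation, and size after adding x."""
    S, c, sg = cl['S'], cl['c'], cl['sigma']
    cl['sigma'] = np.sqrt(
        ((S - 1) * (sg - sigma0) ** 2 + S * c ** 2 + x ** 2) / S
        - (S + 1) / S * ((S * c + x) / (S + 1)) ** 2) + sigma0
    cl['c'] = (S * c + x) / (S + 1)
    cl['S'] = S + 1

def scc(X, rho=0.5, sigma0=0.2):
    """One pass over the rows of X; returns the clusters and the assignment list."""
    n = X.shape[1]
    clusters, assign = [], []
    for x in X:
        mds = [np.exp(-z_distance(x, cl['c'], cl['sigma'])) for cl in clusters]
        if not mds or max(mds) < rho ** n:   # Equation (3): no cluster is similar enough
            clusters.append(new_cluster(x, sigma0))
            assign.append(len(clusters) - 1)
        else:                                # assign to the cluster with the largest MD
            j = int(np.argmax(mds))
            add_to_cluster(clusters[j], x, sigma0)
            assign.append(j)
    return clusters, assign

For instance, it can be run on the 12 two-dimensional instances of Appendix A with rho = 0.55 and sigma0 = 0.2.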

3. Proposed Methods

The clusters produced by SCC may depend on the feeding order of the instances and the accuracy of the result can be low. Furthermore, all dimensions are equally weighted in the clustering process, which may not be appropriate in certain applications, e.g., time-series clustering.

3.1. Iterative SCC (SCC-I)

For convenience, the proposed approach is abbreviated as SCC-I, standing for the iterative version of SCC. SCC-I consists of multiple iterations. An instance is allowed to be re-assigned to another cluster. The clustering work stops when all the assignments are stable, i.e., no assignment will be changed. A training cycle on the instances is performed in each iteration.
In the first iteration, SCC is applied. Consider the $r$th iteration, $r \ge 2$. For any instance $x_i$, $1 \le i \le N$, we first remove it from the cluster, say $C_t$, to which $x_i$ is currently assigned. Three cases may arise:
  • If no instance remains assigned to $C_t$, $C_t$ is deleted and the remaining clusters are re-named $C_1$, ..., $C_{J-1}$. Then $J$ is decreased by 1, i.e., $J \leftarrow J - 1$.
  • If only one instance remains assigned to $C_t$, then set $c_t$ to be this instance, set $\sigma_t = [\sigma_0 \; \cdots \; \sigma_0]^T$, and set $S_t = 1$.
  • Otherwise, the characteristics of $C_t$ are updated, to reflect the removal of $x_i$, by
    $$\sigma_{t,k} \leftarrow \left\{ \frac{(S_t - 1)(\sigma_{t,k} - \sigma_0)^2 + S_t c_{t,k}^2 - x_{i,k}^2}{S_t - 2} - \frac{S_t - 1}{S_t - 2} \left[ \frac{S_t c_{t,k} - x_{i,k}}{S_t - 1} \right]^2 \right\}^{\frac{1}{2}} + \sigma_0, \quad 1 \le k \le n,$$
    $$c_t \leftarrow \frac{S_t c_t - x_i}{S_t - 1}, \qquad S_t \leftarrow S_t - 1. \qquad (6)$$
Then, we calculate the z-distance and MD between $x_i$ and each existing cluster by Equations (1) and (2). A new cluster may be created, with $x_i$ assigned to it following Equation (4), or $x_i$ is assigned to the cluster with the largest MD following Equation (5).
The current $r$th iteration ends when all the instances have been run through. If any cluster assignment has been changed in this iteration, the next iteration, i.e., the $(r+1)$th iteration, proceeds. Otherwise, the assignments are stable and SCC-I stops with $J$ clusters. The SCC-I procedure, Algorithm 2, is summarized below.
Algorithm 2. SCC-I
 Perform SCC on X;
repeat
  for each instance, instance i, 1 ≤ iN
   Remove instance i from its cluster and update the existing clusters;
   Compute Z(i, j), 1 ≤ jJ;
   Create a new cluster or assign instance i to the cluster with largest MD;
  end for
until assignments are stable;
end SCC-I
Note that SCC-I takes the training set $X$ as input and outputs $J$ clusters $C_1, \ldots, C_J$. Since re-assignments are allowed, SCC-I can produce clusters that are less likely to be affected by the feeding order of the instances. As a result, more stable clusters can be obtained by SCC-I. By "more stable" we mean that both the number of clusters and the clusters produced are less likely to be affected by the feeding order of the instances. However, SCC runs faster than SCC-I, since SCC-I performs more than one iteration. To illustrate the advantage of SCC-I over SCC, a simple example is given in Appendix A.
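The sketch below extends the one given in Section 2 to SCC-I; it reuses the z_distance, new_cluster, add_to_cluster, and scc helpers from that sketch. The removal downdate follows the three cases above; the max_iter bound and the assignment bookkeeping are our own simplifications and are not part of the original algorithm statement.

import numpy as np

def remove_from_cluster(cl, x, sigma0):
    """Downdate a cluster's center, deviation, and size after removing instance x."""
    S, c, sg = cl['S'], cl['c'], cl['sigma']
    if S == 2:                              # only one instance remains in the cluster
        cl['c'] = S * c - x                 # the remaining instance itself
        cl['sigma'] = np.full(len(c), sigma0)
        cl['S'] = 1
    else:                                   # Equation (6)
        var = (((S - 1) * (sg - sigma0) ** 2 + S * c ** 2 - x ** 2) / (S - 2)
               - (S - 1) / (S - 2) * ((S * c - x) / (S - 1)) ** 2)
        cl['sigma'] = np.sqrt(np.maximum(var, 0.0)) + sigma0   # guard tiny negatives
        cl['c'] = (S * c - x) / (S - 1)
        cl['S'] = S - 1

def scc_i(X, rho=0.5, sigma0=0.2, max_iter=50):
    """Repeat SCC-style re-assignment cycles until no assignment changes."""
    n = X.shape[1]
    clusters, assign = scc(X, rho, sigma0)  # first iteration: plain SCC
    for _ in range(max_iter):
        changed = False
        for i, x in enumerate(X):
            j_old = assign[i]
            if clusters[j_old]['S'] == 1:   # case 1: the cluster becomes empty
                del clusters[j_old]
                assign = [a - 1 if a > j_old else a for a in assign]
                changed = True
            else:                           # cases 2 and 3: downdate the cluster
                remove_from_cluster(clusters[j_old], x, sigma0)
            mds = [np.exp(-z_distance(x, cl['c'], cl['sigma'])) for cl in clusters]
            if not mds or max(mds) < rho ** n:
                clusters.append(new_cluster(x, sigma0))
                j_new = len(clusters) - 1
            else:
                j_new = int(np.argmax(mds))
                add_to_cluster(clusters[j_new], x, sigma0)
            changed = changed or (j_new != assign[i])
            assign[i] = j_new
        if not changed:                     # assignments are stable
            break
    return clusters, assign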

3.2. Weighted SCC-I (SCC-IW)

In SCC and SCC-I, each dimension is equally weighted in the calculation of z-distances, as shown in Equation (1). For some applications, e.g., time-series clustering, where certain relevance exists between different dimensions, allowing different dimensions to be weighted differently could be a useful idea [56]. This motivates the development of the weighted SCC-I, abbreviated as SCC-IW.
Let the weighted z-distance between instance $i$ and cluster $C_j$, $1 \le i \le N$, $1 \le j \le J$, be defined as
$$Z_w(i,j) = \sum_{k=1}^{n} w_{j,k} \left( \frac{x_{i,k} - c_{j,k}}{\sigma_{j,k}} \right)^2 \qquad (7)$$
where $w_j = [w_{j,1} \; \cdots \; w_{j,n}]^T$ is the weight vector associated with $C_j$, $w_{j,k} \ge 0$ for $1 \le k \le n$, and $w_{j,1} + \cdots + w_{j,n} = 1$. Accordingly, the MD of instance $i$ belonging to $C_j$ is defined as:
$$\mu_j(x_i) = \exp\{-Z_w(i,j)\}. \qquad (8)$$
Clearly, we have $0 < \mu_j(x_i) \le 1$. If
$$\exp\{-Z_w(i,j)\} < \rho \qquad (9)$$
for all existing clusters $C_j$, $1 \le j \le J$, a new cluster is created; otherwise, instance $i$ is assigned to the cluster with the largest MD, as described before.
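A minimal sketch of the weighted z-distance and the creation test of Equation (9), assuming that the cluster dictionaries of the earlier sketches are extended with a weight vector stored under the key 'w'; the function names are ours.

import numpy as np

def weighted_z_distance(x, center, dev, w):
    """Equation (7): weighted z-distance; w is the cluster's weight vector (summing to 1)."""
    return np.sum(w * ((x - center) / dev) ** 2)

def too_far_from_all(x, clusters, rho):
    """Equation (9): a new cluster is created only if every weighted MD falls below rho."""
    return all(np.exp(-weighted_z_distance(x, cl['c'], cl['sigma'], cl['w'])) < rho
               for cl in clusters)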
Remark 1.
In Equation (3), the test is $\exp\{-Z(i,j)\} < \rho^n$, where $\rho$ is the pre-specified similarity threshold. Note that for $\exp\{-Z(i,j)\}$, we have
$$\exp\{-Z(i,j)\} = \exp\left\{-\sum_{k=1}^{n} \left( \frac{x_{i,k} - c_{j,k}}{\sigma_{j,k}} \right)^2\right\} = \exp\left\{-\left( \frac{x_{i,1} - c_{j,1}}{\sigma_{j,1}} \right)^2\right\} \cdots \exp\left\{-\left( \frac{x_{i,n} - c_{j,n}}{\sigma_{j,n}} \right)^2\right\} < \rho \cdots \rho = \rho^n.$$
For $\exp\{-Z_w(i,j)\}$, we have
$$\exp\{-Z_w(i,j)\} = \exp\left\{-\sum_{k=1}^{n} w_{j,k} \left( \frac{x_{i,k} - c_{j,k}}{\sigma_{j,k}} \right)^2\right\} = \left\{\exp\left\{-\left( \frac{x_{i,1} - c_{j,1}}{\sigma_{j,1}} \right)^2\right\}\right\}^{w_{j,1}} \cdots \left\{\exp\left\{-\left( \frac{x_{i,n} - c_{j,n}}{\sigma_{j,n}} \right)^2\right\}\right\}^{w_{j,n}} < \rho^{w_{j,1}} \cdots \rho^{w_{j,n}} = \rho^{w_{j,1} + \cdots + w_{j,n}} = \rho.$$
This justifies the test in Equation (9).
Now the remaining problem is how the weights should be determined to optimize the clustering under consideration. Here, we consider time-series clustering as an exemplar application [56]. Firstly, it is required that the instances assigned to each cluster be as close together as possible. In other words, we want to maximize:
$$\sum_{j=1}^{J} \sum_{i=1}^{N} u_{i,j} \, \mu_j(x_i) \qquad (10)$$
where $u_{i,j} = 1$ if instance $i$ is assigned to cluster $C_j$ and $u_{i,j} = 0$ otherwise, for $1 \le i \le N$ and $1 \le j \le J$, and $\mu_j(x_i)$ is the MD defined in Equation (8). Note that $\mu_j(x_i)$ is an exponential function which is non-linear in $w_{j,1}, \ldots, w_{j,n}$, so maximizing Equation (10) is a hard non-linear optimization problem. However, maximizing $\mu_j(x_i)$ is identical to minimizing $Z_w(i,j)$. Therefore, instead of maximizing Equation (10), we minimize:
$$\sum_{j=1}^{J} \sum_{i=1}^{N} u_{i,j} \, Z_w(i,j). \qquad (11)$$
Since $Z_w(i,j)$ is linear in $w_{j,1}, \ldots, w_{j,n}$, minimizing Equation (11) is a linear optimization problem, which is much easier. Secondly, since neighboring dimensions are next to each other on the time line, the weights of neighboring dimensions should be close to each other. Therefore, we also want to minimize:
$$\sum_{j=1}^{J} \sum_{k=1}^{n-1} \left( w_{j,k} - w_{j,k+1} \right)^2. \qquad (12)$$
Combining Equations (11) and (12), together with the constraints on the weights, we would like the weights to minimize
$$\sum_{j=1}^{J} \sum_{i=1}^{N} u_{i,j} \, Z_w(i,j) + \alpha \sum_{j=1}^{J} \sum_{k=1}^{n-1} \left( w_{j,k} - w_{j,k+1} \right)^2 \qquad (13)$$
$$\text{subject to } \sum_{k=1}^{n} w_{j,k} = 1, \quad w_{j,k} \ge 0, \; k = 1, \ldots, n, \; 1 \le j \le J,$$
which, by Equation (7), is equivalent to minimizing
$$\sum_{j=1}^{J} \sum_{i=1}^{N} u_{i,j} \sum_{k=1}^{n} w_{j,k} \left( \frac{x_{i,k} - c_{j,k}}{\sigma_{j,k}} \right)^2 + \alpha \sum_{j=1}^{J} \sum_{k=1}^{n-1} \left( w_{j,k} - w_{j,k+1} \right)^2 \qquad (14)$$
$$\text{subject to } \sum_{k=1}^{n} w_{j,k} = 1, \quad w_{j,k} \ge 0, \; k = 1, \ldots, n, \; 1 \le j \le J,$$
where α is a positive real constant. Through quadratic programming, optimal values for the weights can be derived from Equation (14).
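Since Equation (14) decomposes over the clusters, each weight vector $w_j$ can be obtained from a small quadratic program of its own. The sketch below uses scipy.optimize.minimize with the SLSQP solver as one possible off-the-shelf choice; the function name and its arguments are ours, not the authors' implementation.

import numpy as np
from scipy.optimize import minimize

def optimal_weights(members, center, dev, alpha):
    """Solve the per-cluster part of Equation (14); members has shape (S_j, n)."""
    n = members.shape[1]
    # Linear coefficients: sum over the cluster's instances of ((x-c)/sigma)^2, per dimension.
    d = np.sum(((members - center) / dev) ** 2, axis=0)

    def objective(w):
        smooth = np.sum(np.diff(w) ** 2)    # the (w_k - w_{k+1})^2 penalty terms
        return float(np.dot(d, w) + alpha * smooth)

    w0 = np.full(n, 1.0 / n)                # start from equal weights
    res = minimize(objective, w0, method='SLSQP',
                   bounds=[(0.0, 1.0)] * n,
                   constraints=[{'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0}])
    return res.x

A dedicated quadratic programming routine, such as Matlab's quadprog, would serve the same purpose; the quadratic structure of Equation (14) is the same either way.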
Now, we are ready to present the SCC-IW algorithm. We adopt $Z_w(i,j)$, instead of $Z(i,j)$, in SCC-IW. Also, whenever a new cluster is created, its weights are each initialized to $1/n$. At the end of the current iteration, we minimize Equation (14) to find the optimal weights, which are used in the next iteration. The SCC-IW procedure, Algorithm 3, is summarized below.
Algorithm 3. SCC-IW
 Perform SCC on X with weighted z-distance, and initialize the weights for each newly
  created cluster;
repeat
for each instance, instance i, 1 ≤ i ≤ N
   Remove instance i from its cluster and update the existing clusters;
   Compute Z_w(i, j), 1 ≤ j ≤ J;
   Create a new cluster or assign instance i to the cluster with largest MD, and initialize the
     weights for each newly created cluster;
   end for
   Derive optimal weights by solving Equation (14) through quadratic programming;
  until assignments are stable;
end SCC-IW
Note that SCC-IW takes the training set $X$ as input and outputs $J$ clusters $C_1, \ldots, C_J$.
It is not surprising that SCC-IW can work well. In each iteration, optimal weights are derived. A dimension which is more useful for clustering is more important and is, therefore, given a larger weight. To illustrate how SCC-IW works, a simple example is given in Appendix B.

4. Experimental Results

In this section, we demonstrate empirically the superiority of our proposed methods. The proposed methods and others are applied to do clustering on benchmark datasets. Three external measures of evaluation, Fscore, Rand Index (RI), and Normalized Mutual Information (NMI) [59], and another three internal measures, Dunn index (DI), Davies–Bouldin index (DBI), and Silhouette index (SI) [60], are adopted.
$$\mathrm{Fscore} = \sum_{k=1}^{K} \frac{N_k}{N} \max_{1 \le j \le J} \left\{ \frac{2 \cdot \frac{N_{kj}}{N_k} \cdot \frac{N_{kj}}{N_j}}{\frac{N_{kj}}{N_k} + \frac{N_{kj}}{N_j}} \right\}$$
where $K$ is the number of classes, $J$ is the number of clusters, $N$ is the size of the entire data set, $N_{kj}$ is the number of data instances belonging to class $k$ in cluster $j$, $N_j$ is the size of cluster $j$, and $N_k$ is the size of class $k$. A higher Fscore is better.
$$\mathrm{RI} = \frac{a + b}{N(N-1)/2}$$
where $a$ is the number of pairs of data instances that have different class labels and are assigned to different clusters, $b$ is the number of pairs of data instances that have the same class label and are assigned to the same cluster, and $N$ is the size of the entire data set. A higher RI is better.
$$\mathrm{NMI} = \frac{\sum_{k=1}^{K} \sum_{j=1}^{J} N_{kj} \log\left( \frac{N \cdot N_{kj}}{N_k \cdot N_j} \right)}{\sqrt{\left( \sum_{k=1}^{K} N_k \log \frac{N_k}{N} \right)\left( \sum_{j=1}^{J} N_j \log \frac{N_j}{N} \right)}}$$
where $K$ is the number of classes, $J$ is the number of clusters, $N$ is the size of the entire data set, $N_{kj}$ is the number of data instances belonging to class $k$ in cluster $j$, $N_j$ is the size of cluster $j$, and $N_k$ is the size of class $k$. A higher NMI is better.
$$\mathrm{DI} = \min_{1 \le i \le J} \left\{ \min_{j \ne i} \left\{ \frac{d_{\min}(C_i, C_j)}{\max_{1 \le l \le J} \mathrm{diam}(C_l)} \right\} \right\}$$
where $d_{\min}(C_i, C_j)$ is the minimum distance between clusters $C_i$ and $C_j$, and $\mathrm{diam}(C_l)$ is the largest distance between the instances contained in cluster $C_l$. A higher DI is better.
$$\mathrm{DBI} = \frac{1}{J} \sum_{i=1}^{J} \max_{j \ne i} \left\{ \frac{\mathrm{avg}(C_i) + \mathrm{avg}(C_j)}{d_{\mathrm{cen}}(C_i, C_j)} \right\}$$
where $\mathrm{avg}(C)$ is the average distance between the instances contained in cluster $C$ and the center of $C$, and $d_{\mathrm{cen}}(C_i, C_j)$ is the distance between the centers of clusters $C_i$ and $C_j$. A lower DBI is better.
$$\mathrm{SI} = \frac{1}{N} \sum_{i=1}^{N} \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$
where $a(i)$ is the average of the distances between instance $i$ and all other instances within the same cluster, $b(i)$ is the lowest average distance of instance $i$ to all instances in any other cluster of which instance $i$ is not a member, and $N$ is the size of the entire data set. A higher SI is better.
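As a concrete illustration, the two external measures Fscore and RI can be computed directly from the class and cluster labels exactly as defined above; the following Python functions are a sketch (the argument names labels and preds are ours).

import numpy as np
from itertools import combinations

def rand_index(labels, preds):
    """RI: fraction of instance pairs on which the class labels and the clustering agree."""
    N = len(labels)
    a = b = 0
    for i, j in combinations(range(N), 2):
        if labels[i] != labels[j] and preds[i] != preds[j]:
            a += 1
        elif labels[i] == labels[j] and preds[i] == preds[j]:
            b += 1
    return (a + b) / (N * (N - 1) / 2)

def fscore(labels, preds):
    """Fscore: class-size-weighted best F1 over clusters, as in the formula above."""
    labels, preds = np.asarray(labels), np.asarray(preds)
    N = len(labels)
    total = 0.0
    for k in np.unique(labels):
        in_class = labels == k
        best = 0.0
        for j in np.unique(preds):
            in_cluster = preds == j
            nkj = np.sum(in_class & in_cluster)
            if nkj == 0:
                continue
            recall = nkj / np.sum(in_class)       # N_kj / N_k
            precision = nkj / np.sum(in_cluster)  # N_kj / N_j
            best = max(best, 2 * recall * precision / (recall + precision))
        total += np.sum(in_class) / N * best
    return total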

4.1. Non Time-Series Datasets

To illustrate the effectiveness of SCC-I, fourteen benchmark non-time-series datasets are selected from the UCI repository [61] for the experiments. The characteristics of these datasets are shown in Table 1. For example, there are 569 instances in the Breast dataset. Each instance has 30 features or dimensions, and belongs to one of 2 classes. For each dataset, an instance belongs to one and only one class. We compare SCC-I with K-means [62], DSKmeans [59], Fuzzy C-means (FCM) [63], Gaussian mixture model (Gmm) [64], DBSCAN [65], and SCC [20]. The codes for K-means, DBSCAN, and FCM are adopted from Matlab [66], and the code for Gmm is adopted from [67]. We wrote the codes for DSKmeans, SCC, and SCC-I in Matlab.
Table 2 shows comparisons of Fscore, RI, and NMI among the different methods for each dataset. To have fair comparisons among the methods, the number of clusters is tuned to be identical to the number of classes for each dataset. For K-means, the number of clusters, k, is set to be equal to the number of classes. For DSKmeans, the parameter γ is set to a value between 0.01 and 5, and η is between 0.01 and 0.3. For SCC and SCC-I, $\sigma_0$ and $\rho$ are set to values between 0.1 and 0.95. Also, each method performed 25 runs on a dataset and the averaged result is shown. For K-means and DSKmeans, each run started with a different set of initial seeds. For SCC and SCC-I, each run was given a different feeding order of the training instances. For FCM, the maximum number of iterations is set to 50. For DBSCAN, ε is set to a value between 0.1 and 3, while the minimum number of neighbors, minpts, required for a core point, is set to be between 1 and 10. In addition to the values of the measures, the performance ranking is also indicated at the right side of '/' for each dataset in Table 2. For example, consider the Breast dataset. FCM has the best Fscore value, 0.9274, so it ranks first, indicated by 1 at the right side of '/'; K-means has the second best value, 0.9270, so it ranks second; and so on. From this table, we can see that (1) SCC-I outperforms SCC significantly, and (2) SCC-I is no less effective than the other methods. Table 3 shows the averaged ranking over all 14 datasets for each method. As can be seen, SCC-I is the best in Fscore and NMI, indicated by boldfaced numbers, and is the second best in RI.
Although the overall ranking of SCC-I is better than that of the others, looking at the individual results for each dataset there are some variations. For example, K-means outperforms SCC-I for Heart, Ionosphere, and Seeds. Compared with K-means, SCC-I has two advantages: (1) SCC-I considers deviation in the computation of distance; (2) SCC-I allows an ellipsoidal shape of clusters. Note that SCC-I is less affected by the feeding order of instances, and thus can give a more stable and accurate clustering than SCC. However, given the number of clusters, the clusters obtained by K-means are not affected by the feeding order of instances. For datasets with an ellipsoidal shape of clusters, SCC-I is more likely to perform better. By contrast, for datasets with spherical clusters, SCC-I may be inferior to K-means. Table 2 also shows the execution time, in seconds, of each method on each dataset. The computer used for running the codes is equipped with an Intel(R) Core(TM) i7-4770 CPU at 3.40 GHz, 16 GB RAM, and Matlab R2011b. The times shown in the table only provide an idea of how efficiently these methods can run. Note that SCC-I runs slower than SCC. SCC only performs one training cycle on the instances, while SCC-I requires two or more training cycles. SCC-I takes more training time than the other baselines due to several factors: (1) the codes of these baselines, e.g., K-means and FCM, were adopted from established websites, while the code for SCC-I was written by graduate students; (2) SCC-I has to compute z-distances and Gaussian values, which is more computationally expensive; (3) in order to do re-assignment, the operation of removing instances from clusters is done during the clustering process. Comparisons of DBI, DI, and SI among the different methods for each dataset are shown in Table 4. As can be seen from the table, SCC-I is better than the other methods. SCC-I gets the lowest DBI for 9 out of 14 datasets, the highest DI for 9 out of 14 datasets, and the highest SI for 7 out of 14 datasets.
Now we use the paired t-test [68] to test whether the differences between SCC-I and the other methods are statistically significant. Table 5 shows the t-values based on the values of Fscore, NMI, and DBI, respectively, at the 90% confidence level. Note that we have 14 datasets involved, so the number of degrees of freedom is 13 and the corresponding threshold is 1.771. From Table 5, we can see that all the t-values are greater than 1.771, except for Gmm with NMI. Therefore, by statistical test we observe that SCC-I shows significantly better performance than the other methods. A multiple comparison test may be used since there are multiple algorithms involved. Analysis of variance (ANOVA) [69] provides such tests. We have tried ANOVA in two ways. Firstly, we used ANOVA to return a structure that can be used to determine which pairs of algorithms are significantly different; results similar to those shown in Table 5 were obtained. Secondly, we used ANOVA to test the null hypothesis that all algorithms perform equally well against the alternative hypothesis that at least one algorithm is different from the others. A p-value smaller than 0.05 indicates that at least one of the algorithms is significantly different from the others; however, it is not certain which ones are significantly different from each other.
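For reference, the paired t-test used here is available off the shelf; the snippet below is a sketch using scipy.stats.ttest_rel on hypothetical per-dataset Fscore values (the numbers are invented for illustration and are not taken from Table 2 or Table 5).

from scipy import stats

# Hypothetical per-dataset Fscore values of SCC-I and of one baseline method.
scc_i_scores = [0.92, 0.88, 0.75, 0.81, 0.90]
baseline_scores = [0.90, 0.84, 0.74, 0.78, 0.89]

t, p = stats.ttest_rel(scc_i_scores, baseline_scores)
# With 14 datasets (13 degrees of freedom), the paper compares t against the
# threshold 1.771 at the 90% confidence level.
print(t, p)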

4.2. Time-Series Datasets

Next, we show the effectiveness of SCC-IW in clustering time series data. Ten benchmark time series datasets are taken from the UCR repository [70] for the experiments. The characteristics of these datasets are shown in Table 6. In addition to the previous methods, we also compare with TSKmeans [56] here. We wrote the code for TSKmeans in Matlab.
Table 7 and Table 8 show the Fscore, RI, and NMI obtained by the different methods for each dataset. Again, each method performed 25 runs on a dataset and the averaged result is shown. From these two tables, we can see that (1) SCC-IW outperforms both SCC and SCC-I significantly, and (2) SCC-IW is no less effective than the other methods. Table 8 shows the averaged ranking over all 10 time series datasets for each method. As can be seen, SCC-IW is the best in Fscore, RI, and NMI. However, SCC-IW runs slower than SCC and SCC-I, since weights are involved in SCC-IW and they have to be optimally updated in each training cycle. Comparisons of DBI, DI, and SI among the different methods for each dataset are shown in Table 9. As can be seen from the table, SCC-I is better than the other methods. SCC-I gets the lowest DBI for 8 out of 10 datasets, the highest DI for 5 out of 10 datasets, and the highest SI for 4 out of 10 datasets. The main reason that SCC-IW outperforms all the other competing classifiers in Table 10 is the consideration of both weights and deviations in SCC-IW. Gmm, SCC, and SCC-I do not use weights. K-means, DSKmeans, and FCM use neither deviations nor weights in the clustering process. TSKmeans considers weights, but deviations are not involved.
Now we use the paired t-test [68] to test whether the differences between SCC-IW and the other methods are statistically significant. Table 11 shows the t-values based on the values of Fscore, RI, and DBI, respectively, at the 90% confidence level. Note that we have 10 datasets involved, so the number of degrees of freedom is 9 and the corresponding threshold is 1.833. From the table, we can see that all the t-values are greater than 1.833, except for DSKmeans and FCM with RI. Therefore, by statistical test we observe that SCC-IW shows significantly better performance than the other methods.

4.3. Comparisons with Other Methods

Recently, evolutionary algorithms, e.g., simulated annealing and differential evolution, have been proposed to perform clustering [71]. The evolutionary algorithms can perform clustering using either a fixed or variable number of clusters and find the clustering that is optimal with respect to a certain validity index. Either a population of solutions or only one solution can be used. The single-solution-based evolutionary algorithms have smaller evaluation count but their solution quality is usually not as good as those that are population-based.
Siddiqi and Sait [72] propose a heuristic for data-clustering problems, referred to here as HDC. It comprises two parts, a greedy algorithm and a single-solution-based heuristic. The first part selects the data points that can act as the centroids of well-separated clusters. The second part performs clustering with the objective of optimizing a cluster validity index. The proposed heuristic consists of five main components: (1) genes; (2) fitness of genes; (3) selection; (4) mutation operation; and (5) diversification. The objective functions used in the proposed heuristic are the Calinski–Harabasz index and the Dunn index. Zhang et al. [73] propose a clustering algorithm, called ICFSKM, for clustering large dynamic data in industrial IoT. Two cluster operations, cluster creating and cluster merging, are defined to integrate the current pattern into the previous one for the final clustering result. Also, k-medoids is used for modifying the clustering centers according to the newly arriving objects. Table 12 shows a DI comparison between SCC-I and HDC, and Table 13 shows an NMI comparison between SCC-I and ICFSKM, for some datasets. In these tables, the datasets Balance scale, Banknote authentication, Landsat satellite, Pen-based digits, Waveform-5000, and Wine are also selected from the UCI repository [61]. The results for HDC and ICFSKM are copied directly from [72,73]. We can see that SCC-I is comparable to HDC and ICFSKM. Note that HDC and ICFSKM are goal-oriented. For example, the objective functions used in HDC include the Dunn index. Therefore, DI values can be optimized by HDC intentionally. Furthermore, the objective function is usually computationally intensive and the evolutionary algorithms are considered to be slow.

4.4. Setting of α

In Equation (14), the difference between the weights of neighboring dimensions is controlled by $\alpha$. For $\alpha = 0$, no constraint is imposed on the weight differences. As $\alpha$ increases, neighboring dimensions are forced to be more and more equally weighted and SCC-IW behaves more and more like SCC-I. Therefore, the setting of $\alpha$ can affect the performance of SCC-IW. In [56], a constant $g$ is defined as
$$g = \sum_{i=1}^{N} \sum_{k=1}^{n} (x_{i,k} - m_k)^2$$
where
$$m_k = \frac{1}{N} \sum_{i=1}^{N} x_{i,k}$$
and it is shown empirically that the performance of TSKmeans varies with the value of $\alpha/g$ [56]. Figure 1 shows how SCC-IW performs on four datasets with different values of $\alpha/g$. Note that the horizontal axis is scaled in $\log(\alpha/g)$. When $\alpha/g$ gets larger, SCC-IW performs increasingly like SCC-I.
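A short sketch of how the scaling constant g can be computed from the data and used to set α; the ratio 0.1 below is an arbitrary illustrative choice, not a value recommended in the paper.

import numpy as np

def scaling_constant(X):
    """g: total squared deviation of the data from the per-dimension means m_k."""
    m = X.mean(axis=0)                  # m_k for each dimension k
    return float(np.sum((X - m) ** 2))

# Example: choose alpha as a fixed fraction of g, i.e., alpha / g = 0.1.
# X = np.loadtxt('some_time_series.csv', delimiter=',')   # hypothetical data file
# alpha = 0.1 * scaling_constant(X)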

5. Conclusions

SCC is an exclusive clustering method, performing only one training cycle on the training instances. Clusters are created incrementally. However, the clusters produced may depend on the feeding order in which the instances are considered, and assignment errors are more likely to occur. Also, all dimensions are equally weighted in the clustering process, which may not be suitable in certain applications, e.g., time-series clustering. We have presented two improvements, SCC-I and SCC-IW. SCC-I performs two or more training cycles iteratively, and allows instances to be re-assigned afterwards. In this way, the clusters produced are less likely to be affected by the feeding order of the instances. On the other hand, SCC-IW allows each dimension to be weighted differently in the clustering process. The values of the weights are adaptively learned from the data. Experiments have shown that SCC-IW performs effectively in clustering time-series data.
SCC-I and SCC-IW take more training time because (1) they have to compute z-distances and Gaussian values, which is computationally expensive, and (2) in order to do re-assignment, the operation of removing instances from clusters is done during the clustering process. We will investigate these issues to reduce the training time in the future. Spectral clustering [74] and multidimensional scaling [75] deal with the extraction of new features from the original ones. SCC-IW probably has some relationship with these methods, since SCC-IW is also, in a sense, able to extract new "axes or principal components" through the adaptation of the weights. It will be interesting to explore such a relationship in the future.

Author Contributions

Conceptualization, Z.-Y.W. and S.-J.L.; methodology, Z.-Y.W. and S.-J.L; software, Z.-Y.W., C.-Y.W., and Y.-T.L.; validation, Z.-Y.W., C.-Y.W. and Y.-T.L.; formal analysis, Z.-Y.W. and S.-J.L.; investigation, S.-J.L.; resources, S.-J.L.; data curation, Z.-Y.W., C.-Y.W., and Y.-T.L.; writing—original draft preparation, Z.-Y.W., C.-Y.W., and Y.-T.L.; writing—review and editing, S.-J.L.; visualization, C.-Y.W. and Y.-T.L.; supervision, S.-J.L.; project administration, Y.-T.L.; funding acquisition, S.-J.L.

Funding

This research was funded by the Ministry of Science and Technology under grants MOST-103-2221-E-110-047-MY2, MOST-104-2622-E-110-014-CC3, and MOST-106-2221-E-110-080. The APC was funded by National Sun Yat-Sen University.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

A simple example is given here to illustrate how SCC-I works. Suppose X has the following 12 training instances:
$x_1 = [0.30 \;\; 0.60]^T$; $x_2 = [0.70 \;\; 0.35]^T$; $x_3 = [0.50 \;\; 0.52]^T$;
$x_4 = [0.35 \;\; 0.38]^T$; $x_5 = [0.19 \;\; 0.89]^T$; $x_6 = [0.78 \;\; 0.20]^T$;
$x_7 = [0.62 \;\; 0.25]^T$; $x_8 = [0.24 \;\; 0.81]^T$; $x_9 = [0.29 \;\; 0.89]^T$;
$x_{10} = [0.40 \;\; 0.65]^T$; $x_{11} = [0.28 \;\; 0.48]^T$; $x_{12} = [0.24 \;\; 0.89]^T$.
Note that $N = 12$ and $n = 2$. These instances are shown in Figure A1a, marked as circles. Let $\rho = 0.55$ and $\sigma_0 = 0.2$. Below, we consider two feeding orders of the instances.
• The feeding order of x 1   , ...,   x 12   .
After performing SCC in the first iteration, there are six clusters: C 1   , C 2   , C 3   , C 4   , C 5   , and C 6   , as shown in Figure A1b, with
$c_1 = [0.350 \;\; 0.625]^T$, $\sigma_1 = [0.271 \;\; 0.235]^T$;
$c_2 = [0.660 \;\; 0.300]^T$, $\sigma_2 = [0.257 \;\; 0.271]^T$;
$c_3 = [0.500 \;\; 0.520]^T$, $\sigma_3 = [0.200 \;\; 0.200]^T$;
$c_4 = [0.315 \;\; 0.430]^T$, $\sigma_4 = [0.250 \;\; 0.271]^T$;
$c_5 = [0.240 \;\; 0.870]^T$, $\sigma_5 = [0.241 \;\; 0.240]^T$;
$c_6 = [0.780 \;\; 0.200]^T$, $\sigma_6 = [0.200 \;\; 0.200]^T$.
The clusters are numbered and wrapped in dashed contours, with their centers marked with crosses. Instances x 1 and x 10 are assigned to C 1   , x 2 and x 7 are assigned to C 2   , x 3 is assigned to C 3   , x 4 and x 11 are assigned to C 4   , x 5   , x 8   , x 9   , and x 12 are assigned to C 5   , and x 6 is assigned to C 6   . After the second iteration, we have 4 clusters: C 1   , C 2   , C 3   , and C 4   , as shown in Figure A1c, with:
$c_1 = [0.370 \;\; 0.563]^T$, $\sigma_1 = [0.301 \;\; 0.277]^T$;
$c_2 = [0.700 \;\; 0.267]^T$, $\sigma_2 = [0.280 \;\; 0.276]^T$;
$c_3 = [0.350 \;\; 0.380]^T$, $\sigma_3 = [0.200 \;\; 0.200]^T$;
$c_4 = [0.240 \;\; 0.870]^T$, $\sigma_4 = [0.241 \;\; 0.240]^T$.
Instances x 1   , x 3   , x 10   , and x 11 are assigned to C 1   , x 2   , x 6   , and x 7 are assigned to C 2   , x 4 is assigned to C 3   , and x 5   , x 8   , x 9   , and x 12 are assigned to C 4   . After the third iteration, we have 3 clusters: C 1   , C 2   , and C 3   , as shown in Figure A1d, with:
Figure A1. Clusters produced with the first feeding order.
$c_1 = [0.366 \;\; 0.526]^T$, $\sigma_1 = [0.288 \;\; 0.305]^T$;
$c_2 = [0.700 \;\; 0.267]^T$, $\sigma_2 = [0.280 \;\; 0.276]^T$;
$c_3 = [0.240 \;\; 0.870]^T$, $\sigma_3 = [0.241 \;\; 0.240]^T$.
Instances x 1   , x 3   , x 4   , x 10   , and x 11 are assigned to C 1   , x 2   , x 6   , and x 7 are assigned to C 2   , and x 5   , x 8   , x 9   , and x 12 are assigned to C 3   . In the fourth iteration, no assignment has been changed. Therefore, SCC-I stops with three clusters C 1   , C 2   , and C 3 as shown above, with 5, 3, and 4 instances assigned to them, respectively.
• The feeding order of x 9   , x 10   , x 4   , x 7   , x 5   , x 11   , x 2   , x 12   , x 3   , x 1   , x 6   , x 8   . After performing SCC in the first iteration, there are 5 clusters: C 1   , C 2   , C 3   , C 4   , and C 5   , as shown in Figure A2a, with:
Figure A2. Clusters produced with the second feeding order.
$c_1 = [0.240 \;\; 0.870]^T$, $\sigma_1 = [0.240 \;\; 0.240]^T$;
$c_2 = [0.350 \;\; 0.625]^T$, $\sigma_2 = [0.271 \;\; 0.235]^T$;
$c_3 = [0.315 \;\; 0.430]^T$, $\sigma_3 = [0.250 \;\; 0.271]^T$;
$c_4 = [0.700 \;\; 0.267]^T$, $\sigma_4 = [0.280 \;\; 0.276]^T$;
$c_5 = [0.500 \;\; 0.520]^T$, $\sigma_5 = [0.200 \;\; 0.200]^T$.
Instances x 1   , x 5   , x 8   , and x 12 are assigned to C 1   , x 2 and x 10 are assigned to C 2   , x 3 and x 6 are assigned to C 3   , x 4   , x 7   , and x 11 are assigned to C 4   , and x 9 is assigned to C 5   . Iterations 2 and 3 are performed subsequently. After the fourth iteration, we have 3 clusters: C 1   , C 2   , and C 3   , as shown in Figure A2b, with:
$c_1 = [0.240 \;\; 0.870]^T$, $\sigma_1 = [0.241 \;\; 0.240]^T$;
$c_2 = [0.366 \;\; 0.526]^T$, $\sigma_2 = [0.288 \;\; 0.305]^T$;
$c_3 = [0.700 \;\; 0.267]^T$, $\sigma_3 = [0.280 \;\; 0.276]^T$.
Instances x 1   , x 5   , x 8   , and x 12 are assigned to C 1   , x 2   , x 3   , x 6   , x 9   , and x 10 are assigned to C 2   , and x 4   , x 7   , and x 11 are assigned to C 3   . Then the cluster assignments are stable and so SCC-I stops with three clusters C 1   , C 2   , and C 3 as shown above, with 4, 5, and 3 instances assigned to them, respectively.
Note that SCC produces 6 clusters, Figure A1b, with the first feeding order and 5 clusters, Figure A2a, with the second feeding order, and the two sets of clusters are different. However, SCC-I produces 3 clusters, Figure A1d and Figure A2b, with both feeding orders and the two sets of clusters are essentially the same. Clearly, the clusters obtained by SCC-I are more stable and reasonable.

Appendix B

Another simple example is given here to illustrate how SCC-IW works. Suppose X has the following 15 training instances:
$x_1 = [0.450 \;\; 0.739 \;\; 0.044 \;\; 0.865 \;\; 0.641 \;\; 0.036]^T$;
$x_2 = [0.240 \;\; 0.985 \;\; 0.957 \;\; 0.808 \;\; 0.601 \;\; 0.053]^T$;
$x_4 = [0.125 \;\; 0.411 \;\; 0.769 \;\; 0.165 \;\; 0.769 \;\; 0.340]^T$;
$x_5 = [0.613 \;\; 0.842 \;\; 0.262 \;\; 0.807 \;\; 0.438 \;\; 0.953]^T$;
$x_6 = [0.520 \;\; 0.574 \;\; 0.394 \;\; 0.879 \;\; 0.656 \;\; 0.022]^T$;
$x_7 = [0.753 \;\; 0.812 \;\; 0.258 \;\; 0.897 \;\; 0.050 \;\; 0.482]^T$;
$x_8 = [0.114 \;\; 0.493 \;\; 0.783 \;\; 0.674 \;\; 0.588 \;\; 0.499]^T$;
$x_9 = [0.176 \;\; 0.491 \;\; 0.787 \;\; 0.760 \;\; 0.601 \;\; 0.326]^T$;
$x_{10} = [0.172 \;\; 0.620 \;\; 0.714 \;\; 0.802 \;\; 0.692 \;\; 0.089]^T$;
$x_{11} = [0.997 \;\; 0.813 \;\; 0.213 \;\; 0.884 \;\; 0.166 \;\; 0.613]^T$;
$x_{12} = [0.147 \;\; 0.498 \;\; 0.738 \;\; 0.889 \;\; 0.111 \;\; 0.100]^T$;
$x_{13} = [0.287 \;\; 0.052 \;\; 0.193 \;\; 0.821 \;\; 0.625 \;\; 0.090]^T$;
$x_{15} = [0.009 \;\; 0.831 \;\; 0.282 \;\; 0.842 \;\; 0.885 \;\; 0.561]^T$.
Note that $N = 15$ and $n = 6$. Each instance is a short time series comprising 6 consecutive time samplings. Let $\rho = 0.125$ and $\sigma_0 = 0.125$. Below, we consider the feeding order $x_1$, ..., $x_{15}$.
  • By SCC, 7 clusters, $C_1$, ..., $C_7$, are obtained, with sizes 3, 1, 5, 1, 1, 3, and 1, respectively. Instances $x_1$, $x_6$, and $x_{13}$ are assigned to $C_1$; $x_3$, $x_8$, $x_9$, $x_{10}$, and $x_{12}$ are assigned to $C_3$; and $x_7$, $x_{11}$, and $x_{14}$ are assigned to $C_6$. $C_2$, $C_4$, $C_5$, and $C_7$ are singletons, containing $x_2$, $x_4$, $x_5$, and $x_{15}$, respectively.
  • By SCC-I, convergence is achieved in the 3rd iteration with 4 clusters, $C_1$, ..., $C_4$, with sizes 5, 5, 4, and 1, respectively. Instances $x_1$, $x_2$, $x_6$, $x_{10}$, and $x_{13}$ are assigned to $C_1$; $x_3$, $x_4$, $x_8$, $x_9$, and $x_{12}$ are assigned to $C_2$; $x_5$, $x_7$, $x_{11}$, and $x_{14}$ are assigned to $C_3$; and $x_{15}$ is assigned to $C_4$.
  • By SCC-IW, convergence is also achieved in the 3rd iteration, but with only 3 clusters, $C_1$, $C_2$, $C_3$, with sizes 5, 5, and 5, respectively. Instances $x_1$, $x_2$, $x_6$, $x_{10}$, and $x_{13}$ are assigned to $C_1$; $x_3$, $x_4$, $x_8$, $x_9$, and $x_{12}$ are assigned to $C_2$; and $x_5$, $x_7$, $x_{11}$, $x_{14}$, and $x_{15}$ are assigned to $C_3$. The instances assigned to the clusters are shown in Figure A3a–c. The weights associated with the clusters are:
$w_1 = [0.134 \;\; 0.232 \;\; 0.259 \;\; 0.211 \;\; 0.095 \;\; 0.069]^T$;
$w_2 = [0.000 \;\; 0.000 \;\; 0.055 \;\; 0.221 \;\; 0.333 \;\; 0.391]^T$;
$w_3 = [0.385 \;\; 0.328 \;\; 0.225 \;\; 0.062 \;\; 0.000 \;\; 0.000]^T$,
which are depicted in Figure A3d. Intuitively, the clustering done by SCC-IW seems to be the most suitable. All the instances assigned to cluster $C_1$ have similar time samplings at indices 2, 3, and 4, which is manifested by large weights at those indices. Similarly, all the instances assigned to cluster $C_2$ have similar time samplings at indices 4, 5, and 6, and all the instances assigned to cluster $C_3$ have similar time samplings at indices 1, 2, and 3; in both cases this is manifested by large weights at the corresponding indices.
Figure A3. Clusters produced by SCC-IW.

References

  1. Olson, D.L.; Shi, Y. Introduction to Business Data Mining; McGraw-Hill/Irwin Englewood Cliffs: Boston, MA, USA, 2007. [Google Scholar]
  2. Theodoridis, S.; Koutroumbas, K. Pattern Recognition; Elsevier: Amsterdam, The Netherlands, 2008. [Google Scholar]
  3. Li, W.; Jaroszewski, L.; Godzik, A. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 2001, 17, 282–283. [Google Scholar] [CrossRef]
  4. Lee, S.-J.; Ouyang, C.-S.; Du, S.-H. A neuro-fuzzy approach for segmentation of human objects in image sequences. IEEE Trans. Cybern. 2003, 33, 420–437. [Google Scholar]
  5. Filipovych, R.; Resnick, S.M.; Davatzikos, C. Semi-supervised cluster analysis of imaging data. NeuroImage 2011, 54, 2185–2197. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  6. Jiang, J.Y.; Liou, R.J.; Lee, S.J. A fuzzy self-constructing feature clustering algorithm for text classification. IEEE Trans. Knowl. Data Eng. 2011, 23, 335–349. [Google Scholar] [CrossRef]
  7. Xu, R.-F.; Lee, S.-J. Dimensionality reduction by feature clustering for regression problems. Inf. Sci. 2015, 299, 42–57. [Google Scholar] [CrossRef]
  8. Wang, M.; Yu, Y.; Lin, W. Adaptive neural-based fuzzy inference system approach applied to steering control. In Proceedings of the 6th International Symposium on Neural Networks: Advances in Neural Networks—Part II, Wuhan, China, 26–29 May 2009; pp. 1189–1196. [Google Scholar]
  9. Xu, Y.; Olman, V.; Xu, D. Clustering gene expression data using a graph-theoretic approach: An application of minimum spanning trees. Bioinformatics 2002, 18, 536–545. [Google Scholar] [CrossRef] [Green Version]
  10. Wei, C.-C.; Chen, T.-T.; Lee, S.-J. K-NN based neuro-fuzzy system for time series prediction. In Proceedings of the 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), Honolulu, HI, USA, 1–3 July 2013; pp. 569–574. [Google Scholar]
  11. Can, F.; Ozkarahan, E.A. Concepts and effectiveness of the cover-coefficient based clustering methodology for text databases. ACM Trans. Database Syst. 1990, 15, 483–517. [Google Scholar] [CrossRef]
  12. Feldman, R.; Sanger, J. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data; Cambridge University Press: Cambridge, UK, 2007. [Google Scholar]
  13. Lee, S.-J.; Jiang, J.-Y. Multilabel text categorization based on fuzzy relevance clustering. IEEE Trans. Fuzzy Syst. 2014, 22, 1457–1471. [Google Scholar] [CrossRef]
  14. Liao, C.-L.; Lee, S.-J. A clustering based approach to improving the efficiency of collaborative filtering recommendation. Electron. Commer. Res. Appl. 2016, 18, 1–9. [Google Scholar] [CrossRef]
  15. Alvarez, F.M.; Troncoso, A.; Riquelme, J.C.; Ruiz, J.S.A. Energy time series forecasting based on pattern sequence similarity. IEEE Trans. Knowl. Data Eng. 2011, 23, 1230–1243. [Google Scholar] [CrossRef]
  16. Wang, Z.-Y.; Lee, S.-J. A neuro-fuzzy based method for TAIEX forecasting. In Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC), Lanzhou, China, 13–16 July 2014; Volume 1, pp. 579–584. [Google Scholar]
  17. Everitt, B. Cluster Analysis; Wiley: West Sussex, Chichester, UK, 2011. [Google Scholar]
  18. Aghabozorgi, S.; Shirkhorshidi, A.S.; Wah, T.Y. A comparison study on similarity and dissimilarity measures in clustering continuous data. PLoS ONE 2015, 10, e0144059. [Google Scholar]
  19. Kohonen, T. Self-Organizing Maps; Springer-Verlag: Berlin, Germany, 1995. [Google Scholar]
  20. Lee, S.-J.; Ouyang, C.-S. A neuro-fuzzy system modeling with self-constructing rule generation and hybrid SVD-based learning. IEEE Trans. Fuzzy Syst. 2003, 11, 341–353. [Google Scholar]
  21. Park, H.-S.; Jun, C.-H. A simple and fast algorithm for k-medoids clustering. Expert Syst. Appl. 2009, 36, 3336–3341. [Google Scholar] [CrossRef]
  22. Sculley, D. Web-scale k-means clustering. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 1177–1178. [Google Scholar]
  23. Capo, M.; Pérez, A.; Lozano, J.A. An efficient k-means clustering algorithm for massive data. arXiv 2018, arXiv:1801.02949. [Google Scholar]
  24. Abdalgader, K. Centroid-based lexical clustering. IntechOpen 2018. [Google Scholar] [CrossRef]
  25. Rezaei, M. Improving a centroid-based clustering by using suitable centroids from another clustering. J. Classif. 2019, 1–14. [Google Scholar] [CrossRef]
  26. Sarmiento, A.; Fondon, I.; Durán-Díaz, I.; Cruces, S. Centroid-based clustering with αβ-divergences. Entropy 2019, 21, 196. [Google Scholar] [CrossRef] [Green Version]
  27. Kraskov, A.; Stogbauer, H.; Andrzejak, R.G.; Grassberger, P. Hierarchical clustering based on mutual information. arXiv 2003, arXiv:q-bio/0311039v2. [Google Scholar]
  28. Szekely, G.J.; Rizzo, M.L. Hierarchical clustering via joint between-within distances: Extending Ward’s minimum variance method. J. Classif. 2005, 22, 151–183. [Google Scholar] [CrossRef]
  29. Achtert, E.; Bohm, C.; Kroger, P. DeLi-Clu: Boosting robustness, completeness, usability, and efficiency of hierarchical clustering by a closest pair ranking. Lect. Notes Comput. Sci. 2006, 3918, 119–128. [Google Scholar]
  30. Achtert, E.; Bohm, C.; Kroger, P.; Zimek, A. Mining hierarchies of correlation clusters. In Proceedings of the 18th International Conference on Scientific and Statistical Database Management (SSDBM), Vienna, Austria, 3–5 July 2006; pp. 119–128. [Google Scholar]
  31. Zhang, W.; Zhao, D.; Wang, X. Agglomerative clustering via maximum incremental path integral. Pattern Recognit. 2013, 46, 3056–3065. [Google Scholar] [CrossRef]
  32. Gagolewski, M.; Bartoszuk, M.; Cena, A. Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Inf. Sci. 2016, 363, 8–23. [Google Scholar] [CrossRef]
33. Figueiredo, M.A.T.; Jain, A.K. Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 381–396.
34. Pal, K.; Keller, J.; Bezdek, J. A possibilistic fuzzy c-means clustering algorithm. IEEE Trans. Fuzzy Syst. 2005, 13, 517–530.
35. Fellows, M.R.; Guo, J.; Komusiewicz, C.; Niedermeier, R.; Uhlmann, J. Graph-based data clustering with overlaps. Discret. Optim. 2011, 8, 2–17.
36. Pérez-Suárez, A.; Martinez-Trinidad, J.F.; Carrasco-Ochoa, J.A.; Medina-Pagola, J.E. OClustR: A new graph-based algorithm for overlapping clustering. Neurocomputing 2013, 121, 234–247.
37. Baadel, S.; Thabtah, F.; Lu, J. MCOKE: Multi-cluster overlapping k-means extension algorithm. Int. J. Comput. Control Quantum Inf. Eng. 2015, 9, 374–377.
38. Améndola, C.; Faugère, J.-C.; Sturmfels, B. Moment varieties of Gaussian mixtures. J. Algebraic Stat. 2016, 7, 14–28.
39. Kriegel, H.-P.; Kroger, P.; Sander, J.; Zimek, A. Density-based clustering. WIREs Data Min. Knowl. Discov. 2011, 1, 231–240.
40. Heredia, L.C.; Mor, A.R. Density-based clustering methods for unsupervised separation of partial discharge sources. Int. J. Electr. Power Energy Syst. 2019, 107, 224–230.
41. Wang, T.; Ren, C.; Luo, Y.; Tian, J. NS-DBSCAN: A density-based clustering algorithm in network space. Int. J. Geo-Inf. 2018.
42. Cheng, C.-H.; Fu, A.W.; Zhang, Y. Entropy-based subspace clustering for mining numerical data. In Proceedings of the 5th ACM International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 15–18 August 1999; pp. 84–93.
43. Kailing, K.; Kriegel, H.-P.; Kroger, P. Density-connected subspace clustering for high-dimensional data. In Proceedings of the SIAM International Conference on Data Mining (SDM'04), Lake Buena Vista, FL, USA, 22–24 April 2004; pp. 246–257.
44. Agrawal, R.; Gehrke, J.; Gunopulos, D.; Raghavan, P. Automatic subspace clustering of high dimensional data. Data Min. Knowl. Discov. 2005, 11, 5–33.
45. Kriegel, H.-P.; Kroger, P.; Zimek, A. Subspace clustering. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2012, 2, 351–364.
46. Luo, S.; Zhang, C.; Zhang, W.; Cao, X. Consistent and specific multi-view subspace clustering. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, LA, USA, 2–3 February 2018; pp. 3730–3737.
47. Zhang, T.; Ji, P.; Harandi, M.; Huang, W.; Li, H. Neural collaborative subspace clustering. arXiv 2019, arXiv:1904.10596v1.
48. Frey, B.J.; Dueck, D. Clustering by passing messages between data points. Science 2007, 315, 972–976.
49. Shi, C.; Liu, Y.; Zhang, P. Weighted community detection and data clustering using message passing. arXiv 2018, arXiv:1801.09829v1.
50. Liao, T.W. Clustering of time series data: A survey. Pattern Recognit. 2005, 38, 1857–1874.
51. Aghabozorgi, S.; Shirkhorshidi, A.S.; Wah, T.Y. Time-series clustering: A decade review. Inf. Syst. 2015, 53, 16–38.
52. Maharaj, E.A.; D'Urso, P.; Caiado, J. Time Series Clustering and Classification; Chapman and Hall/CRC: Boca Raton, FL, USA, 2019.
53. Wang, W.; Yang, J.; Muntz, R. STING: A statistical information grid approach to spatial data mining. In Proceedings of the International Symposium on Very Large Data Bases, Athens, Greece, 25–29 August 1997; pp. 186–195.
54. Aghabozorgi, S.; Wah, T.Y. Stock market co-movement assessment using a three-phase clustering method. Expert Syst. Appl. 2014, 41, 1301–1314.
55. Ouyang, C.-S.; Lee, W.-J.; Lee, S.-J. A TSK-type neuro-fuzzy network approach to system modeling problems. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2005, 35, 751–767.
56. Huang, X.; Ye, Y.; Xiong, L.; Lau, R.; Jiang, N.; Wang, S. Time series k-means: A new k-means type smooth subspace clustering for time series data. Inf. Sci. 2016, 367, 1–13.
57. Wang, Z.-Y. Some Variants of Self-Constructing Clustering. Master's Thesis, National Sun Yat-Sen University, Kaohsiung, Taiwan, 2017.
58. Roelofsen, P. Time Series Clustering. Master's Thesis, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands, 2018.
59. Huang, X.; Ye, Y.; Guo, H.; Cai, Y.; Zhang, H.; Li, Y. DSKmeans: A new kmeans-type approach to discriminative subspace clustering. Knowl. Based Syst. 2014, 70, 293–300.
60. Model Evaluation: Quantifying the Quality of Predictions. Available online: https://scikit-learn.org/stable/modules/model_evaluation.html#model-evaluation (accessed on 1 September 2018).
61. The UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/ml/about.html (accessed on 5 December 2019).
62. K-Means. Available online: https://www.mathworks.com/help/stats/kmeans.html (accessed on 1 September 2018).
63. Fuzzy C-Means. Available online: https://www.mathworks.com/help/fuzzy/fcm.html (accessed on 1 September 2018).
64. Gaussian Mixture Model. Available online: https://en.wikipedia.org/wiki/Mixture_model (accessed on 1 September 2018).
65. DBSCAN. Available online: https://www.mathworks.com/help/fuzzy/dbscan.html (accessed on 1 November 2019).
66. Matlab. Available online: https://www.mathworks.com/products/matlab.html (accessed on 1 September 2018).
67. GMM Source Code. Available online: http://blog.pluskid.org/?p=39 (accessed on 1 September 2018).
68. Carrasco, J.; del Mar Rueda, S.M.; Herrera, F. rNPBST: An R package covering non-parametric and Bayesian statistical tests. In Proceedings of the International Conference on Hybrid Artificial Intelligence Systems, La Rioja, Spain, 21–23 June 2017; Volume 1, pp. 281–292.
69. ANOVA. Available online: https://www.mathworks.com/help/stats/anova1.html (accessed on 1 December 2019).
70. Dau, H.A.; Bagnall, A.J.; Kamgar, K.; Yeh, C.-C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.; Keogh, E.J. The UCR time series archive. arXiv 2018, arXiv:1810.07758.
71. Cuevas, E.; Santuario, E.; Zaldivar, D.; Perez-Cisneros, M. An improved evolutionary algorithm for reducing the number of function evaluations. Intell. Autom. Soft Comput. 2016, 22, 177–192.
72. Siddiqi, U.F.; Sait, S.M. A new heuristic for the data clustering problem. IEEE Access 2017, 5, 6801–6812.
73. Zhang, Q.; Zhu, C.; Yang, L.T.; Chen, Z.; Zhao, L.; Li, P. An incremental CFS algorithm for clustering large data in industrial internet of things. IEEE Trans. Ind. Inform. 2017, 13, 1193–1201.
74. Arias-Castro, E.; Chen, G.; Lerman, G. Spectral clustering based on local linear approximations. Electron. J. Stat. 2011, 5, 1537–1587.
75. Borg, I.; Groenen, P. Modern Multidimensional Scaling: Theory and Applications, 2nd ed.; Springer: New York, NY, USA, 2005.
Figure 1. Performance of SCC-IW vs. α.
Table 1. Characteristics of 14 non-time-series datasets.
Dataset | #Instances | #Features | #Classes
Breast | 569 | 30 | 2
Ecoli | 336 | 7 | 8
Ionosphere | 351 | 33 | 2
Breast_Tissue | 106 | 9 | 6
SPECT Heart | 267 | 44 | 2
Seeds | 210 | 7 | 3
Sonar | 208 | 60 | 2
User_Knowledge | 403 | 5 | 4
Musk | 476 | 166 | 2
Vehicle | 846 | 18 | 4
Glass | 214 | 9 | 6
Heart | 270 | 12 | 2
Iris | 150 | 4 | 3
Yeast | 1484 | 8 | 10
Table 2. Performance comparisons of Fscore, Rand Index (RI), and Normalized Mutual Information (NMI) on 14 non-time-series datasets (each entry is value/rank).
Dataset | Metric | K-Means | DSKmeans | FCM | Gmm | DBSCAN | SCC | SCC-I
Breast | Fscore | 0.9270/2 | 0.8710/4 | 0.9274/1 | 0.7470/7 | 0.8663/5 | 0.7983/6 | 0.9264/3
Breast | RI | 0.8660/1 | 0.7940/4 | 0.8660/1 | 0.6515/7 | 0.7316/5 | 0.6953/6 | 0.8630/3
Breast | NMI | 0.6232/1 | 0.5214/5 | 0.6152/2 | 0.2449/7 | 0.5215/4 | 0.3333/6 | 0.6075/3
Breast | Time(s) | 0.004 | 0.200 | 0.009 | 0.060 | 0.005 | 0.020 | 0.270
Heart | Fscore | 0.7442/2 | 0.6749/6 | 0.7932/1 | 0.6880/4 | 0.6339/7 | 0.6875/5 | 0.6885/3
Heart | RI | 0.6318/2 | 0.5678/6 | 0.6700/1 | 0.5746/4 | 0.5107/7 | 0.5708/5 | 0.5780/3
Heart | NMI | 0.2096/2 | 0.1102/7 | 0.2647/1 | 0.1205/6 | 0.1844/3 | 0.1564/5 | 0.1580/4
Heart | Time(s) | 0.003 | 0.020 | 0.010 | 0.010 | 0.002 | 0.001 | 0.054
Ionosphere | Fscore | 0.7101/3 | 0.7067/5 | 0.7072/4 | 0.7966/1 | 0.7224/2 | 0.6966/6 | 0.6892/7
Ionosphere | RI | 0.5818/3 | 0.5816/4 | 0.5795/5 | 0.6948/1 | 0.5957/2 | 0.5529/7 | 0.5610/6
Ionosphere | NMI | 0.1243/3 | 0.1224/4 | 0.1194/5 | 0.3016/1 | 0.2570/2 | 0.0532/7 | 0.1035/6
Ionosphere | Time(s) | 0.004 | 0.100 | 0.005 | 0.020 | 0.004 | 0.010 | 0.140
Musk | Fscore | 0.5718/5 | 0.6258/4 | 0.5539/7 | 0.5589/6 | 0.6675/2 | 0.6565/3 | 0.6690/1
Musk | RI | 0.5037/4 | 0.5027/5 | 0.5015/7 | 0.5020/6 | 0.5074/1 | 0.5065/2 | 0.5061/3
Musk | NMI | 0.0164/5 | 0.0220/2 | 0.0086/6 | 0.0086/6 | 0.0393/1 | 0.0170/4 | 0.0214/3
Musk | Time(s) | 0.009 | 0.360 | 0.020 | 0.140 | 0.012 | 0.020 | 0.090
Sonar | Fscore | 0.5530/6 | 0.6668/1 | 0.5519/7 | 0.5920/4 | 0.5630/5 | 0.6553/3 | 0.6645/2
Sonar | RI | 0.5032/2 | 0.4993/7 | 0.5030/3 | 0.5053/1 | 0.4999/6 | 0.5010/5 | 0.5011/4
Sonar | NMI | 0.0088/6 | 0.0215/4 | 0.0085/7 | 0.0117/5 | 0.0272/3 | 0.0290/2 | 0.0475/1
Sonar | Time(s) | 0.005 | 0.040 | 0.020 | 0.020 | 0.002 | 0.008 | 0.070
SPECT | Fscore | 0.6944/5 | 0.7255/4 | 0.6136/7 | 0.6916/6 | 0.7732/2 | 0.7648/3 | 0.7901/1
SPECT | RI | 0.5313/6 | 0.5773/4 | 0.4986/7 | 0.5395/5 | 0.6418/3 | 0.6540/1 | 0.6503/2
SPECT | NMI | 0.0885/4 | 0.0601/6 | 0.1560/2 | 0.0634/5 | 0.0939/3 | 0.0153/7 | 0.1798/1
SPECT | Time(s) | 0.004 | 0.090 | 0.010 | 0.020 | 0.003 | 0.010 | 0.180
Ecoli | Fscore | 0.6306/6 | 0.7527/3 | 0.5977/7 | 0.7362/4 | 0.7080/5 | 0.7647/2 | 0.7843/1
Ecoli | RI | 0.7973/5 | 0.8673/2 | 0.7895/6 | 0.8500/4 | 0.7701/7 | 0.8547/3 | 0.8697/1
Ecoli | NMI | 0.5925/6 | 0.6612/2 | 0.5543/7 | 0.6273/4 | 0.6057/5 | 0.6466/3 | 0.6787/1
Ecoli | Time(s) | 0.008 | 1.020 | 0.030 | 0.200 | 0.002 | 0.020 | 0.130
Glass | Fscore | 0.4808/4 | 0.5358/3 | 0.4525/6 | 0.4783/5 | 0.4018/7 | 0.5378/2 | 0.5740/1
Glass | RI | 0.6698/3 | 0.6143/5 | 0.7023/1 | 0.6785/2 | 0.5598/7 | 0.5658/6 | 0.6267/4
Glass | NMI | 0.3274/4 | 0.3757/3 | 0.2967/6 | 0.3201/5 | 0.2816/7 | 0.3782/2 | 0.4804/1
Glass | Time(s) | 0.005 | 0.270 | 0.016 | 0.020 | 0.002 | 0.008 | 0.100
Iris | Fscore | 0.8479/6 | 0.9603/1 | 0.8926/3 | 0.8774/4 | 0.7061/7 | 0.8579/5 | 0.9029/2
Iris | RI | 0.8429/6 | 0.9498/1 | 0.8797/4 | 0.8818/3 | 0.7820/7 | 0.8580/5 | 0.8898/2
Iris | NMI | 0.7116/6 | 0.8648/1 | 0.7433/5 | 0.7940/2 | 0.5797/7 | 0.7516/4 | 0.7727/3
Iris | Time(s) | 0.003 | 0.060 | 0.002 | 0.036 | 0.001 | 0.006 | 0.040
Yeast | Fscore | 0.4398/3 | 0.4227/5 | 0.3840/6 | 0.4597/1 | 0.3525/7 | 0.4252/4 | 0.4550/2
Yeast | RI | 0.7490/1 | 0.6345/6 | 0.7192/4 | 0.7099/5 | 0.7227/3 | 0.5556/7 | 0.7405/2
Yeast | NMI | 0.2769/2 | 0.2318/4 | 0.1785/7 | 0.2481/3 | 0.2045/6 | 0.2060/5 | 0.2879/1
Yeast | Time(s) | 0.050 | 15.900 | 0.170 | 3.500 | 0.015 | 0.070 | 4.700
Breast_Tis | Fscore | 0.5572/6 | 0.5683/3 | 0.5639/4 | 0.5605/5 | 0.5898/2 | 0.5276/7 | 0.6122/1
Breast_Tis | RI | 0.7869/2 | 0.7711/4 | 0.7887/1 | 0.7849/3 | 0.6635/6 | 0.6535/7 | 0.7339/5
Breast_Tis | NMI | 0.5203/5 | 0.5429/3 | 0.5213/4 | 0.5189/6 | 0.5745/1 | 0.4959/7 | 0.5688/2
Breast_Tis | Time(s) | 0.005 | 0.100 | 0.005 | 0.040 | 0.001 | 0.005 | 0.030
Seeds | Fscore | 0.8905/2 | 0.8861/4 | 0.9002/1 | 0.8714/6 | 0.8754/5 | 0.8607/7 | 0.8885/3
Seeds | RI | 0.8693/3 | 0.8668/4 | 0.8789/1 | 0.8658/5 | 0.8404/7 | 0.8449/6 | 0.8698/2
Seeds | NMI | 0.6743/5 | 0.6841/4 | 0.6911/3 | 0.7232/1 | 0.6694/6 | 0.6556/7 | 0.6943/2
Seeds | Time(s) | 0.003 | 0.080 | 0.003 | 0.040 | 0.002 | 0.008 | 0.080
User_Know | Fscore | 0.5199/4 | 0.4919/6 | 0.5402/2 | 0.5060/5 | 0.5256/3 | 0.4879/7 | 0.6064/1
User_Know | RI | 0.6916/2 | 0.6459/6 | 0.6763/3 | 0.6752/4 | 0.6718/5 | 0.5865/7 | 0.7357/1
User_Know | NMI | 0.3062/2 | 0.2451/6 | 0.2888/3 | 0.2628/5 | 0.2718/4 | 0.2295/7 | 0.4593/1
User_Know | Time(s) | 0.007 | 0.360 | 0.040 | 0.400 | 0.004 | 0.016 | 0.320
Vehicle | Fscore | 0.4264/5 | 0.4563/4 | 0.4191/6 | 0.4587/3 | 0.4003/7 | 0.4673/2 | 0.4788/1
Vehicle | RI | 0.6539/1 | 0.5830/4 | 0.6521/2 | 0.6428/3 | 0.5494/6 | 0.5466/7 | 0.5631/5
Vehicle | NMI | 0.1283/6 | 0.1590/4 | 0.0986/7 | 0.1733/2 | 0.1302/5 | 0.1686/3 | 0.1986/1
Vehicle | Time(s) | 0.010 | 0.950 | 0.030 | 0.440 | 0.007 | 0.030 | 0.270
Table 3. Averaged ranking of 14 non-time-series datasets.
Metric | K-Means | DSKmeans | FCM | Gmm | DBSCAN | SCC | SCC-I
Fscore | 4.2 | 3.8 | 4.4 | 4.4 | 4.7 | 4.4 | 2.1
RI | 2.9 | 4.4 | 3.2 | 3.8 | 5.1 | 5.3 | 3.1
NMI | 4.1 | 3.9 | 4.6 | 4.1 | 4.1 | 4.9 | 2.1
Table 4. Performance comparisons of Davies–Bouldin index (DBI), Dunn index (DI), and Silhouette index (SI) on 14 non-time-series datasets.
Dataset | Index | K-Means | DSKmeans | FCM | Gmm | DBSCAN | SCC-I
Breast | DBI | 1.2336 | 0.9218 | 1.2415 | 1.1938 | 1.0814 | 0.8836
Breast | DI | 0.0838 | 0.1457 | 0.0838 | 0.0853 | 0.1452 | 0.1452
Breast | SI | 0.5765 | 0.6613 | 0.5683 | 0.6028 | 0.5459 | 0.5459
Heart | DBI | 1.8844 | 1.9028 | 1.8983 | 1.8894 | 1.3821 | 1.4799
Heart | DI | 0.3644 | 0.3715 | 0.2499 | 0.2551 | 0.1787 | 0.3604
Heart | SI | 0.3484 | 0.3314 | 0.3444 | 0.3467 | 0.3458 | 0.2785
Ionosphere | DBI | 0.5498 | 0.5813 | 1.7182 | 1.8497 | 2.0965 | 0.4926
Ionosphere | DI | 0.4841 | 0.4480 | 0.0703 | 0.3837 | 0.1012 | 0.5706
Ionosphere | SI | 0.5706 | 0.4944 | 0.4097 | 0.5162 | 0.5376 | 0.6050
Musk | DBI | 1.3069 | 1.3336 | 1.3594 | 1.2995 | 1.1038 | 0.7281
Musk | DI | 0.1894 | 0.2056 | 0.1963 | 0.2508 | 0.1559 | 0.5793
Musk | SI | 0.5321 | 0.5215 | 0.5182 | 0.5319 | 0.5278 | 0.5319
Sonar | DBI | 1.8199 | 1.7128 | 1.8483 | 1.6240 | 1.3181 | 0.5805
Sonar | DI | 0.1337 | 0.2312 | 0.1786 | 0.2376 | 0.1458 | 0.3789
Sonar | SI | 0.3319 | 0.3754 | 0.3318 | 0.3872 | 0.1541 | 0.4163
SPECT | DBI | 1.4525 | 1.2578 | 1.8714 | 1.2676 | 1.1655 | 0.3567
SPECT | DI | 0.2585 | 0.3853 | 0.1566 | 0.3268 | 0.1379 | 0.1287
SPECT | SI | 0.6739 | 0.7257 | 0.4610 | 0.7308 | 0.3021 | 0.7467
Ecoli | DBI | 1.1673 | 1.3571 | 1.8659 | 0.9335 | 0.8678 | 0.7825
Ecoli | DI | 0.0765 | 0.0956 | 0.0398 | 0.0661 | 0.0703 | 0.0947
Ecoli | SI | 0.4940 | 0.5685 | 0.2872 | 0.3981 | 0.5390 | 0.6269
Glass | DBI | 0.9956 | 1.1442 | 1.7902 | 1.1721 | 2.1930 | 0.6346
Glass | DI | 0.1489 | 0.0633 | 0.0442 | 0.1392 | 0.2051 | 0.2060
Glass | SI | 0.7042 | 0.4512 | 0.4421 | 0.5138 | 0.6672 | 0.6831
Iris | DBI | 0.8281 | 0.6113 | 0.8436 | 0.6172 | 0.8012 | 0.8157
Iris | DI | 0.0694 | 0.1448 | 0.0701 | 0.1056 | 0.0619 | 0.0899
Iris | SI | 0.6959 | 0.6656 | 0.6891 | 0.6461 | 0.6949 | 0.6959
Yeast | DBI | 1.3526 | 1.6716 | 2.7041 | 1.8856 | 1.0550 | 0.8318
Yeast | DI | 0.0399 | 0.0327 | 0.0208 | 0.0321 | 0.0167 | 0.0400
Yeast | SI | 0.3339 | 0.2959 | 0.0263 | 0.2911 | 0.8213 | 0.3321
Breast_Tis | DBI | 0.8691 | 1.0705 | 1.0286 | 1.0715 | 0.7800 | 0.8583
Breast_Tis | DI | 0.1146 | 0.1096 | 0.0449 | 0.1041 | 0.1093 | 0.1174
Breast_Tis | SI | 0.5926 | 0.5330 | 0.4629 | 0.4960 | 0.5610 | 0.6109
Seeds | DBI | 0.9436 | 0.9612 | 0.9448 | 0.6905 | 0.9313 | 0.9451
Seeds | DI | 0.1259 | 0.1125 | 0.0800 | 0.0988 | 0.0493 | 0.1259
Seeds | SI | 0.6209 | 0.5972 | 0.6196 | 0.6196 | 0.6063 | 0.6208
User_Know | DBI | 1.6013 | 1.6626 | 1.6933 | 1.6118 | 1.6979 | 1.7487
User_Know | DI | 0.0957 | 0.0989 | 0.0842 | 0.0776 | 0.0758 | 0.1053
User_Know | SI | 0.3097 | 0.2949 | 0.2473 | 0.2676 | 0.2593 | 0.3018
Vehicle | DBI | 1.1933 | 1.2974 | 1.6017 | 1.1883 | 1.0709 | 0.9569
Vehicle | DI | 0.0846 | 0.0884 | 0.0668 | 0.0951 | 0.0573 | 0.0969
Vehicle | SI | 0.4793 | 0.4372 | 0.3986 | 0.3872 | 0.4864 | 0.6170
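The internal indices in Table 4 do not require ground-truth labels: a lower DBI and higher DI and SI indicate more compact, better-separated clusters. As a purely illustrative sketch (not the original evaluation code), they can be computed with scikit-learn and SciPy as follows; the data matrix X and the assignment vector labels below are hypothetical.

    import numpy as np
    from scipy.spatial.distance import cdist, pdist
    from sklearn.metrics import davies_bouldin_score, silhouette_score

    def dunn_index(X, labels):
        # Dunn index = (minimum inter-cluster distance) / (maximum cluster diameter).
        clusters = [X[labels == c] for c in np.unique(labels)]
        min_sep = min(cdist(a, b).min()
                      for i, a in enumerate(clusters)
                      for b in clusters[i + 1:])
        max_diam = max(pdist(c).max() for c in clusters if len(c) > 1)
        return min_sep / max_diam

    # Hypothetical data and a two-cluster assignment.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))
    labels = (X[:, 0] > 0).astype(int)
    print(davies_bouldin_score(X, labels),
          dunn_index(X, labels),
          silhouette_score(X, labels))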
Table 5. Pairwise comparisons by t-test for non-time-series datasets.
SCC-I vs. | K-Means | DSKmeans | FCM | Gmm | DBSCAN
Fscore | 3.3559 | 2.5383 | 2.7673 | 2.7686 | 3.9622
NMI | 2.6634 | 2.3637 | 2.4157 | 1.7677 | 2.2601
DBI | 3.3703 | 3.6033 | 4.4584 | 3.3139 | 2.5186
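Each entry in Table 5 is the statistic of a paired t-test over the per-dataset scores of SCC-I and the compared method. As a sketch only, the test can be reproduced with SciPy's paired t-test (rather than the R tooling of [68]); the arrays below copy the SCC-I and K-Means Fscore columns of Table 2, and the resulting statistic should closely reproduce the 3.3559 entry above.

    import numpy as np
    from scipy.stats import ttest_rel

    # Fscore values of SCC-I and K-Means on the 14 non-time-series datasets (Table 2).
    fscore_scc_i = np.array([0.9264, 0.6885, 0.6892, 0.6690, 0.6645, 0.7901, 0.7843,
                             0.5740, 0.9029, 0.4550, 0.6122, 0.8885, 0.6064, 0.4788])
    fscore_kmeans = np.array([0.9270, 0.7442, 0.7101, 0.5718, 0.5530, 0.6944, 0.6306,
                              0.4808, 0.8479, 0.4398, 0.5572, 0.8905, 0.5199, 0.4264])
    t_stat, p_value = ttest_rel(fscore_scc_i, fscore_kmeans)
    print(t_stat, p_value)

A positive statistic favors SCC-I when larger values of the measure are better (Fscore, NMI); for DBI, where smaller is better, the sign convention is reversed accordingly.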
Table 6. Characteristics of 10 time-series datasets.
Dataset | #Instances | #Features | #Classes
SynControl | 600 | 60 | 6
Coffee | 56 | 286 | 2
Light7 | 143 | 319 | 7
OSU_Leaf | 442 | 427 | 6
Sony_Surface | 621 | 70 | 2
Trace | 200 | 275 | 4
CBF | 930 | 128 | 3
ECGFiveDays | 884 | 136 | 2
FaceFour | 350 | 112 | 4
OliveOil | 60 | 570 | 4
Table 7. Performance comparisons of Fscore, RI, and NMI on 10 time-series datasets—Part I (each entry is value/rank).
Dataset | Metric | K-Means | TSKmeans | DSKmeans | FCM
CBF | Fscore | 0.6359/5 | 0.7028/2 | 0.6890/3 | 0.6316/6
CBF | RI | 0.7071/4 | 0.7331/2 | 0.7028/5 | 0.6973/6
CBF | NMI | 0.3616/5 | 0.4716/2 | 0.4701/3 | 0.3364/7
CBF | Time(s) | 0.030 | 1.900 | 1.600 | 0.140
Coffee | Fscore | 0.7578/5 | 0.7425/6 | 0.8441/2 | 0.8912/1
Coffee | RI | 0.6667/3 | 0.6572/4 | 0.7662/2 | 0.8052/1
Coffee | NMI | 0.3230/4 | 0.3051/6 | 0.5067/2 | 0.6001/1
Coffee | Time(s) | 0.040 | 0.100 | 0.010 | 0.010
ECG5 | Fscore | 0.5157/7 | 0.7091/2 | 0.5943/3 | 0.5147/8
ECG5 | RI | 0.4999/7 | 0.5780/2 | 0.5033/3 | 0.4999/7
ECG5 | NMI | 0.0007/7 | 0.1462/2 | 0.0327/3 | 0.0006/8
ECG5 | Time(s) | 0.010 | 1.500 | 1.050 | 0.030
Face4 | Fscore | 0.6468/5 | 0.7394/2 | 0.6631/4 | 0.5765/8
Face4 | RI | 0.7443/5 | 0.7977/2 | 0.7448/4 | 0.6881/7
Face4 | NMI | 0.4585/4 | 0.6140/2 | 0.4493/5 | 0.3777/8
Face4 | Time(s) | 0.009 | 0.780 | 0.160 | 0.070
Light7 | Fscore | 0.5779/3 | 0.5677/4 | 0.5652/5 | 0.3664/8
Light7 | RI | 0.8181/1 | 0.7937/4 | 0.8142/2 | 0.6317/8
Light7 | NMI | 0.4990/2 | 0.4796/5 | 0.4890/4 | 0.2588/8
Light7 | Time(s) | 0.020 | 1.500 | 0.500 | 0.020
Oil | Fscore | 0.8212/3 | 0.8175/4 | 0.8148/5 | 0.8226/2
Oil | RI | 0.8558/3 | 0.8524/5 | 0.8480/6 | 0.8757/2
Oil | NMI | 0.6906/2 | 0.6603/5 | 0.6688/4 | 0.6809/3
Oil | Time(s) | 0.009 | 1.100 | 0.050 | 0.060
OSU_leaf | Fscore | 0.4154/3 | 0.4068/5 | 0.4070/4 | 0.3411/8
OSU_leaf | RI | 0.7456/1 | 0.7447/2 | 0.7391/3 | 0.5895/8
OSU_leaf | NMI | 0.2233/3 | 0.2091/5 | 0.2159/4 | 0.1030/8
OSU_leaf | Time(s) | 0.090 | 5.300 | 2.300 | 0.350
Sony_Surf | Fscore | 0.8022/4 | 0.7883/5 | 0.7445/6 | 0.8610/1
Sony_Surf | RI | 0.6947/3 | 0.6863/4 | 0.6415/6 | 0.7710/1
Sony_Surf | NMI | 0.3828/3 | 0.3674/4 | 0.2727/6 | 0.4907/1
Sony_Surf | Time(s) | 0.009 | 0.420 | 0.400 | 0.009
Synthetic | Fscore | 0.7256/3 | 0.7543/1 | 0.7284/2 | 0.6393/6
Synthetic | RI | 0.8763/2 | 0.8908/1 | 0.8722/3 | 0.8386/6
Synthetic | NMI | 0.7859/3 | 0.8143/1 | 0.7756/4 | 0.6946/6
Synthetic | Time(s) | 0.010 | 0.560 | 1.200 | 0.040
Trace | Fscore | 0.5491/8 | 0.5820/5 | 0.6143/2 | 0.5643/6
Trace | RI | 0.7498/7 | 0.7401/8 | 0.7493/7 | 0.7521/2
Trace | NMI | 0.5160/6 | 0.5142/8 | 0.5698/2 | 0.5204/5
Trace | Time(s) | 0.008 | 0.600 | 0.500 | 0.040
Table 8. Performance comparisons of Fscore, RI, and NMI on 10 time-series datasets—Part II (each entry is value/rank).
Dataset | Metric | Gmm | SCC | SCC-I | SCC-IW
CBF | Fscore | 0.6101/7 | 0.5822/8 | 0.6694/4 | 0.8032/1
CBF | RI | 0.6770/7 | 0.5925/8 | 0.7192/3 | 0.7516/1
CBF | NMI | 0.3559/6 | 0.2837/8 | 0.4302/4 | 0.4860/1
CBF | Time(s) | 0.470 | 0.040 | 1.040 | 1.200
Coffee | Fscore | 0.7366/7 | 0.6849/8 | 0.7619/4 | 0.7745/3
Coffee | RI | 0.6358/6 | 0.5331/8 | 0.6295/7 | 0.6451/5
Coffee | NMI | 0.2649/7 | 0.1329/8 | 0.3109/5 | 0.3318/3
Coffee | Time(s) | 0.070 | 0.003 | 0.020 | 0.120
ECG5 | Fscore | 0.5438/4 | 0.5321/5 | 0.5178/6 | 0.7395/1
ECG5 | RI | 0.5013/4 | 0.5003/5 | 0.5001/6 | 0.6147/1
ECG5 | NMI | 0.0029/4 | 0.0014/5 | 0.0009/6 | 0.1750/1
ECG5 | Time(s) | 0.040 | 0.040 | 0.430 | 1.600
Face4 | Fscore | 0.6136/6 | 0.6107/7 | 0.7008/3 | 0.8185/1
Face4 | RI | 0.7065/6 | 0.6339/8 | 0.7470/3 | 0.8655/1
Face4 | NMI | 0.4220/6 | 0.3846/7 | 0.5163/3 | 0.7540/1
Face4 | Time(s) | 0.090 | 0.007 | 0.060 | 0.540
Light7 | Fscore | 0.4967/7 | 0.5082/6 | 0.5819/2 | 0.6036/1
Light7 | RI | 0.7783/6 | 0.6920/7 | 0.8033/3 | 0.7809/5
Light7 | NMI | 0.4110/6 | 0.3985/7 | 0.4985/3 | 0.5048/1
Light7 | Time(s) | 0.130 | 0.010 | 0.100 | 0.680
Oil | Fscore | 0.8103/6 | 0.6979/8 | 0.7728/7 | 0.8671/1
Oil | RI | 0.8548/4 | 0.6984/8 | 0.8042/7 | 0.8823/1
Oil | NMI | 0.6579/6 | 0.5155/8 | 0.6250/7 | 0.7422/1
Oil | Time(s) | 2.400 | 0.004 | 0.030 | 0.650
OSU_leaf | Fscore | 0.3606/7 | 0.3612/6 | 0.4295/1 | 0.4252/2
OSU_leaf | RI | 0.7086/6 | 0.5978/7 | 0.7265/5 | 0.7350/4
OSU_leaf | NMI | 0.1663/6 | 0.1636/7 | 0.2562/1 | 0.2398/2
OSU_leaf | Time(s) | 0.360 | 0.030 | 0.700 | 2.200
Sony_Surf | Fscore | 0.6604/8 | 0.6869/7 | 0.8033/3 | 0.8361/2
Sony_Surf | RI | 0.5627/7 | 0.5335/8 | 0.6838/5 | 0.7275/2
Sony_Surf | NMI | 0.1569/7 | 0.0814/8 | 0.3432/5 | 0.4245/2
Sony_Surf | Time(s) | 0.020 | 0.030 | 0.320 | 0.410
Synthetic | Fscore | 0.5924/7 | 0.6789/5 | 0.5524/8 | 0.7164/4
Synthetic | RI | 0.8128/7 | 0.8588/5 | 0.7175/8 | 0.8694/4
Synthetic | NMI | 0.6378/7 | 0.7703/5 | 0.6161/8 | 0.7932/2
Synthetic | Time(s) | 1.000 | 0.290 | 0.030 | 0.450
Trace | Fscore | 0.5500/7 | 0.6128/3 | 0.5896/4 | 0.7622/1
Trace | RI | 0.7500/5 | 0.7506/4 | 0.7514/3 | 0.8313/1
Trace | NMI | 0.5188/6 | 0.5696/3 | 0.5452/4 | 0.7649/1
Trace | Time(s) | 0.300 | 0.010 | 0.100 | 0.620
Table 9. Performance comparisons of DBI, DI, and SI on 10 time-series datasets.
Dataset | Index | K-Means | TSKmeans | DSKmeans | FCM | Gmm | SCC-IW
CBF | DBI | 1.9847 | 2.2397 | 2.2115 | 1.5132 | 1.6953 | 1.2029
CBF | DI | 0.2980 | 0.2968 | 0.2695 | 0.2882 | 0.3198 | 0.3452
CBF | SI | 0.2862 | 0.2236 | 0.2294 | 0.2071 | 0.3135 | 0.3981
Coffee | DBI | 1.5204 | 1.4311 | 1.4311 | 1.7645 | 1.4311 | 1.1966
Coffee | DI | 0.3008 | 0.2586 | 0.2586 | 0.2633 | 0.3045 | 0.5019
Coffee | SI | 0.4353 | 0.4447 | 0.4447 | 0.2749 | 0.4447 | 0.6344
ECG5 | DBI | 1.2276 | 1.2320 | 1.2318 | 1.2287 | 1.2168 | 1.0752
ECG5 | DI | 0.0433 | 0.0384 | 0.0386 | 0.0274 | 0.0574 | 0.0239
ECG5 | SI | 0.5446 | 0.5424 | 0.5161 | 0.5445 | 0.5445 | 0.5452
Face4 | DBI | 1.4030 | 1.9185 | 1.8185 | 1.8166 | 1.3659 | 1.3381
Face4 | DI | 0.3399 | 0.2176 | 0.2469 | 0.2742 | 0.3284 | 0.2735
Face4 | SI | 0.4301 | 0.2547 | 0.2279 | 0.2475 | 0.4036 | 0.2523
Light7 | DBI | 1.4136 | 1.8542 | 1.6855 | 1.4740 | 1.6682 | 1.2476
Light7 | DI | 0.3324 | 0.2275 | 0.2871 | 0.2811 | 0.3151 | 0.2181
Light7 | SI | 0.3493 | 0.1827 | 0.2921 | 0.2494 | 0.2914 | 0.2025
Oil | DBI | 1.0658 | 1.0559 | 1.5853 | 1.6679 | 1.2671 | 0.6984
Oil | DI | 0.3941 | 0.3000 | 0.1660 | 0.2279 | 0.3062 | 0.4959
Oil | SI | 0.5317 | 0.5261 | 0.2448 | 0.1893 | 0.4837 | 0.4494
OSU_leaf | DBI | 2.0969 | 2.1381 | 2.3428 | 2.0592 | 2.2647 | 1.3514
OSU_leaf | DI | 0.2569 | 0.2712 | 0.2561 | 0.1837 | 0.2507 | 0.1365
OSU_leaf | SI | 0.2661 | 0.2529 | 0.2256 | 0.1241 | 0.2347 | 0.2755
Sony_Surf | DBI | 2.0878 | 1.8717 | 1.9832 | 2.6212 | 2.0551 | 0.8246
Sony_Surf | DI | 0.2673 | 0.2246 | 0.2112 | 0.2314 | 0.2449 | 0.2196
Sony_Surf | SI | 0.2737 | 0.2612 | 0.2021 | 0.2296 | 0.2579 | 0.2267
Synthetic | DBI | 2.0156 | 2.2560 | 3.0803 | 3.3852 | 1.2028 | 1.2416
Synthetic | DI | 0.3088 | 0.2972 | 0.2694 | 0.2571 | 0.2518 | 0.6108
Synthetic | SI | 0.4830 | 0.4575 | 0.2430 | 0.1936 | 0.3557 | 0.4368
Trace | DBI | 0.7461 | 0.9415 | 1.0117 | 1.0132 | 0.6862 | 1.3145
Trace | DI | 0.1466 | 0.1029 | 0.0462 | 0.0662 | 0.1434 | 0.2315
Trace | SI | 0.7013 | 0.4461 | 0.5390 | 0.6779 | 0.6926 | 0.6023
Table 10. Averaged ranking of 10 time-series datasets.
Metric | K-Means | TSKmeans | DSKmeans | FCM | Gmm | SCC | SCC-I | SCC-IW
Fscore | 4.6 | 3.6 | 3.6 | 5.4 | 6.6 | 6.3 | 4.2 | 1.7
RI | 3.6 | 3.4 | 4.1 | 4.8 | 5.8 | 6.8 | 5.0 | 2.5
NMI | 3.9 | 4.0 | 3.7 | 5.5 | 6.1 | 6.6 | 4.6 | 1.5
Table 11. Pairwise comparisons by t-test for time-series datasets.
SCC-IW vs. | K-Means | TSKmeans | DSKmeans | FCM | Gmm
Fscore | 4.3834 | 4.3753 | 3.2550 | 2.2606 | 6.4330
RI | 1.9429 | 1.9228 | 1.3644 | 1.7049 | 3.8533
DBI | 2.5339 | 3.7467 | 3.5688 | 2.9619 | 2.0451
Table 12. DI comparison between SCC-I and HDC.
Method | Glass | Ionosphere | Sonar | Vehicle | Heart
SCC-I | 0.2060 | 0.5706 | 0.3789 | 0.0989 | 0.3604
HDC | 0.2450 | 0.1924 | 0.3698 | 0.1054 | 0.1165
Method | Balance scale | Banknote authentication | Landsat satellite | Pen-based digits | Waveform-5000
SCC-I | 0.4472 | 0.1746 | 0.2468 | 0.0743 | 0.4962
HDC | 0.1579 | 0.0969 | 0.0650 | 0.0408 | 0.3384
Table 13. NMI comparison between SCC-I and ICFSKM.
Method | Iris | Wine | Yeast
SCC-I | 0.7727 | 0.8334 | 0.2879
ICFSKM | 0.8030 | 0.7810 | 0.3930
