Peer-Review Record

A Fast Method for Estimating the Number of Clusters Based on Score and the Minimum Distance of the Center Point

Information 2020, 11(1), 16; https://doi.org/10.3390/info11010016
by Zhenzhen He, Zongpu Jia * and Xiaohong Zhang
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 25 November 2019 / Revised: 17 December 2019 / Accepted: 19 December 2019 / Published: 25 December 2019
(This article belongs to the Special Issue Big Data Integration)

Round 1

Reviewer 1 Report

The authors presented a new approach for estimating the number of clusters. The paper is well written and contains all of its important parts. The authors also presented a comprehensive review of existing approaches for determining the number of clusters; however, despite using several datasets to assess the developed methodology, there is a significant limitation that makes the manuscript premature for publication at this moment: the reference sets taken for comparison of the various clustering approaches contain a very small number of clusters. To verify whether the newly developed methodology is indeed efficient, at least several sets with more than 10 or even 20 clusters should be compared.

The developed methodology and the paper as a whole may attract interest among readers; however, its validity and effectiveness should be demonstrated more thoroughly.

Author Response

Thank you very much for your comments. In response, I have made the following changes.

Point 1:

To verify whether the newly developed methodology is indeed efficient, at least several sets with more than 10 or even 20 clusters should be compared.

Response 1:

To better verify the effectiveness of the developed method, I also validated it on the following two datasets. Movement_libras: a real dataset containing 366 data points with 91 attributes, divided into 15 clusters. D22: an artificial dataset containing 2211 points with 2 attributes, divided into 22 clusters. The experimental results show the effectiveness of the method, and I have added them to the experimental section.
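For context, the kind of check described above can be sketched in a few lines of scikit-learn. The snippet below is only an illustration under stated assumptions: it generates a synthetic 22-cluster dataset of roughly the same size as D22 and uses silhouette-based selection of k as a generic stand-in for the score-and-minimum-distance estimator proposed in the paper, since the paper's own implementation is not reproduced here.

    # Illustrative check of a k-estimation procedure on data with a known
    # number of clusters. Silhouette-based selection is a generic stand-in,
    # NOT the paper's score-and-minimum-distance method.
    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    # Synthetic stand-in resembling D22 in size: 2211 points, 2 attributes, 22 clusters.
    X, y = make_blobs(n_samples=2211, centers=22, n_features=2, random_state=0)

    def estimate_k(X, k_range):
        """Return the k with the highest silhouette score over k_range."""
        best_k, best_score = None, -1.0
        for k in k_range:
            labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
            score = silhouette_score(X, labels)
            if score > best_score:
                best_k, best_score = k, score
        return best_k

    true_k = len(np.unique(y))
    print("true k:", true_k, "estimated k:", estimate_k(X, range(2, 31)))

With well-separated synthetic blobs the estimated k typically matches the true value; real datasets such as Movement_libras are harder, which is the point of the added experiments.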

Thanks again for your suggestions and guidance.

Reviewer 2 Report

The authors present a fast method for estimating the number of clusters, which is an important issue in clustering. The proposed method is quite straightforward, so the technical novelty is low. I have the following comments:

1. The authors should carefully proofread the paper; e.g., the "Related Knowledge" section should be revised to either "Background" or "Related Work".
2. The authors provide results on several small datasets. Can they provide results on a larger and more challenging dataset, e.g., a subset of MNIST? Is the method expected to work on such high-dimensional datasets? If no experiments can be conducted, the authors should limit the scope of the method and better discuss the limitations of the proposed approach.
3. Several recent works are missing from the literature:

Caron, Mathilde, et al. "Deep clustering for unsupervised learning of visual features." Proceedings of the European Conference on Computer Vision (ECCV), 2018.
Passalis, Nikolaos, and Anastasios Tefas. "Discriminative clustering using regularized subspace learning." Pattern Recognition 96 (2019): 106982.
Yang, Bo, et al. "Towards k-means-friendly spaces: Simultaneous deep learning and clustering." Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017.

Author Response

Thank you very much for your comments. In response to your suggestions, I have made the following changes.

Point 1:

The authors should carefully proofread the paper; e.g., the "Related Knowledge" section should be revised to either "Background" or "Related Work".

Response 1:

After careful consideration, I have changed the "Related Knowledge" section title to "Background".

Point 2:

The authors provide results on several small datasets. Can they provide results on a larger and more challenging dataset, e.g., a subset of MNIST? Is the method expected to work on such high-dimensional datasets? If no experiments can be conducted, the authors should limit the scope of the method and better discuss the limitations of the proposed approach.

Response 2:

To verify the effectiveness of the developed method on high-dimensional datasets, I validated it on the following two datasets. Mnist_123: a subset of MNIST containing 649 data points with 784 dimensions, divided into 3 clusters corresponding to the handwritten digits "1", "2", and "3". Movement_libras: a real dataset containing 366 data points with 91 attributes, divided into 15 clusters.
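As an illustration of how a small MNIST subset of this kind can be assembled, the sketch below draws the digits "1", "2", and "3" from the public mnist_784 dataset on OpenML. The exact 649-point sample used in the experiments is not specified here, so the random subset below is an assumption made only for illustration.

    # Illustrative construction of a three-digit MNIST subset (digits 1, 2, 3);
    # the specific 649-point sample used in the paper is not reproduced here.
    import numpy as np
    from sklearn.datasets import fetch_openml

    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    mask = np.isin(y, ["1", "2", "3"])          # labels are strings in mnist_784
    rng = np.random.default_rng(0)
    idx = rng.choice(np.flatnonzero(mask), size=649, replace=False)
    X_sub, y_sub = X[idx], y[idx]
    print(X_sub.shape)                          # (649, 784): 784-dimensional points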

Point 3:

Several recent works are missing from the literature:

Caron, Mathilde, et al. "Deep clustering for unsupervised learning of visual features." Proceedings of the European Conference on Computer Vision (ECCV), 2018.
Passalis, Nikolaos, and Anastasios Tefas. "Discriminative clustering using regularized subspace learning." Pattern Recognition 96 (2019): 106982.
Yang, Bo, et al. "Towards k-means-friendly spaces: Simultaneous deep learning and clustering." Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017.

Response 3:

I carefully read the three articles you recommended. The first, "Deep clustering for unsupervised learning of visual features", describes a method that combines clustering with convolutional networks: clustering is used to update the convolutional weights, enabling large-scale end-to-end training of convolutional networks. The second, "Discriminative clustering using regularized subspace learning", describes how regularization can optimize the mapping of data to a low-dimensional space so that the transformed data better matches the distribution of the original data, improving both cluster analysis and discriminant analysis. The third, "Towards k-means-friendly spaces: Simultaneous deep learning and clustering", describes a method that combines dimensionality reduction and clustering; the authors designed a DNN structure and an associated joint optimization criterion, using the DNN to achieve dimensionality reduction and improve the non-linear transformation capability. My research, in contrast, focuses on determining the optimal number of clusters for cluster analysis, in order to address the problem that clustering algorithms require the number of clusters to be entered manually.

Thanks again for your suggestions and guidance.

Round 2

Reviewer 1 Report

Additional experiments were conducted. The manuscript can be accepted in the present form.

Reviewer 2 Report

The authors addressed my concerns.
