A Fast Method for Estimating the Number of Clusters Based on Score and the Minimum Distance of the Center Point
Round 1
Reviewer 1 Report
The authors presented a new approach for estimating the number of clusters. The paper is well written and contains all the important parts. The authors also presented a comprehensive review of existing approaches to determining the number of clusters. However, although several datasets were used to assess the developed methodology, there is a significant limitation that makes the manuscript premature for publication at this moment: the reference sets used to compare the various clustering approaches contained a very small number of clusters. To verify whether the newly developed methodology is indeed efficient, the comparison should include at least several sets with more than 10 or even 20 clusters.
The developed methodology and the paper as a whole may be of interest to readers; however, the validity and effectiveness of the method should be demonstrated more thoroughly.
Author Response
Thank you very much for your suggestions. In response, I have made the following changes.
Point 1:
To verify whether the newly developed methodology is indeed efficient, the comparison should include at least several sets with more than 10 or even 20 clusters.
Response 1:
To better verify the effectiveness of the developed method, I also evaluated it on the following two datasets. Movement_libras: a real dataset that contains 366 data points, has 91 attributes, and is divided into 15 clusters. D22: an artificial dataset that contains 2211 points, has 2 attributes, and is divided into 22 clusters. The experimental results show the effectiveness of the method on both datasets. I have also added these results to the experiments reported in the paper.
Thanks again for your suggestions and guidance.
Reviewer 2 Report
The authors present a fast method for estimating the number of clusters, which is an important issue in clustering. The proposed method is quite straightforward, so the technical novelty is low. I have the following comments:
1. The authors should carefully proofread the paper; e.g., the "Related Knowledge" section should be renamed to either "Background" or "Related Work".
2. The authors provide results on several small datasets. Can they provide results on a larger and more challenging dataset, e.g., a subset of MNIST? Is the method expected to work on such high-dimensional datasets? If no experiments can be conducted, the authors should limit the scope of the method and better discuss the limitations of the proposed approach.
3. Several recent works are missing from the literature:
Caron, Mathilde, et al. "Deep clustering for unsupervised learning of visual features." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
Passalis, Nikolaos, and Anastasios Tefas. "Discriminative clustering using regularized subspace learning." Pattern Recognition 96 (2019): 106982.
Yang, Bo, et al. "Towards k-means-friendly spaces: Simultaneous deep learning and clustering." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 2017.
Author Response
Thank you very much for your suggestions. In response, I have made the following changes.
Point 1:
The authors should carefully proofread the paper; e.g., the "Related Knowledge" section should be renamed to either "Background" or "Related Work".
Response 1:
After careful consideration, I have renamed the "Relevant Knowledge" section to "Background".
Point 2:
The authors provide results on several small datasets. Can they provide results on a larger and more challenging dataset, e.g., a subset of MNIST? Is the method expected to work on such high-dimensional datasets? If no experiments can be conducted, the authors should limit the scope of the method and better discuss the limitations of the proposed approach.
Response 2:
To verify the effectiveness of the developed method on high-dimensional datasets, I evaluated it on the following two datasets. Mnist_123: a subset of MNIST that contains 649 data points, has 784 dimensions, and is divided into 3 clusters, corresponding to the handwritten digits "1", "2", and "3". Movement_libras: a real dataset that contains 366 data points, has 91 attributes, and is divided into 15 clusters.
Point 3:
Several recent works are missing from the literature:
Caron, Mathilde, et al. "Deep clustering for unsupervised learning of visual features." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
Passalis, Nikolaos, and Anastasios Tefas. "Discriminative clustering using regularized subspace learning." Pattern Recognition 96 (2019): 106982.
Yang, Bo, et al. "Towards k-means-friendly spaces: Simultaneous deep learning and clustering." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 2017.
Response 3:
I carefully read the three articles you recommended. The first article, "Deep clustering for unsupervised learning of visual features", mainly describes a method that combines clustering with convolutional neural networks. The method uses clustering to update the convolutional weights and addresses large-scale end-to-end training of convolutional networks. The second article, "Discriminative clustering using regularized subspace learning", mainly describes how to use regularization to optimize the mapping of data to a low-dimensional space, so that the newly obtained data are more consistent with the distribution of the original data and cluster analysis and discriminant analysis perform better. The third article, "Towards k-means-friendly spaces: Simultaneous deep learning and clustering", mainly describes a method that combines dimensionality reduction and clustering. The authors designed a DNN structure and an associated joint optimization criterion, and used the DNN to achieve dimensionality reduction and improve the capacity for non-linear transformation. My research, however, addresses a different problem: determining the optimal number of clusters for cluster analysis, so that the number of clusters no longer has to be entered manually for the clustering algorithm.
Thanks again for your suggestions and guidance.
Round 2
Reviewer 1 Report
Additional experiments were conducted. The manuscript can be accepted in the present form.
Reviewer 2 Report
The authors addressed my concerns.