Article
Peer-Review Record

Agglomerative Clustering with Threshold Optimization via Extreme Value Theory

Algorithms 2022, 15(5), 170; https://doi.org/10.3390/a15050170
by Chunchun Li 1, Manuel Günther 2, Akshay Raj Dhamija 1,†, Steve Cruz 1,†, Mohsen Jafarzadeh 1,†, Touqeer Ahmad 1,† and Terrance E. Boult 1,*
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 15 April 2022 / Revised: 12 May 2022 / Accepted: 17 May 2022 / Published: 20 May 2022
(This article belongs to the Special Issue Nature-Inspired Algorithms in Machine Learning)

Round 1

Reviewer 1 Report

The manuscript introduces a new concept: using Extreme Value Theory for threshold selection in clustering tasks.

Some remarks:

1- Line 10,023: "exceedingly well", "impressive results of FINCH" -> avoid using very strong statements;
2- In Introduction there are many statements without references;
3- Fig. 2 is not mentioned or discussed in the text. I would prefer that the authors discuss the figures in the text, avoiding extensive captions;
4- Figs. 1 and 2 should be in Section 3. The Introduction should only point out the main aspects of the investigation, relating the crucial points of the study to the literature;
5- Line 82: "actual nearest-neighbor linkages follow an Extreme Value Theory" -> is it valid for all datasets?
6- Section 2: as the authors mention automatic selection of parameters, I suggest they read the following literature: 10.1016/j.engappai.2019.04.007;
7- Line 148: the acronym ANN is widely used for artificial neural networks;
8- The acronym NMI was not defined before its first use;
9- The performance comparison considers only K-Means as the baseline. I suggest using more robust techniques, such as Fuzzy c-means or SOM (a minimal scoring sketch follows below).
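
For context on remark 9, here is a minimal sketch of an NMI-scored K-Means baseline. It assumes scikit-learn and synthetic blobs rather than the paper's datasets; a fuzzy c-means or SOM labeling could be scored with the same call.

```python
# Hypothetical illustration, not the paper's evaluation pipeline:
# score a clustering against ground-truth labels with NMI.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import normalized_mutual_info_score

# Synthetic data stands in for the paper's datasets.
X, y_true = make_blobs(n_samples=500, centers=5, random_state=0)

# K-Means baseline; a fuzzy c-means or SOM labeling would be scored identically.
y_pred = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
print("K-Means NMI:", normalized_mutual_info_score(y_true, y_pred))
```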

Author Response

Please see the PDF file.

Author Response File: Author Response.pdf

Reviewer 2 Report

Many abbreviations are not described in the text (e.g. FINCH, EVT).

The main result, the PEACH algorithm, also has an abbreviated name without a description.

The bibliography overview is not complete. The authors do not mention classical agglomerative approaches for clustering, e.g. GRASP or Greedy Agglomerative Heuristic algorithms.

English must be improved, e.g., "closet" -> "closest".

Some sentences must be supported by additional details. E.g., "We report the results of PEACH, which is fully automatic and does not use the test set to optimize parameters." All algorithms are "fully automatic". Probably it is self-configuring? It seems the authors developed a self-configuring version of FINCH.

In all tables, it is not clear which kind of values (which measure) is given; the reader must guess. The same applies to the values 166, 1000, and 1166 in Table 4.

However, in general, the authors presented an original and efficient approach. The paper can be published after correction.

Author Response

Please see the PDF file.

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper introduces a fully automatic clustering threshold selection for classic agglomerative clustering. It is beyond question that the article is original, and I support publishing it after some minor corrections:

Abbreviations should be explained right at first use (for example, FINCH and EVT in the abstract, etc.).

- Figure 1 should not be in the introduction section (we do not yet know at that point what IJB-B 512 is).
- The „footnotes” under each table are too long and should be separated from the title.
- Figure 2 is not in the right place in the article. Moreover, Automatic Hierarchical Clustering is denoted by AHC in the text, while in Figure 2 it is denoted by HAC.
- The concept of ground truth is not correctly explained here.
- The NMI abbreviation is used but not explained until 6.2 (evaluation metrics).
- Some other unexplained abbreviations in the introduction: IARPA, BIRCH, MNIST.
- Section 2 should be titled "Literature".
- In my opinion, Figure 3 should be inserted after line 195, before Conjecture 1.
- It is probably obvious what CDF denotes, but it should be explained anyway. Other abbreviations in Figure 3 (LFW, IJB-B 1845) should also be explained.
- Formula (4) is wrong: tau should be switched to tau(w). Later in the article, tau(w)/tau(r) is often confused with simple tau; the authors should correct this (in lines 268 and 316, and under Figure 5). See the sketch after this list for how such a threshold is typically applied.
- The datasets used are described in 7.1 (as part of the results). They should be described before Section 3 (EVT theory), as a separate part of the methodology section; this would solve a lot of interpretation problems.
- The role of PCA in the process is not quite clear; it should also be included in Figure 5.
- Table 1 should be in the results section.
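
For readers following the tau(w) remark, here is a generic illustration of how an EVT-derived threshold can drive agglomerative clustering. This is only a sketch, not the authors' PEACH procedure: the Weibull family, the fixed location at zero, the 0.95 quantile, and single linkage are all assumptions made for the example.

```python
# Illustrative only: a generic EVT-style merge threshold tau_w,
# not the PEACH algorithm described in the paper.
from scipy.stats import weibull_min
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=500, centers=5, random_state=0)

# Nearest-neighbor distances (k=2 because each point's closest neighbor is itself).
nn_dist = NearestNeighbors(n_neighbors=2).fit(X).kneighbors(X)[0][:, 1]

# Fit a Weibull (an EVT distribution) to the linkage distances and take a
# high quantile as the merge threshold tau_w; both choices are assumptions here.
shape, loc, scale = weibull_min.fit(nn_dist, floc=0.0)
tau_w = weibull_min.ppf(0.95, shape, loc=loc, scale=scale)

# Agglomerative clustering stops merging once the linkage distance exceeds tau_w.
labels = AgglomerativeClustering(
    n_clusters=None, distance_threshold=tau_w, linkage="single"
).fit_predict(X)
print("clusters found:", labels.max() + 1)
```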

What I really miss is a comparison of the computation times of the different algorithms. I would like to see how "PEACH" performs with respect to computation time (a simple timing harness is sketched below).
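
A minimal timing harness of the kind that would answer this request; the algorithms, dataset size, and parameters here are placeholders, not the paper's experimental setup.

```python
# Sketch of a wall-clock comparison; swap in the algorithms under study.
import time

from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=2000, centers=10, random_state=0)

for name, algo in [
    ("K-Means", KMeans(n_clusters=10, n_init=10, random_state=0)),
    ("Agglomerative", AgglomerativeClustering(n_clusters=10)),
]:
    start = time.perf_counter()
    algo.fit(X)
    print(f"{name}: {time.perf_counter() - start:.3f} s")
```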

To sum up, the article uses an „in medias res” style with a lot of concepts and abbreviations that are only explained/revealed later in the article. Some restructuring and correction is necessary.

Author Response

Please see the PDF file.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Accept.
