Article
Peer-Review Record

Unsupervised Feature Selection for Histogram-Valued Symbolic Data Using Hierarchical Conceptual Clustering

Stats 2021, 4(2), 359-384; https://doi.org/10.3390/stats4020024
by Manabu Ichino 1,*, Kadri Umbleja 2 and Hiroyuki Yaguchi 1
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 30 March 2021 / Revised: 22 April 2021 / Accepted: 12 May 2021 / Published: 18 May 2021
(This article belongs to the Special Issue Recent Developments in Clustering and Classification Methods)

Round 1

Reviewer 1 Report

The submitted paper contains an interesting proposal for the unsupervised classification of histogram-valued multivariate data.

The paper is well written. The procedure is based on a quantile representation of the histograms over discrete grids. Methodologies for classifying multivariate imprecise data are in demand and, therefore, possible answers to this problem should be considered.

Main concerns

The Introduction should describe in much more detail the available procedures for classifying imprecise data, such as interval-valued or histogram-valued data, including their strengths and weaknesses. Taking these weaknesses into account would naturally motivate new proposals such as the one in this work.

The authors do not explore the possibility of representing histograms by the whole quantile function instead of this discrete, grid-based approximation. With the full quantile function, the Wasserstein distance would appear as the natural dissimilarity between histograms, and the distance proposed in the current submission would be an approximation of it, at least at the univariate level. Wasserstein barycenters could then be defined to summarize the new clusters created by joining multivariate histograms. There are recent developments around the Wasserstein distance, which is a hot topic at the moment. I understand that the current submission is a complete proposal in itself, but connections with these closely related techniques should at least be mentioned, together with a systematic check of the published literature on methodologies for imprecise data based on Wasserstein distances.
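
As an illustration of the relationship described above, the following minimal Python sketch computes the 1-Wasserstein distance between two univariate histograms from their quantile functions, using W1 = ∫ |Q1(t) − Q2(t)| dt over t in [0, 1]. It is not the authors' dissimilarity or code. The function names `quantile` and `wasserstein1`, the `(edges, probs)` representation, and the assumption of uniform density inside each bin are all hypothetical choices made for this example.

```python
from bisect import bisect_left
from itertools import accumulate


def quantile(edges, probs, t):
    """Quantile function Q(t) of a histogram.

    edges: increasing bin edges (len(probs) + 1 values).
    probs: bin probabilities (assumed to sum to a positive total).
    Assumption: values are uniformly distributed inside each bin.
    """
    cdf = [0.0] + list(accumulate(probs))      # CDF at each bin edge
    target = min(max(t, 0.0), 1.0) * cdf[-1]   # clamp t and rescale by total mass
    # Index of the first right edge whose CDF value reaches the target.
    k = min(bisect_left(cdf, target, lo=1), len(probs))
    mass = cdf[k] - cdf[k - 1]
    if mass == 0.0:
        return float(edges[k - 1])
    frac = (target - cdf[k - 1]) / mass        # position inside bin k
    return edges[k - 1] + frac * (edges[k] - edges[k - 1])


def wasserstein1(h1, h2, m=1000):
    """Approximate W1 with a midpoint rule on an m-point probability grid.

    h1, h2: (edges, probs) pairs. A small m mimics a coarse quantile grid;
    a large m approaches the integral over the whole quantile function.
    """
    total = 0.0
    for i in range(m):
        t = (i + 0.5) / m
        total += abs(quantile(*h1, t) - quantile(*h2, t))
    return total / m


# Usage: two single-bin histograms, uniform on [0, 1] and on [2, 3].
a = ([0.0, 1.0], [1.0])
b = ([2.0, 3.0], [1.0])
print(wasserstein1(a, b))   # approximately 2.0 (a shift by 2)
```

With a small `m` the sum uses only a few quantile values per histogram, which is the sense in which a grid-based quantile representation can be read as an approximation of the distance between whole quantile functions.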

Because other approaches were not detailed in the Introduction, the submitted manuscript does not include any comparison with them in the artificial and real data examples. Once other available methodologies for classifying imprecise data have been identified in the Introduction, these comparisons should be included.

 

Minor things

Related with typos

First paragraph in the introduction: ‘over view’ → ‘overview’

First paragraph in the discussion: ‘seoection’ → ‘selection’

Author Response

Please see attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

The topic of the authors' research is highly topical, since the problem of forming a cluster structure on the basis of qualitative and quantitative criteria still has no unique solution. The authors have proposed one possible solution to this problem.

 

The manuscript is well structured and contains all the sections expected for this type of publication. The abstract briefly reflects the content of the manuscript. However, in my opinion, it is necessary to compare the proposed technique with other effective clustering techniques in this subject area.

  1. For example, the density-based clustering algorithms DBSCAN and OPTICS can identify clusters of complex shape based on the density of the objects within clusters. These algorithms form the cluster structure without requiring the number of clusters to be specified in advance. For this reason, I have a question: why was an agglomerative hierarchical clustering algorithm used? I think the authors should justify this choice.
  2. The authors have used compactness as the main criterion for forming the cluster structure. In my opinion, it is necessary to compare the effectiveness of this criterion with other well-known criteria used in this subject area.
  3. The authors confirm the effectiveness of the proposed technique using synthetic datasets with a limited number of objects. I think it is also necessary to confirm its effectiveness using more complex datasets.

Author Response

Please see attachment. 

Author Response File: Author Response.pdf

Reviewer 3 Report

In this article, the authors present a hierarchical method for analyzing histogram-valued symbolic data by unsupervised feature selection.
First of all, the article is very poorly structured. It is neither well organized nor well proofread, which makes it hard to understand. The presentation needs considerable improvement.
The standard of English is too low for a scientific article. There are too many syntax and spelling errors.
A proper methodology has not been followed.
Specifically, the abstract does not have a proper flow. It does not clearly explain the content of the article.
The introduction section is missing relevant references to the problem. The authors have not stated the main contribution of the work.
The relevant literature is incomplete and outdated.
The presented methodology is not supported by research articles, and its novelty is unclear.
The authors have omitted a discussion of, and comparison with, the state of the art.
The graphic illustrations are not properly designed and do not convey the required details to the reader. Also, the authors do not provide a proper discussion of the figures.
The results do not provide compelling evidence of research novelty.
The experimental analysis and configurations have not been presented in enough detail for reproducibility.
The discussion part is not presented technically and does not provide future directions and extensions of the study.
The conclusion part is missing.
In summary, the chosen approach does not yet demonstrate the value of the proposed model in experimental research or make clear how the solution would be applied in practice.

Author Response

Please see attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

My recommendation is to accept it in its present form.

Reviewer 2 Report

In my opinion, the corrected manuscript can be accepted in its present form.

Reviewer 3 Report

In the updated version sent by the authors, my suggestions were not implemented.

The authors added only one reference and reinforced section 4.5 "Analysis of the US Weather Data".

The authors should follow the suggested corrections.

First of all, the article is very poorly structured. It is neither well organized nor well proofread, which makes it hard to understand. The presentation needs considerable improvement.

The standard of English is too low for a scientific article.

There are several syntactic, grammatical and punctuation problems. Also, there are several lengthy sentences.

A proper methodology has not been followed.

Specifically, the abstract does not have a proper flow. It does not clearly explain the content of the article. The abstract section should be rewritten following this structure: 1) background, 2) motivation, 3) gap and challenges, 4) proposed approach, 5) evaluation and results, and 6) significance.

The introduction section is missing relevant references to the problem. The authors have not stated the main contribution of the work.

The relevant literature is incomplete and outdated.

The presented methodology is not supported by research articles, and its novelty is unclear. The proposed technique is evaluated using synthetic datasets, which is a major limitation.

The authors have omitted a discussion of, and comparison with, the state of the art.

The graphic illustrations are not properly designed and do not convey the required details to the reader. Also, the authors do not provide a proper discussion of the figures.

The results do not provide compelling evidence of research novelty.

The experimental analysis and configurations have not been presented in enough detail for reproducibility.

The discussion part is not presented technically and does not provide future directions and extensions of the study.

The conclusion part is missing. Present a (numerical) summary of the results that emerged. In what way are they superior?

I recommend that the authors include more up-to-date references, including ones from MDPI.

In summary, the chosen approach does not yet demonstrate the value of the proposed model in experimental research or make clear how the solution would be applied in practice.
