Article
Peer-Review Record

A New Permutation-Based Method for Ranking and Selecting Group Features in Multiclass Classification

Appl. Sci. 2024, 14(8), 3156; https://doi.org/10.3390/app14083156
by Iqbal Muhammad Zubair 1, Yung-Seop Lee 2 and Byunghoon Kim 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 9 March 2024 / Revised: 3 April 2024 / Accepted: 6 April 2024 / Published: 9 April 2024
(This article belongs to the Special Issue Machine-Learning-Based Feature Extraction and Selection)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Upon careful examination of your manuscript, I am convinced of its considerable potential and value. However, to elevate it to a superior level, improvements in several areas are necessary:

1.       The conclusion of the abstract does not clearly demonstrate the advantages of the proposed method over existing approaches. I recommend more precise distillation and emphasis of the manuscript's core contributions in this section, to make it more engaging and attractive to readers.

2.       The description of combining group features in Section 4.2 on page 8 should be included in Section 3 on page 4 to improve the manuscript's readability.

3.       I recommend that the authors review recent literature and describe it in the Introduction and Related Work sections. Careful adherence to citation standards is essential, especially given the repeated citation of Reference [1] throughout the paper.

4.       Figure 2 seems to insufficiently illustrate the permutation schemes for individual and group features, potentially hindering readers' comprehension of the method. A revision of Figure 2 is suggested.

5.       The layout of Figure 3 is not consistent with the other figures, necessitating an adjustment.

6.       The manuscript fails to discuss the elements G1, G2, G3, G4, and G5 shown in Figure 3. An explanation of these elements is necessary.

Comments on the Quality of English Language

no

Author Response

Original Manuscript ID: applsci-2931928              

Original Article Title: “A new permutation-based method for ranking and selecting group features in multiclass classification”

To: Editor

Re: Response to reviewers

Dear editor,

Thank you for allowing a resubmission of our manuscript, with an opportunity to address the reviewers’ comments.

We are uploading (a) our point-by-point response to the comments (response to reviewers), and (b) an updated manuscript with yellow highlighting indicating changes.

Best regards,

Byunghoon Kim

Reviewer#1: Upon careful examination of your manuscript, I am convinced of its considerable potential and value. However, to elevate it to a superior level, improvements in several areas are necessary:

Author response:  Thank you for your valuable comment. Following the reviewer’s suggestions, we have made improvements in several areas of this study.

Reviewer#1, Concern#1:  The conclusion of the abstract does not clearly demonstrate the advantages of the proposed method over existing approaches. I recommend more precise distillation and emphasis of the manuscript's core contributions in this section, to make it more engaging and attractive to readers.

Author response: Thank you for the valuable comment. Following the reviewer’s suggestion, we have revised the conclusion of the abstract to emphasize the advantages of the proposed method over existing methods and to highlight the core contribution of this study.

“The results highlight the capability of the proposed method, which not only selects a few significant group features but also provides the relative importance and ranking of all group features. Furthermore, the proposed method outperforms the existing method in terms of accuracy and F1 score.”

We also updated the introduction of the abstract by adding shortcomings of the existing method.

“Existing group feature selection methods only select a few of the most important group features, without providing insight into the relative importance of all group features.”

Reviewer#1, Concern # 2: The description of combining group features in Section 4.2 on page 8 should be included in Section 3 on page 4 to improve the manuscript's readability.

Author response:  Thank you for the comment. As per the reviewer’s recommendation, we have incorporated the description of combining group features into the first paragraph of section 3.

“In this section, we present a novel permutation-based method for group feature ranking and selection. The method involves several key steps. First, the datasets that were used for this study have high dimensionality. In each dataset, many features were correlated or relevant to each other and had a common effect on the target variable. Therefore, the relevant features form groups or clusters. We used the “agglomerative hierarchical clustering” technique to find these groups in high-dimensional datasets. This method helped us group similar features so that we could observe the patterns more clearly. Despite splitting datasets into groups, the dimensionality within each group remained high. Hence, we applied the lasso algorithm to eliminate irrelevant and redundant individual features, ensuring that only the most informative features were retained for further analysis.”
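The grouping step described in the quoted paragraph can be illustrated with a minimal sketch. It is not the authors' implementation: the distance measure (1 minus absolute Pearson correlation), the average linkage, the toy data, and the names `pearson` and `cluster_features` are all assumptions made for illustration, and the subsequent lasso step inside each group is omitted.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cluster_features(columns, n_groups):
    """Average-linkage agglomerative clustering of feature columns.

    Assumed distance: 1 - |correlation|, so strongly correlated features
    (of either sign) end up in the same group.
    Returns a list of groups, each a sorted list of column indices.
    """
    p = len(columns)
    dist = [[1.0 - abs(pearson(columns[i], columns[j])) for j in range(p)]
            for i in range(p)]
    groups = [[i] for i in range(p)]  # start with one cluster per feature

    def linkage(g, h):
        # mean pairwise distance between the members of two clusters
        return sum(dist[i][j] for i in g for j in h) / (len(g) * len(h))

    while len(groups) > n_groups:
        # find the closest pair of clusters and merge them
        a, b = min(
            ((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
            key=lambda ij: linkage(groups[ij[0]], groups[ij[1]]),
        )
        groups[a] = sorted(groups[a] + groups[b])
        del groups[b]
    return groups


# Toy data (hypothetical): features 0-2 follow one latent signal, features 3-5 another.
rng = random.Random(0)
z1 = [rng.gauss(0, 1) for _ in range(40)]
z2 = [rng.gauss(0, 1) for _ in range(40)]
columns = [[z + 0.1 * rng.gauss(0, 1) for z in z1] for _ in range(3)]
columns += [[z + 0.1 * rng.gauss(0, 1) for z in z2] for _ in range(3)]

groups = cluster_features(columns, n_groups=2)
print(groups)
```

On real high-dimensional data a library routine would normally replace this quadratic loop; the sketch only shows how correlated features come to form group features.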

Reviewer#1, Concern # 3: I recommend that the authors review recent literature and describe it in the Introduction and Related Work sections. Careful adherence to citation standards is essential, especially given the repeated citation of Reference [1] throughout the paper.

Author response:  Thank you for the insightful feedback. In response to the reviewer’s suggestion, we have added recent literature to the manuscript.

“Feature selection is an important task for high-dimensional low-sample-size (HDLSS) datasets that are prevalent across various domains, such as text recognition, finance, and gene expression microarrays. HDLSS datasets are characterized by many features relative to the limited number of available samples. For instance, in microarray datasets, the number of features (representing genes) is often greater than thousands, whereas the sample size remains substantially small [1].

Reducing dimensionality not only decreases model complexity but also improves model prediction accuracy [2].

Group-feature selection methods aim to discard irrelevant and redundant group features which caused the decrease in classification accuracy while retaining only informative group features, thereby enhancing the computational efficiency and classification performance [5].

Feature ranking and selection are crucial areas in machine learning and data mining [15].

However, IG is biased toward features that have a large number of different values [28]”.

To prevent the repeated citation of the old reference [1], we have replaced it with new references. Specifically, the new references [1] and [5] now take the place of the old reference [1]. Additionally, we have removed the old reference [1] from the second paragraph of section 2.2.

“Therefore, only a few methods have demonstrated the ability to select group features based on their rankings.”

  1. Cavalheiro, L.P.; Bernard, S.; Barddal, J.P.; Heutte, L. Random forest kernel for high-dimension low sample size classification. Statistics and Computing 2024, 34, 9.
  2. Jiménez, F.; Sánchez, G.; Palma, J.; Miralles-Pechuán, L.; Botía, J.A. Multivariate feature ranking with high-dimensional data for classification tasks. IEEE Access 2022, 10, 60421-60437.
  3. Wang, Y.; Li, X.; Ruiz, R. Weighted general group lasso for gene selection in cancer classification. IEEE transactions on cybernetics 2018, 49, 2860-2873.
  4. Theng, D.; Bhoyar, K.K. Feature selection techniques for machine learning: a survey of more than two decades of research. Knowledge and Information Systems 2024, 66, 1575-1637.
  5. Göcs, L.; Johanyák, Z.C. Feature Selection with Weighted Ensemble Ranking for Improved Classification Performance on the CSE-CIC-IDS2018 Dataset. Computers 2023, 12, 147.

Reviewer#1, Concern # 4: Figure 2 seems to insufficiently illustrate the permutation schemes for individual and group features, potentially hindering readers' comprehension of the method. A revision of Figure 2 is suggested.

Author response:  Thank you for the comment. Following the reviewer’s suggestion, we have updated the permutation scheme figure to place greater emphasis on individual and group features. This modification aims to give readers a clearer understanding of individual and group permutation.

Reviewer#1, Concern # 5: The layout of Figure 3 is not consistent with the other figures, necessitating an adjustment.

Author response: Thank you for the valuable comment. As per the reviewer’s suggestion, we have changed the layout of Figure 3 and made it consistent with the other figures.


Figure 4. Relative importance among group features of the GLA-BRA-180 (a), CLL-SUB-111 (b), and TOX-171 datasets (c).

Reviewer#1, Concern # 6: The manuscript fails to discuss the elements G1, G2, G3, G4, and G5 shown in Figure 3. An explanation of these elements is necessary.

Author response:  Thank you for your valuable comment. We changed the names of the elements G1, G2, G3, G4, and G5 to L1, L2, L3, L4, and L5. Additionally, we explained these elements in section 4.2.

“Thus, we determined the important group features. Figure 4 shows the relative importance of all the group features, where L1, L2, . . ., L5 represent the group features. With the help of relative importance, these group features can be ranked based on their importance. In Figure 4(a), L1 has the largest importance, followed by L4, L2, L3, and L5. In Figure 4(b), L1 emerged as the most important group feature, followed by L3, L4, L2, and L5. Similarly, in Figure 4(c), L3 demonstrated the largest importance, followed by L5, L1, L4, and L2.”
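How a relative-importance ranking of this kind can arise from group permutation is sketched below. This is an illustration under stated assumptions, not the authors' method: the nearest-centroid classifier is only a stand-in model, the toy data and the names `fit_centroids`, `error_rate`, and `group_permutation_importance` are hypothetical, and the error is measured on the training rows for brevity where held-out data would normally be used. The idea shown is the one the record describes: permute all features of a group together, measure the increase in prediction error, and normalise the increases so the groups can be ranked.

```python
import random


def fit_centroids(X, y):
    """Stand-in classifier (assumption): one mean vector (centroid) per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def error_rate(centroids, X, y):
    """Fraction of rows whose nearest centroid is not the true class."""
    wrong = 0
    for row, label in zip(X, y):
        pred = min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
        wrong += pred != label
    return wrong / len(y)


def group_permutation_importance(centroids, X, y, groups, n_repeats=20, seed=0):
    """Mean error increase per group after joint permutation, scaled to sum to 1."""
    rng = random.Random(seed)
    base = error_rate(centroids, X, y)
    raw = []
    for cols in groups:
        increase = 0.0
        for _ in range(n_repeats):
            order = list(range(len(X)))
            rng.shuffle(order)
            # Move the group's columns between rows as one block: this keeps the
            # within-group structure but breaks the link to the labels.
            Xp = [list(row) for row in X]
            for i, src in enumerate(order):
                for j in cols:
                    Xp[i][j] = X[src][j]
            increase += error_rate(centroids, Xp, y) - base
        raw.append(max(increase / n_repeats, 0.0))  # clip negative increases to zero
    total = sum(raw)
    return [r / total if total > 0 else 0.0 for r in raw]


# Toy three-class data (hypothetical): group 0 is strongly informative,
# group 1 weakly informative, group 2 pure noise.
rng = random.Random(1)
X, y = [], []
for label in range(3):
    for _ in range(30):
        X.append([rng.gauss(4 * label, 1), rng.gauss(4 * label, 1),
                  rng.gauss(label, 1), rng.gauss(label, 1),
                  rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)

groups = [[0, 1], [2, 3], [4, 5]]
model = fit_centroids(X, y)
importance = group_permutation_importance(model, X, y, groups)
ranking = sorted(range(len(groups)), key=lambda g: importance[g], reverse=True)
print(importance, ranking)
```

Permuting the columns of a group as a block, instead of shuffling each column on its own, is what distinguishes group permutation from individual feature permutation.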

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors


This study introduces a deployed model of the group-lasso technique to address the challenge of determining the relative importance of multiclass datasets. The authors employed the prediction score technique and assessed the impact of participant features in eliminating irrelevant and redundant features. The findings of this study have the potential to enhance the performance of machine learning algorithms in high-dimensional data classification. However, certain issues need to be taken into account, as outlined below:

In abstract:

(1) The authors should highlight the critical shortcomings of existing feature group selection models to underscore the novelty of their study. They should go beyond merely acknowledging the inherent problem of feature selection and provide a clear comparison with current models to demonstrate their unique contribution.

 

(2) The study should clearly delineate which parameters have been enhanced in comparison to previous methods, such as improvements in accuracy, speed, precision, or any other relevant metrics.

In introduction:

(3) I recommend incorporating recent studies to substantiate the novelty of the proposed idea. The authors predominantly relied on older references to elucidate the concept, which may not adequately showcase the current relevance and originality of the idea.

 

(4) Please review the concept of "To this end, we propose a new group feature importance metric that measures the increase in the prediction error of the classification model after permuting the values of a group of features together, thereby disrupting the relationship between the group feature and the target." as it appears to potentially reflect an incorrect understanding or formulation.

In related work:

(5) Kindly consider including a brief introduction to the concepts discussed in the section before delving into the title of the subsection.

(6) I suggest replacing outdated references with more recent studies in the field to enhance the relevance and currency of the research.

(7) Please highlight the significance of the proposed idea in advancing the framework established by Zubair and Kim [1].

In method:

(8) Some problems require further clarification, as outlined below:

- When selecting a feature group, it may necessitate altering the targets to eliminate certain individual features. Please provide more clarity on this issue.

- Kindly elaborate on the mechanism behind the prediction's score and the training accuracy. The score could potentially rise by restricting contributors in the prediction process. However, this restriction may result in a loss of accuracy in decision-making at times.

(9) Please define W(t) (as depicted in Equation (3)) similarly to other parameters.

(10) For Algorithm 1, I suggest refining the algorithm's framework to adhere to a standard format. Consider incorporating clear start and end lines for procedures, as well as handling other similar cases for presenting the algorithm effectively.

In results:

 

(11) The results presented in Table 4 indicated that the outcomes achieved by the Lasso method closely resembled those obtained through the proposed approach in this study. However, further clarification is needed to justify the significance of minimal errors in generating variations in the results. The current explanations provided do not sufficiently address this issue.

Comments for author File: Comments.pdf

Author Response

Please find the attached file.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The authors of the article propose a new method for group feature selection based on permutations, which allows for systematically computing relative importance scores for all group features and ranking them accordingly. This method is used to select the most important group features. The authors demonstrate the effectiveness of their method on high-dimensional real-world data with a limited number of samples. They show that aggressive dimensionality reduction at the early stages can improve results. The proposed method also facilitates the interpretation of group features and demonstrates competitiveness in classification tasks compared to existing methods. Additionally, the authors note that their method can be applied to data with larger sample sizes and various data types, and they suggest additional directions for future research.

1.      The title, abstract, and introduction were found to be appropriate. The presentation of the proposed innovations and the overall organization of the article, especially in the introduction, are positive. However, the introductory part of the paper could be improved by introducing quantitative indicators of the benefits obtained.

2.      The authors note that from their analysis of the literature, they have noticed that methods for ranking and selecting group features, especially for high-dimensional datasets, are presented in a limited number. However, the authors should more thoroughly analyze the current state of research on the problem under consideration. In particular, they should analyze additional studies: https://digitum.um.es/digitum/bitstream/10201/121146/3/Multivaria..s.pdf; https://doi.org/10.3390/computers12080147; https://doi.org/10.1016/j.csda.2019.106839; https://doi.org/10.1109/ACCESS.2019.2947701 and others.

In the analysis of the results obtained, it would also be good to provide a comparison of the quantitative indicators of the effectiveness of the proposed method with the existing ones. What has been improved and by how much?

3.      I recommend the authors put a chart in the paper to illustrate the presented method. This will help the reader to understand the paper easily.

4.      In this paper, three datasets were used to numerically analyze the results obtained using the proposed method. It would be nice to add links to the corresponding datasets.

5.      A detailed analysis of the results in Table 4 should be added.

 

Comments for author File: Comments.pdf

Author Response

Please find the attached file.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I have carefully reviewed your paper and I believe it has great potential.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors tried diligently to address the concerns raised.

Reviewer 3 Report

Comments and Suggestions for Authors

Accept in present form
