Peer-Review Record

Feature Selection Methods for Extreme Learning Machines

by Yanlin Fu 1,*, Qing Wu 1,2,*, Ke Liu 3 and Haotian Gao 4
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 1 August 2022 / Revised: 14 August 2022 / Accepted: 26 August 2022 / Published: 30 August 2022
(This article belongs to the Special Issue Soft Computing with Applications to Decision Making and Data Mining)

Round 1

Reviewer 1 Report (Previous Reviewer 1)

The problem statement is not clear. Is it dimensionality reduction or classification? It can't be both! Otherwise, any reported performance cannot be assessed in isolation, and one cannot attribute the runtime performance (classification vs dimension reduction) to any of the proposed methods. 

In other words, one could use the proposed classification with any other dimensionality reduction method proposed in the literature. For example, what if the authors used their proposed "dimensionality reduction scheme" together with FTSVM for classification? Therefore, when comparing the classification scheme with the state-of-the-art and showing superior accuracy, I cannot attribute the higher accuracy to the dimensionality reduction method and/or the proposed classification scheme.

Moreover, Section 4.1 is absolutely wrong and unscientific. Why does zeroing the "Z-axis" matter? Why would someone conclude that Figure 4 shows the "correctness" of the proposed dimensionality reduction? What was your fitness function in assessing the dimensionality reduction technique in isolation?

In my opinion, the authors should consider breaking this work into two separate papers and avoid convoluting the two proposed models. This way, they can clearly communicate their contribution to each task separately and provide very specific experiments that support their claim!  

Author Response

      Thank you for your comments and suggestions. We have carefully revised the manuscript accordingly, marking the modified or added parts in red. Details are as follows:

       In 2004, Mangasarian presented a new algorithm called the reduced feature support vector machine (RFSVM), which combined a diagonal matrix E with nonlinear SVM classifiers [20]. Similarly, Bai [21] proposed a wrapper feature selection method (FTSVM) that added a diagonal matrix E to a TWSVM, effectively identifying the relevant features and improving the performance of the TWSVM. Both use wrapper methods that accomplish the feature selection and classification processes simultaneously.

       Extreme learning machines (ELMs), first proposed by Huang [14], are learning algorithms for single-hidden-layer feedforward networks (SLFNs) that have attracted significant attention because of their high efficiency and robustness [15]. ELMs randomly generate the input weights and biases and obtain the output weights by computing the Moore–Penrose generalized inverse of the hidden-layer output matrix H. Moreover, according to the results in [16], ELMs have greater generalization ability and higher learning speed than SVMs and TWSVMs.
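As a minimal sketch of the ELM training procedure just described (illustrative code with assumed names such as `train_elm`, not the implementation from the paper): the input weights and biases are drawn at random and never trained, and the output weights are obtained in one step via the Moore–Penrose pseudoinverse of the hidden-layer output matrix H.

```python
import numpy as np

def train_elm(X, T, n_hidden=50, seed=0):
    """Minimal single-hidden-layer ELM sketch.
    X: (n_samples, n_features) inputs; T: (n_samples, n_outputs) targets."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix H
    beta = np.linalg.pinv(H) @ T                     # Moore-Penrose pseudoinverse of H
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because only the output weights beta are solved for, training reduces to a single linear-algebra step, which is the source of the learning-speed advantage over iteratively trained SVMs.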

         This paper proposes two feature selection methods, FELM and FKELM, both of which complete the feature selection and classification processes simultaneously. The proposed algorithms thus perform both dimensionality reduction and classification: by removing redundant features, they reduce the dimensionality of the input space and enhance classification performance. Like RFSVM and FTSVM, the proposed methods constitute a single, complete process.

Section 4.1 showed the performance of dimensionality reduction. A comparison of the z-values before and after reducing the dimensionality with FELM was shown in Fig. 4, where the z-values clearly change significantly after dimensionality reduction. However, the dataset we selected is somewhat contingent and particular, which caused the z-values to become zero. Following your valuable suggestions, we now report only the classification performance and have deleted Section 4.1; the abstract and conclusion have been revised accordingly.

Author Response File: Author Response.pdf

Reviewer 2 Report (Previous Reviewer 3)

The article has been improved according to the reviewer's suggestions. Hence, I can recommend accepting the article. 

Author Response

Thank you for recommending the paper for acceptance.

Author Response File: Author Response.pdf

Reviewer 3 Report (New Reviewer)

The research was conducted on the interesting topic of data preprocessing. The authors try to research feature selection algorithms based on an ELM and a KELM (kernel extreme learning machine).

 

It has a logical structure, all necessary sections exist as well.

 

But from my side, I see a few points, which could be updated:

 

1. The introduction should be extended to state the scientific novelty of this paper more clearly.

2. The conclusion section should be extended with a more detailed explanation of the limitations of the conducted study, given the wide range of datasets used.

3. A lot of references are outdated and unlinked. Please fix this by citing papers from the last 3-5 years in high-impact journals.

Author Response

Thank you for your valuable suggestions. We have cautiously revised the manuscript, marking the revised or added content in red.

  1. The introduction has been extended as follows:

This paper makes the following three contributions:

  • A wrapper feature selection method, called FELM, is proposed for the ELM. In FELM, the corresponding objective function and hyperplane are introduced by adding a feature selection matrix, a diagonal matrix whose elements are either 1 or 0, to the objective function of the ELM. FELM can effectively reduce the dimensionality of the input space and remove redundant features.
  • FELM is extended to the nonlinear case (called FKELM) by incorporating a kernel function that combines generalization and memorization radial basis function (RBF) kernels into FELM. FKELM can obtain high classification accuracy and strong generalization by fully exploiting the memorization of the training data.
  • A feature ranking strategy is proposed, which can evaluate features based on their contributions to the objective functions of the FELM and FKELM. After obtaining the best matrix E, the resulting methods significantly improve the classification accuracy, generalization performance, and learning speed.
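As a toy illustration of the feature selection matrix E described above (hypothetical code; the name `apply_feature_mask` is our own, and the paper's actual procedure for optimizing E is not reproduced here): multiplying the data by a 0/1 diagonal matrix zeroes out the unselected feature columns.

```python
import numpy as np

def apply_feature_mask(X, selected):
    """Zero out unselected features via a diagonal 0/1 matrix E (computes X @ E)."""
    E = np.diag(np.asarray(selected, dtype=float))  # diagonal entries are 0 or 1
    return X @ E

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
selected = [1, 0, 1]  # keep features 0 and 2, drop feature 1
X_masked = apply_feature_mask(X, selected)
```

Searching over such masks, guided by a feature ranking, is what lets a wrapper method couple feature selection to the classifier's own objective function.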

 

  2. The conclusion section has been extended with a more detailed explanation of the limitations of the conducted study:

In general, the ELM has better generalization capacity and higher efficiency than the traditional SVM. To improve efficiency and classification accuracy, this study proposed two new algorithms, FELM and FKELM, which use a feature ranking strategy and a memorization-generalization kernel, respectively. Both algorithms can be applied to datasets with small or medium sample sizes. FELM and FKELM complete the feature selection and classification processes simultaneously, improving their training efficiency.

Experiments on artificial and benchmark datasets demonstrated that the proposed approaches have higher classification accuracy and higher learning speed than RFSVM and FTSVM. According to the Friedman statistical method and the Nemenyi classification performance test, FKELM is ranked first, followed by FELM, RFSVM, and FTSVM. Based on these ranks, FKELM exhibits significantly better performance than FTSVM, while FELM has slightly better performance than RFSVM and significantly better performance than FTSVM. In most cases, FKELM is the most accurate in the classification performance, whereas FTSVM is the worst.

By comparing the classification accuracy on ultra-high-dimensional datasets, for example, the Colon and Leukemia datasets, it was concluded that FELM and FKELM have slightly lower performance than RFSVM. As seen in Table 5, FELM and FKELM can complete the classification process at a significantly higher learning speed. Therefore, our future research will focus on improving the accuracy and robustness of the proposed algorithms on such ultra-high-dimensional datasets and on applying the feature ranking strategy to other improved ELM models. Furthermore, it should be noted that this study only verified classification accuracy; in future studies, we will also attempt to verify the regression capacity of the proposed algorithms.
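For readers unfamiliar with the Friedman/Nemenyi procedure referenced in the conclusion, the rank comparison can be sketched as follows (illustrative code with made-up accuracy values; `q_alpha = 2.569` is the standard Nemenyi critical value for comparing four methods at alpha = 0.05):

```python
import numpy as np

def average_ranks(acc):
    """acc: (n_datasets, n_methods) accuracy matrix; rank 1 = best per dataset.
    Ties are broken by column order here; a full Friedman test averages tied ranks."""
    ranks = (-acc).argsort(axis=1).argsort(axis=1) + 1
    return ranks.mean(axis=0)

def nemenyi_cd(n_methods, n_datasets, q_alpha=2.569):
    """Critical difference: two methods whose average ranks differ by more
    than CD are considered significantly different."""
    return q_alpha * np.sqrt(n_methods * (n_methods + 1) / (6.0 * n_datasets))
```

For example, with four methods evaluated on ten datasets, `nemenyi_cd(4, 10)` is about 1.48, so average ranks must differ by more than that to count as a significant difference.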

  3. The references have been updated with papers from the last five years.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report (Previous Reviewer 1)

After reviewing the authors' comments and submitted changes, I would "Accept in present form".

 

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

Summary: 

The authors make the case that, given the superior generalization capability of ELM, this learning paradigm may also be effective for feature selection. The basis of their claim (as pointed out in line 74) is that ELM can outperform SVM, and since SVM can be used for feature selection, it makes sense to investigate ELM for the feature selection task as well.

 

Comments:

    (1) The writing requires a lot of improvement. There are many language mistakes and typos.

        (a) Between lines 37 and 49, for example, the authors throw in a lot of terms without defining them. For instance, what do you mean by "wrapper"? You need to define any technical term used and provide a citation for further study by the reader.

        (b) In the same paragraph the sentences are not cohesive.

    (2) Figure 1 is cited without explaining what each box is about and why it is important/related to the content of this paper.

    (3) Why compare ELM with SVM in terms of feature selection/extraction and not with other, more sophisticated deep learning approaches such as deep generative models (e.g., deep autoencoders)? This also needs to be addressed in the introduction, where the authors give the motivation for their work.

    (4) The authors refer to the "wrapper-type feature selection" method over and over again. They must clearly define what they mean by "wrapper" and provide the necessary citations.

    (5) I find the "Introduction" section odd. Usually, the introduction provides the motivation for the work and does not review the literature; reviewing the literature is reserved for a "Related Works" section. I would strongly recommend rewriting these two sections to provide a clearer motivation for the work without many technical terms, and to survey the existing work along with the "ELM" learning algorithm in Section 2.

    (6) The notation in line 103 is very confusing. Use both superscripts and subscripts to distinguish between instance ids and feature ids.

        (a) What is "T"?

        (b) How is "t" different from "x"? How are they related?

        (c) Section 2.1 is not very useful. It is very confusing and needs to be rewritten!

    (7) Section 2.2 needs to be rewritten. Many language problems make this section incomprehensible. The authors should make it extremely clear how this section relates to their work!

    (8) Section 3 has many language problems throughout that make it incomprehensible. Moreover, the authors should explain their approach verbally along with the mathematical notation, instead of just dumping the formulas and leaving the reader to figure them out. I really could not understand how they perform feature selection!

    (9) Fundamental problem in Section 4.1: why use an "artificial dataset"? No argument is provided. How is the efficacy of the proposed dimension reduction measured? Proper efficacy can be measured, for example, by showing how it improves classification. There is no justification for the dataset used or the verification process.

    (10) In Figures 8 and 9, I can see the proposed feature-reduced ELM classifiers compared against SVMs. The authors need to show how their approach compares against a plain ELM. What if the gain is marginal? Then, as a user, I would just use the ELM.

    (11) Where is the conclusion of the paper? What are the possible unanswered questions that one should consider?

Comments for author File: Comments.pdf

Reviewer 2 Report

The manuscript presents a method for feature selection using extreme learning machines (ELM).

The authors mention that ELMs are very popular and that feature selection methods specifically tailored to ELMs should be investigated. However, I feel that this problem has been investigated in the past (based on a simple Google Scholar search using the keywords "feature selection extreme learning machine"). The authors mention in the Related Work section several methods for feature selection using SVMs but no method for feature selection using ELMs (although many works in Scholar deal with it).

Therefore, it is not clear to me whether the 1st out of the 3 mentioned contributions of the paper is indeed a contribution.

The 2nd mentioned contribution is (based on the authors' claims) similar to what has been proposed in the past for SVMs. I don't understand why applying the same technique to ELMs is different.

To summarize, I don't see what the contribution of this study is. The authors should elaborate more and compare their methodology against other feature selection techniques tailored to ELMs. Explain the novelty of this study by showing why this method differs from other feature selection methods for ELMs.

The second major drawback of this study is the experimental protocol. It is not clear how the authors split the data into training and testing sets. It is also unclear whether they use a validation set for setting the proposed method's hyperparameters. For example, in Figure 4, what is the y-axis? Is it classification accuracy on the training, testing, or validation (if any) set? The same applies to Figure 5. Without knowing how the dataset was treated, and thus how the accuracy numbers were obtained, it is impossible to judge the correctness of the experimental framework and the validity of the results.

In Section 3 I cannot understand the argument for evaluating the method's performance on dimensionality reduction (vectors along all axes are randomly generated). Also, I feel that the reference to Fig. 2 is a typo. Figure 3, moreover, seems to be broken (there is no Fig. 3(c)).

Reviewer 3 Report

The authors need to take the following suggestions into consideration:

1.     The introduction needs to be improved by discussing current articles because only a few new articles have been included in it.

2.     The contribution is not clear.

3. A new section named "Results Discussion" needs to be added in order to condense all obtained results. I recommend presenting a table comparing the qualitative and quantitative features of your proposal with those of the other proposals reviewed, in order to show the main advantages and disadvantages of your proposal.

 

4. The conclusion needs to be improved by adding quantitative results, not only qualitative results. In addition, it is important to mention what is next for the investigation.
