Article
Peer-Review Record

Beyond Cross-Validation—Accuracy Estimation for Incremental and Active Learning Models

Mach. Learn. Knowl. Extr. 2020, 2(3), 327-346; https://doi.org/10.3390/make2030018
by Christian Limberg 1,2,*, Heiko Wersing 2 and Helge Ritter 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 14 July 2020 / Revised: 8 August 2020 / Accepted: 24 August 2020 / Published: 1 September 2020

Round 1

Reviewer 1 Report

The manuscript presents interesting results with novel material to be published. A minor revision will be helpful:

(1) Too many "can" are utilized, which is not necessary in describing the truth.

(2) Too many "we" are utilized. In my opinion, passive sentences are better.

(3) Additional remarks are expected in order to discuss the special contribution or possible applications.


Author Response

The manuscript presents interesting results with novel material to be published. A minor revision will be helpful:

==> Thank you for your appreciation of the contents of the paper and your useful suggestions for improving the manuscript further:

(1) Too many "can" are utilized, which is not necessary in describing the truth.
(2) Too many "we" are utilized. In my opinion, passive sentences are better.

==> We removed a large number of the "can" occurrences and rewrote the majority of the "we" sentences in the passive voice throughout the article.

(3) Additional remarks are expected in order to discuss the special contribution or possible applications.

==> We added another paragraph at the end of the conclusion (Line 437) that discusses possible applications and related directions for future work.

Reviewer 2 Report

A novel approach (CGEM) to predict the accuracy of a classifier is introduced.

  • The importance of estimating classification accuracy in incremental and lifelong learning tasks is emphasized. The developed approach is demonstrated to outperform existing techniques in predicting the accuracies of incremental learning on analytical and real-world data sets.
  • The work presented in the paper is extensive and addresses all the nitty-gritty of the proposed method.
  • The authors have evaluated their methodology on 4 different classifiers, and the results show that it works for all of them. This ensures that the method is not tied to a specific classifier.
  • The method proposed by the authors involves regression. The authors report that they tried various regression methods and then selected one (which is not a DNN). This is good, as some regression tasks can be handled even without DNNs.
  • The authors have tried their method on artificial datasets as well as a few real-world datasets, and it works well for both.

Overall, the proposed method is evaluated thoroughly and works well in all the considered scenarios.


However, the reviewer has the following concerns against publication of the manuscript in the present form. 


Major concerns:

The paper is quite weak in the literature review, especially on incremental learning, which is the paper's area of focus.


Evaluation of the proposed method is done on all possible cases/scenarios, and the method shows better results than the conventional methods. (However, how the method performs against the best parameters of the conventional methods is not addressed. For example, the comparison with cross-validation uses 5-fold cross-validation, and k-fold cross-validation with k = 5 might not be the best cross-validation.)


A comparison of computational cost in training and applying the CGEM vs using a conventional technique such as cross-validation is useful, which is missing. 

Minor concerns:

(Section 3) The language used in explaining the approach is unnecessarily complex, which makes it difficult to comprehend the proposed method. 

There are a few grammatical and spelling errors in the writing. 

Some unimportant results are presented (Figure 6), which can be moved to the appendix. 

Author Response

A novel approach (CGEM) to predict the accuracy of a classifier is introduced.

The importance of estimating classification accuracy in incremental and lifelong learning tasks is emphasized. The developed approach is demonstrated to outperform the existing techniques in predicting the accuracies of incremental learning on analytical and real-world data sets.
The work presented in the paper is extensive and addresses all the nitty-gritty of the proposed method.
Authors have evaluated their methodology for 4 different classifiers and the results show that it works for all of them. This ensures that the method is not for a specific classifier.
The method proposed by the authors involves regression. Authors have reported that they tried various regression methods and then selected one (which is not DNN). This is good as some regression tasks can be handled even without DNNs.
Authors have tried their method on artificial datasets as well as a few real-world datasets, and their method works well for both of them.

Overall the proposed method is evaluated in a good manner and works well for all the considered scenarios.

However, the reviewer has the following concerns against publication of the manuscript in the present form.

==> Thank you for the appreciation of the contents of the paper and your useful suggestions to improve the manuscript further. Regarding your concerns:

A) Major:

1. The paper is quite weak in the literature review, especially on incremental learning, which is the paper's area of focus.

==> We have extended the literature review for incremental learning (Line 91).
We also included a short paragraph reviewing the active learning literature, citing recent papers.

2. Evaluation of the proposed method is done on all possible cases/scenarios, and the method shows better results than the conventional methods. (However, how the method performs against the best parameters of the conventional methods is not addressed. For example, the comparison with cross-validation uses 5-fold cross-validation, and k-fold cross-validation with k = 5 might not be the best cross-validation.)

==> We added an evaluation of the parameters of both baseline models (CV and ITT) with respect to their effect on the accuracy estimation error (Line 336 and Figure 7).
Changing the number of folds for CV does not seem to have a large effect on the accuracy estimate. For ITT, however, the window size parameter is in fact crucial: if the window is too small, the estimate is noisy and averaging over a small window yields a coarser granularity; if the window is too big, the estimate lags too much for a precise accuracy prediction. Our window size choice of 30 seems to be a good trade-off (see Figure 7).
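The window-size trade-off can be illustrated with a minimal moving-window sketch. This is a toy simulation, not the actual ITT implementation: the stream of per-sample correctness values and the ramping true accuracy are synthetic, and the window sizes are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a classifier whose true accuracy ramps from 0.5 to 0.9
# over a stream of 500 samples, as in incremental learning.
n = 500
true_acc = np.linspace(0.5, 0.9, n)
correct = rng.random(n) < true_acc  # True where the prediction was right

def itt_estimate(correct, window):
    """Estimate the current accuracy as the mean over the last `window` outcomes."""
    est = np.empty(len(correct))
    for t in range(len(correct)):
        lo = max(0, t - window + 1)
        est[t] = correct[lo:t + 1].mean()
    return est

for w in (5, 30, 200):
    err = np.abs(itt_estimate(correct, w) - true_acc).mean()
    print(f"window={w:3d}  mean absolute estimation error={err:.3f}")
```

A very small window tracks the drifting accuracy quickly but is dominated by sampling noise, while a very large one averages over outdated performance; an intermediate window minimizes the combined error.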

3. A comparison of computational cost in training and applying the CGEM vs using a conventional technique such as cross-validation is useful, which is missing.

==> We introduced a distinct subsection (Subsection 5.4) for discussing computational cost with respect to computation time and memory for CGEM, CV and ITT.
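The qualitative reason the costs differ can be sketched with stand-in operations (a toy example, not the actual CGEM or CV code: the "training" and the "regression" below are placeholders): k-fold CV refits the classifier k times for every estimate, whereas an already-fitted accuracy regressor only evaluates one regression per estimate.

```python
import time

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))   # stand-in data set
y = (X[:, 0] > 0).astype(int)

def fit(X_tr, y_tr):
    """Stand-in 'classifier training': compute per-class centroids."""
    return X_tr[y_tr == 0].mean(axis=0), X_tr[y_tr == 1].mean(axis=0)

# 5-fold CV: five full refits on ~80% of the data for one estimate.
t0 = time.perf_counter()
for fold in np.array_split(np.arange(len(y)), 5):
    mask = np.ones(len(y), bool)
    mask[fold] = False
    fit(X[mask], y[mask])
cv_time = time.perf_counter() - t0

# A fitted accuracy regressor: one feature/weight product per estimate
# (five evaluations, for a like-for-like comparison).
phi = X.mean(axis=0)          # stand-in feature vector
w = rng.normal(size=20)       # stand-in fitted regression weights
t0 = time.perf_counter()
for _ in range(5):
    _ = phi @ w
meta_time = time.perf_counter() - t0

print(f"5-fold CV refits: {cv_time * 1e3:.3f} ms, "
      f"5 regression evaluations: {meta_time * 1e3:.3f} ms")
```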


B) Minor concerns:

1. (Section 3) The language used in explaining the approach is unnecessarily complex, which makes it difficult to comprehend the proposed method.

==> We simplified the wording of Section 3 for better understanding.

2. There are a few grammatical and spelling errors in the writing.

==> We have carefully proofread everything and hope to now have eliminated all errors.

3. Some unimportant results are presented (Figure 6), which can be moved to the appendix.

==> We moved Figures 6 and 9 to the appendix for a better reading flow.

Reviewer 3 Report

The article presents a novel semi-supervised accuracy estimation approach. The Configram Estimation approach presents interesting results.
In general, the paper is presented in a clear manner. However, the article presents several experiments with several datasets, which makes it challenging to remember which dataset is used at each point.
Some notes:
1.- In line 147, are the parameters related to the working classifier or to the accuracy estimator?
2.- Figure 4: in the case of GPR, the results clearly show an example of overfitting. But the case of MLP is strange to me, since the error is bigger on the training set than on the test set.
3.- In line 267, Q = 10 classifiers are used for the analytical experiments. Which ones? And in the case of the real-world experiments, Q = 6...
It could be interesting to know which classifiers are used and which six are selected for the real-world experiments.
4.- To use the proposed approach, how should the database be divided? What percentage of samples has to be unlabeled?

Author Response

The article presents a novel semi-supervised accuracy estimation approach. The Configram Estimation approach presents interesting results.
In general, the paper is presented in a clear manner. However, the article presents several experiments with several datasets, which makes it challenging to remember which dataset is used at each point.
Some notes:

==> Thank you for your positive feedback and your valuable notes to improve the manuscript further:

1.- In line 147, are the parameters related to the working classifier or to the accuracy estimator?

==> The parameters are related to the working classifier; this is not a mistake.

2.- Figure 4: in the case of GPR, the results clearly show an example of overfitting. But the case of MLP is strange to me, since the error is bigger on the training set than on the test set.

==> You are completely right: MLP had a slightly better test accuracy. However, the difference is so small that it is not significant, and since MLP performed worse than the other approaches, we did not dig further into this.

3.- In line 267, Q = 10 classifiers are used for the analytical experiments. Which ones? And in the case of the real-world experiments, Q = 6...
It could be interesting to know which classifiers are used and which six are selected for the real-world experiments.

==> Q determines the number of instances of the same classifier that are trained to collect more training data (configrams) for the CGEM regression model. Since the real-world experiments were quite complex (several high-dimensional data sets, different classifier types, several repetitions of each condition, and different querying techniques), we limited the number of classifier instances to 6 to keep the training time of our evaluation feasible. This was thus a pragmatic choice, and we have tried to make it clearer in Section 3.

4.- To use the proposed approach, how should the database be divided? What percentage of samples has to be unlabeled?

==> This is a very good point. We complemented our evaluation with an analysis of different train/test splits (Line 343 and Figure 8). We found that CGEM remains very stable even when the training set is reduced to as little as 10% of the data set. Thank you for this recommendation; we think this finding motivates our approach even further.
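The split analysis described above can be sketched generically. This toy example uses a nearest-centroid stand-in classifier and plain cross-validation as the accuracy estimator (not CGEM itself); it only illustrates the protocol of shrinking the estimator's data and comparing its estimate against held-out accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian classes in 2-D, shuffled.
n = 1000
X = np.vstack([rng.normal(0.0, 1.0, (n // 2, 2)),
               rng.normal(2.0, 1.0, (n // 2, 2))])
y = np.repeat([0, 1], n // 2)
perm = rng.permutation(n)
X, y = X[perm], y[perm]

def fit_predict(X_tr, y_tr, X_te):
    """Nearest-centroid classifier (stand-in for the working classifier)."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = np.linalg.norm(X_te - c0, axis=1)
    d1 = np.linalg.norm(X_te - c1, axis=1)
    return (d1 < d0).astype(int)

def cv_estimate(X_tr, y_tr, k=5):
    """Plain k-fold cross-validation accuracy estimate."""
    accs = []
    for fold in np.array_split(np.arange(len(y_tr)), k):
        mask = np.ones(len(y_tr), bool)
        mask[fold] = False
        pred = fit_predict(X_tr[mask], y_tr[mask], X_tr[fold])
        accs.append((pred == y_tr[fold]).mean())
    return float(np.mean(accs))

# Shrink the portion seen by the estimator and compare its estimate
# with the accuracy measured on the remaining held-out data.
for frac in (0.5, 0.25, 0.1):
    m = int(frac * n)
    est = cv_estimate(X[:m], y[:m])
    true = (fit_predict(X[:m], y[:m], X[m:]) == y[m:]).mean()
    print(f"train fraction={frac:.2f}  estimate={est:.3f}  held-out={true:.3f}")
```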


Round 2

Reviewer 2 Report

All comments are addressed properly. 

Recommended for publication. 
