Peer-Review Record

Normalized Information Criteria and Model Selection in the Presence of Missing Data

Mathematics 2021, 9(19), 2474; https://doi.org/10.3390/math9192474
by Nitzan Cohen and Yakir Berchenko *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 1 August 2021 / Revised: 6 September 2021 / Accepted: 27 September 2021 / Published: 3 October 2021

Round 1

Reviewer 1 Report

In this paper, the authors propose the normalized AIC (Akaike information criterion) and normalized BIC (Bayesian information criterion) for model selection when there are missing data. Three extensions of the normalized information criteria for the AIC are also proposed: post-selection imputation, Akaike sub-model averaging, and minimum-variance averaging. Experimental results show the improved performance of the normalized AIC and normalized BIC. However, I have some comments for the authors:

  1. Novelty and importance of the target problem should be further emphasized by specific theoretical analysis instead of simply stating the facts and results.
  2. AIC- and BIC-related methods are introduced in the paper for model selection, but how should one choose among these criteria, and why is one better than the other in some cases? Please provide more detailed information on their benefits and a comparison.
  3. There are several notations in the article; it is recommended to provide a separate notation table for easy reference.
  4. Some tables are not clear, especially those with multiple results in them. Please present them in a clearly divided format.
  5. The element cubes with different shading and shapes shown in Figure 2 need to be depicted and compared separately.
  6. Perhaps the authors could read more recent studies on sparse-matrix analysis methods, matrix factorization, non-negative latent factor analysis, and other related feature-extraction techniques to improve the method of this paper. For instance, the following papers are highly relevant: An Instance-Frequency-Weighted Regularization Scheme for Non-negative Latent Factor Analysis on High-Dimensional and Sparse Data, IEEE Transactions on Systems, Man, and Cybernetics: Systems, 51(6): 3522-3532, 2021; Energy Consumption Prediction of a CNC Machining Process With Incomplete Data, IEEE/CAA Journal of Automatica Sinica, 8(5): 987-1000, 2021; Randomized Latent Factor Model for High-Dimensional and Sparse Matrices From Industrial Applications, IEEE/CAA Journal of Automatica Sinica, 6(1): 131-141, 2019; A Parallel Matrix Factorization Based Recommender by Alternating Stochastic Gradient Descent, Engineering Applications of Artificial Intelligence, 25(7): 1403-1412, 2012.

Author Response

  1. Novelty and importance of the target problem should be further emphasized by specific theoretical analysis instead of simply stating the facts and results.

    Response: We re-edited the text and clarified the motivation and novelty.

  2. AIC- and BIC-related methods are introduced in the paper for model selection, but how should one choose among these criteria, and why is one better than the other in some cases? Please provide more detailed information on their benefits and a comparison.

    Response: Please see the third paragraph in the introduction.

  3. There are several notations in the article; it is recommended to provide a separate notation table for easy reference.

    Response: Some of the notations are defined within specific subsections and used only in that context (and are not “global”); therefore, a separate notation table is less relevant.

  4. Some tables are not clear, especially those with multiple results in them. Please present them in a clearly divided format.

    Response: As permitted by MDPI’s instructions, the paper was prepared in a free-style format; before publication we will update all tables to the journal’s specific style.

  5. The element cubes with different shading and shapes shown in Figure 2 need to be depicted and compared separately.

    Response: Please see what we added to the caption. We are not sure we understand this comment; we hope the reviewer finds the addition satisfactory (if not, please elaborate).

  6. Perhaps the authors could read more recent studies on sparse-matrix analysis methods, matrix factorization, non-negative latent factor analysis, and other related feature-extraction techniques to improve the method of this paper. For instance, the following papers are highly relevant: An Instance-Frequency-Weighted Regularization Scheme for Non-negative Latent Factor Analysis on High-Dimensional and Sparse Data, IEEE Transactions on Systems, Man, and Cybernetics: Systems, 51(6): 3522-3532, 2021; Energy Consumption Prediction of a CNC Machining Process With Incomplete Data, IEEE/CAA Journal of Automatica Sinica, 8(5): 987-1000, 2021; Randomized Latent Factor Model for High-Dimensional and Sparse Matrices From Industrial Applications, IEEE/CAA Journal of Automatica Sinica, 6(1): 131-141, 2019; A Parallel Matrix Factorization Based Recommender by Alternating Stochastic Gradient Descent, Engineering Applications of Artificial Intelligence, 25(7): 1403-1412, 2012.

    Response: We have added all the references Reviewer 1 asked for.

Reviewer 2 Report

In this paper, the authors propose a modification of both the AIC and the BIC for dealing with incomplete datasets due to missing values. The paper is quite well written and the topic is of interest. The proposed scores perform better than the classical scores. However, I have the following concerns that I hope the authors will address.

  • I am unsure of the definition of the normalized AIC score. Is it AIC_j/n_j, or (-log(theta_j)+k_j)/n_j? I think that the end of Section 2.1 should be rewritten with greater clarity.
  • I am also unsure of the suitability of the computational efficiency comparison between the non-normalized and the proposed normalized information criterion methods. Under the assumption of no missingness (after imputation), it is obvious that the size of the dataset with missing values is lower. The true computational efficiency could be estimated by measuring the computational cost (in terms of time or number of operations) of each kind of model comparison score.
  • In Section 2.2 the authors introduce an unknown value h, that could be estimated, in the authors’ words. I think this term needs greater clarification.

Minor issues

  • On page 6 the authors state “which is similar to”, after applying logarithms to an inequality representing model comparison. As the logarithm is a monotone function, would this not be a simple equivalence?
  • Replace “process” with “processes” when the 3rd person is used (for instance, “each method processes”).
  • Replace “yeild” with “yields”, and “yeilding” with “yielding”.

Author Response


  • I am unsure of the definition of the normalized AIC score. Is it AIC_j/n_j, or (-log(theta_j)+k_j)/n_j? I think that the end of Section 2.1 should be rewritten with greater clarity.

    Response: The normalized AIC score is not (-log(theta_j)+k_j)/n_j; rather, it is (-logLikelihood(theta_j)+k_j)/n_j. Since (-logLikelihood(theta_j)+k_j) is by definition AIC_j, there should be no confusion. The appendix elaborates further.
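    To make the normalization concrete, here is a minimal Python sketch (illustrative only, not the authors’ code; the Gaussian sub-models and sample sizes are hypothetical) computing the per-observation score AIC_j/n_j for sub-models fitted on different numbers of complete cases:

```python
import numpy as np

def normalized_aic(loglik, k, n):
    # AIC_j = -logLikelihood(theta_hat_j) + k_j  (convention used in the response);
    # the normalized score divides by n_j, the sample size available to sub-model j.
    return (-loglik + k) / n

def gaussian_fit_loglik(x):
    # Maximized Gaussian log-likelihood (MLE for mean and variance).
    n = len(x)
    var = x.var()
    return -0.5 * n * (np.log(2 * np.pi * var) + 1)

rng = np.random.default_rng(0)
# Sub-model A sees 100 complete cases; sub-model B only 40 (due to missingness).
xa = rng.normal(0.0, 1.0, 100)
xb = rng.normal(0.0, 1.0, 40)
na = normalized_aic(gaussian_fit_loglik(xa), k=2, n=100)
nb = normalized_aic(gaussian_fit_loglik(xb), k=2, n=40)
# The per-observation scores are now on a comparable scale despite unequal n_j.
```

    Dividing by n_j is what makes sub-models fitted on different effective sample sizes directly comparable.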

  • I am also unsure of the suitability of the computational efficiency comparison between the non-normalized and the proposed normalized information criterion methods. Under the assumption of no missingness (after imputation), it is obvious that the size of the dataset with missing values is lower. The true computational efficiency could be estimated by measuring the computational cost (in terms of time or number of operations) of each kind of model comparison score.

    Response: In this section we compare the computational cost of our method to that of single imputation (in fact, we do not count all the “costs” of single imputation, giving it the benefit of the doubt). The reviewer is right, of course, that the cost after imputation would be larger; the question we address there, however, is whether it is larger by only a constant factor. We show that it can be much greater than that: the cost can be exponentially larger when using imputation instead of our method. We hope this helps; if we have misunderstood the comment, please explain your suggestion/request.

  • In Section 2.2 the authors introduce an unknown value h, that could be estimated, in the authors’ words. I think this term needs greater clarification.

    Response: we added the following:
    “The scalar parameter h is known as the entropy of the distribution g. Estimation of a scalar is generally much simpler than estimation of a distribution, and thus estimating h is easier compared to estimating g. However, here we do not even require estimating h and thus we do not discuss it further; this is left for future developments.”
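    As a hypothetical illustration (not from the paper) of why estimating the scalar h is much easier than estimating the distribution g, a simple Monte Carlo estimate of h = -E[log g(X)] when g is a standard normal:

```python
import numpy as np

# Monte Carlo estimate of the entropy h = -E[log g(X)] for g = N(0, 1):
# a sample mean of -log g(x_i), i.e. estimating a single scalar, not g itself.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 100_000)
log_g = -0.5 * (np.log(2 * np.pi) + x ** 2)   # log density of N(0, 1)
h_hat = -log_g.mean()
h_true = 0.5 * np.log(2 * np.pi * np.e)        # closed-form entropy of N(0, 1)
```

    With a moderate sample, h_hat is already close to the closed-form value, whereas recovering g itself would require density estimation.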

Minor issues

  • On page 6 the authors state “which is similar to”, after applying logarithms to an inequality representing model comparison. As the logarithm is a monotone function, would this not be a simple equivalence?

    Response: Yes, this is correct, thank you. We corrected accordingly.

  • Replace “process” with “processes” when the 3rd person is used (for instance, “each method processes”).

    Response: Thank you. We corrected accordingly.

  • Replace “yeild” with “yields”, and “yeilding” with “yielding”.

    Response: Thank you. We corrected accordingly.

Round 2

Reviewer 2 Report

The authors have addressed the minor issues I pointed out, so I am in favor of the acceptance of the paper.
