Article
Peer-Review Record

Exploring Approaches for Estimating Parameters in Cognitive Diagnosis Models with Small Sample Sizes

Psych 2023, 5(2), 336-349; https://doi.org/10.3390/psych5020023
by Miguel A. Sorrel 1,*, Scarlett Escudero 2, Pablo Nájera 1, Rodrigo S. Kreitchmann 3 and Ramsés Vázquez-Lira 2
Submission received: 24 March 2023 / Revised: 20 April 2023 / Accepted: 25 April 2023 / Published: 27 April 2023
(This article belongs to the Special Issue Computational Aspects and Software in Psychometrics II)

Round 1

Reviewer 1 Report

The paper does not introduce any new ideas. It is a study of the relative performance of various estimation techniques for the DINA model, comparing small and large sample sizes and involving both simulated data and one popular real life data set. As such, it can be useful, given the bewildering abundance of techniques and packages for the DINA and the many other CDM models.

The general conclusion is that the choice of method does not matter much when the sample is large, while MCMC methods are preferable with small samples. This is not unexpected but there are other thought-provoking findings as well. For example, some models specially developed for small samples (NPC, R-DINA) seem to emerge as winners when using simulated data, and losers when applied to real data -- a fact that warrants some discussion in my view.

There is some confusion about the relation between Bayes modal estimation and MCMC. Of course, given a distribution of MCMC samples, one can call their mean the EAP estimate, and their mode the MAP. But what is usually meant by BM is an optimization technique that regularizes ML with a subjective prior while MCMC is based on sampling, with the general idea that starting from a misspecified prior should not matter much except perhaps prolong convergence. Using a bad prior with BM matters dramatically.

Generally, the explanation of MCMC methods on page 3, especially of Gibbs sampling, is very superficial and should be improved. Another point that has been glossed over is HMC -- the authors mention its potential advantages but later they say just that they tried it out and were satisfied it produced results similar to those from GS. This is fine, but was it faster, slower, easier, more difficult?

Written in beautiful English, the paper is relatively short and can be extended in the directions specified above.

     

Author Response

R1.C1. The paper does not introduce any new ideas. It is a study of the relative performance of various estimation techniques for the DINA model, comparing small and large sample sizes and involving both simulated data and one popular real life data set. As such, it can be useful, given the bewildering abundance of techniques and packages for the DINA and the many other CDM models.

Response: We thank the reviewer for this positive comment and careful review. As the reviewer points out, and as stated in the last paragraph of the introduction, the goal is not to introduce a new method but to identify, in a way that is useful for applied researchers, the best way to proceed depending on the sample size.

 

R1.C2. The general conclusion is that the choice of method does not matter much when the sample is large, while MCMC methods are preferable with small samples. This is not unexpected but there are other thought-provoking findings as well. For example, some models specially developed for small samples (NPC, R-DINA) seem to emerge as winners when using simulated data, and losers when applied to real data -- a fact that warrants some discussion in my view.

Response: Thank you for the opportunity to elaborate on this. As the reviewer notes, R-DINA and NPC performed well in the simulation study and worse in terms of classification agreement in the empirical study. It should be noted that in the simulation study we used guessing and slip parameters equal to 0.20 for all items. This is equivalent to an R-DINA model whose φ parameter equals 0.20. That is, the R-DINA model can be understood as the actual generating model of the data, since DINA reduces to R-DINA when guessing and slip are equal across all items. We have made this explicit in the text (Page 6): “Thus, the generating model coincides with the R-DINA, since DINA reduces to R-DINA when g_j = s_j = φ for all items [36], with φ = 0.20 in this case. It is common to generate item parameters in this way in simulation studies, as it allows the recovery of item parameters to be studied in a straightforward way [2, 42]”.
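For illustration, a minimal base-R sketch of this data-generating setup could look as follows. The Q-matrix, sample size, number of attributes, and attribute distribution below are illustrative assumptions, not the exact design of the simulation study; only the g = s = 0.20 constraint reflects the condition described above.

```r
# Minimal sketch: simulate DINA responses with g_j = s_j = 0.20 for all items.
# Sample size, Q-matrix, and attribute distribution are illustrative only.
set.seed(123)
N <- 100                               # examinees
K <- 3                                 # attributes
J <- 15                                # items
Q <- matrix(rbinom(J * K, 1, 0.5), J, K)
Q[rowSums(Q) == 0, 1] <- 1             # ensure every item measures at least one attribute
g <- rep(0.20, J)                      # guessing, equal across items
s <- rep(0.20, J)                      # slip, equal across items (R-DINA with phi = 0.20)

alpha <- matrix(rbinom(N * K, 1, 0.5), N, K)   # independent attribute profiles

# Conjunctive (DINA) ideal response: 1 only if all required attributes are mastered
eta <- t(apply(alpha, 1, function(a) as.integer(colSums(t(Q) * a) == rowSums(Q))))

# P(X = 1) = (1 - s)^eta * g^(1 - eta)
P <- (1 - matrix(s, N, J, byrow = TRUE))^eta * matrix(g, N, J, byrow = TRUE)^(1 - eta)
X <- matrix(rbinom(N * J, 1, P), N, J)         # simulated dichotomous responses
```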

 

Note that this is not necessarily the case in the empirical study, where guessing and slip could differ, and indeed appear to differ, across items. We have added a table (see Table 1) with the estimated item parameters for each item and each estimation procedure with the full sample as evidence of this. With this variability, the R-DINA model may not be adequate for these data. Nájera et al. (2023) illustrate how popular relative fit statistics such as SABIC, AIC, BIC, and CAIC can be used to judge whether the fit of R-DINA is comparatively worse than that of DINA. We have incorporated this information into the text to clarify this (Page 7): “The item parameters obtained with each of the estimation methods are reported in Table 1. Taking the MMLE-EM estimates as a reference, it can be observed that guessing and slip generally differ within the same item, as in item 13, with guessing close to zero (0.013) but a high slip (0.335), and also across items, with items such as item 2, where guessing and slip are both low (0.016 and 0.041, respectively), and others such as item 8, where they are higher (0.444 and 0.182, respectively). That is, contrary to what occurs in the simulation study, these estimates are consistent with the DINA model, which allows different guessing and slip parameters to be estimated for each item, and a loss of fit would be expected with R-DINA, which estimates a single parameter common to all items. Consistent with this, the relative fit statistics led to retaining the DINA model (DINA: AIC = 9394.606, BIC = 10658.430; R-DINA: AIC = 11686.390, BIC = 12783.130).”
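As a generic illustration of this kind of relative-fit comparison, the sketch below computes AIC and BIC directly from a model's log-likelihood and number of parameters. The log-likelihood values, sample size, and test dimensions are placeholders, not the actual values from the empirical study; in practice these statistics are reported by the fitting software.

```r
# Sketch: AIC = -2*logL + 2*p, BIC = -2*logL + p*log(N), compared across two models.
relative_fit <- function(logL, n_par, N) {
  c(AIC = -2 * logL + 2 * n_par,
    BIC = -2 * logL + n_par * log(N))
}

N <- 500; J <- 20; K <- 3                 # hypothetical sample size, items, attributes
n_par_dina  <- 2 * J + (2^K - 1)          # guessing + slip per item, plus class proportions
n_par_rdina <- 1 + (2^K - 1)              # single common phi, plus class proportions

relative_fit(logL = -4600, n_par = n_par_dina,  N = N)   # placeholder log-likelihood
relative_fit(logL = -5800, n_par = n_par_rdina, N = N)   # placeholder log-likelihood
```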

 

We have addressed this matter in the Discussion section (Page 12): “Note, however, that the performance of R-DINA and NPC in terms of classification agreement in the empirical study was worse compared to the estimation of the DINA model. As argued in the method section, this is to be expected considering the variability in the estimated guessing and slip parameters and the results of the relative fit indices, which showed a preference for the DINA model. Exploring this prior to interpreting any model will therefore prove crucial.”

 

We trust this clarifies the disparity in results, which is consistent with the simulated conditions and with the relative fit statistics obtained for the empirical data.

 

 

 

R1.C3: There is some confusion about the relation between Bayes modal estimation and MCMC. Of course, given a distribution of MCMC samples, one can call their mean the EAP estimate, and their mode the MAP. But what is usually meant by BM is an optimization technique that regularizes ML with a subjective prior while MCMC is based on sampling, with the general idea that starting from a misspecified prior should not matter much except perhaps prolong convergence. Using a bad prior with BM matters dramatically.

 

Response: Thank you for pointing this out. Following the reviewer's recommendation, we have clarified this in the text (Page 6): “As a final note regarding the prior distribution, it is important to clarify that although both BM and MCMC require establishing a prior distribution, the effect of this choice may be greater for BM. This is because BM regularizes the ML estimation using that prior distribution, whereas MCMC uses the prior only as a starting point and generates the posterior distribution through sampling.”


R1.C4. Generally, the explanation of MCMC methods on page 3, especially of Gibbs sampling, is very superficial and should be improved.

 

Response: Thank you for pointing this out. We have expanded the text to clarify this (Page 3): “An approach that has started to gain popularity in the psychometrics field is the Bayesian use of MCMC methods. The frequentist approach considers model parameters as fixed and provides point estimates for those parameters. In contrast, the Bayesian approach seeks the posterior distribution of the model parameters. This posterior distribution is typically summarized by its mean and standard deviation, which are the equivalents of the frequentist point estimate and standard error. MCMC methods are a class of algorithms for sampling from a probability distribution. To perform this process, it is necessary to define a complete likelihood and a prior distribution for the parameters, which are combined to form the joint posterior distribution. MCMC techniques are employed to produce samples from this joint posterior distribution. Specifically, these methods use the previous sample value to randomly generate the next sample value, forming a Markov chain whose stationary distribution is the target posterior. Each random sample is used to generate the next random sample, hence the chain [29]. One particular MCMC method, the Gibbs sampler, is widely applicable and efficient for a broad class of Bayesian problems. Gibbs sampling is a special case of the Metropolis-Hastings algorithm, which is a generalized form of the Metropolis algorithm [30]. Gibbs sampling is applicable in situations where the joint distribution is not explicitly known or is challenging to sample from directly, but the conditional distribution of each variable is known and can be sampled from more easily. Thus, the basic idea of Gibbs sampling is to iteratively sample from these conditional distributions rather than drawing directly from the joint posterior distribution [31]. This sampler is used by default in popular software such as Just Another Gibbs Sampler (JAGS) [32].”
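To make the Gibbs-sampling/JAGS route concrete, the sketch below shows a compact DINA specification in JAGS run from R with the rjags package. This is a simplified illustration, not the Zhan et al. (2019) code used in the study: it assumes independent Bernoulli(0.5) attribute priors and flat Beta(1, 1) priors on guessing and slip, and it assumes response matrix Y and Q-matrix Q are already in the workspace.

```r
# Minimal DINA model in JAGS via rjags. Assumes Y (N x J responses) and Q (J x K) exist.
# Priors are illustrative simplifications (independent attributes, flat Beta priors).
library(rjags)

dina_model <- "
model {
  for (n in 1:N) {
    for (k in 1:K) { alpha[n, k] ~ dbern(0.5) }   # simplified attribute prior
    for (j in 1:J) {
      # eta = 1 only if all attributes required by item j are mastered
      eta[n, j] <- step(inprod(alpha[n, 1:K], Q[j, 1:K]) - sum(Q[j, 1:K]))
      p[n, j] <- pow(1 - s[j], eta[n, j]) * pow(g[j], 1 - eta[n, j])
      Y[n, j] ~ dbern(p[n, j])
    }
  }
  for (j in 1:J) {
    g[j] ~ dbeta(1, 1)   # guessing
    s[j] ~ dbeta(1, 1)   # slip
  }
}"

jags_data <- list(Y = Y, Q = Q, N = nrow(Y), J = ncol(Y), K = ncol(Q))
fit <- jags.model(textConnection(dina_model), data = jags_data, n.chains = 2)
update(fit, 1000)                                  # burn-in
samples <- coda.samples(fit, c("g", "s"), n.iter = 5000)
summary(samples)                                   # posterior means and SDs of g and s
```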

 

R1.C5. Another point that has been glossed over is HMC -- the authors mention its potential advantages but later they say just that they tried it out and were satisfied it produced results similar to those from GS. This is fine, but was it faster, slower, easier, more difficult?

Response: Thank you for the opportunity to elaborate on this. We have addressed this in the text (Page 5): “The computation times with Stan were considerably slower than those of JAGS. For example, in the simulation study, for a replication with 100 examinees, JAGS required 1.108 minutes and Stan 11.252 minutes. Since the results were essentially identical, we conducted the complete study using JAGS. Another reason to prefer this software is that Zhan et al. (2019) [44] provide in their article the code for other models besides DINA, so researchers interested in applying other models can take advantage of this.”


R1.C6. Written in beautiful English, the paper is relatively short and can be extended in the directions specified above.

 

Response: We thank again the reviewer for his/her positive comment and careful review, which helped improve the manuscript.

 

 

Reviewer 2 Report

This is a very interesting and very well written manuscript on the comparison of different estimation techniques to fit the DINA model to small sample size datasets. The study is conducted in a clear and  thorough way. Here are some suggestions for improvement:

-As the MMLE-EM plays such a dominant role in the paper and in the existing packages to estimate the model, it would be insightful to give the marginal likelihood in the paper

 -Twenty replications is not very much: Do you have some indications that this number is enough to draw solid conclusions? Please give them in the text, or consider increasing the number of replications (preferred option).

 -Page 5: ” In a real situation, prior distributions should be established by the researcher considering the available evidence…”: I agree but at this point (or later in the paper) you might want to warn the reader as well, as in the case of small sample sizes, the prior can have important effects on the posterior.

 -I am wondering about the role of the starting values. In latent class analysis, one commonly considers multiple starting sets and then takes the best result (highest likelihood). Otherwise you might end up in a local optimum using MMLE. How is this dealt with in the present study? From the current manuscript it can't be ruled out that the suboptimal performance of MMLE is (partly) due to some solutions having ended up in a local optimum. So you may want to address this issue.

Author Response

R2.C1. This is a very interesting and very well written manuscript on the comparison of different estimation techniques to fit the DINA model to small sample size datasets. The study is conducted in a clear and thorough way. Here are some suggestions for improvement:

 

Response: We thank the reviewer for his/her positive feedback and careful review.

 

R2.C2. As the MMLE-EM plays such a dominant role in the paper and in the existing packages to estimate the model, it would be insightful to give the marginal likelihood in the paper.

 

Response: Following the reviewer's recommendation, we have included the marginal likelihood in the paper (see Equation 1).
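For reference, a standard form of the marginalized likelihood for the DINA model, written here in generic latent-class notation (which may differ from the exact notation used in the revised Equation 1), is:

```latex
L(\mathbf{X}) = \prod_{i=1}^{N} \sum_{l=1}^{2^{K}} \pi_{l}
  \prod_{j=1}^{J} P_{j}(\boldsymbol{\alpha}_{l})^{x_{ij}}
  \bigl[1 - P_{j}(\boldsymbol{\alpha}_{l})\bigr]^{1 - x_{ij}},
\qquad
P_{j}(\boldsymbol{\alpha}_{l}) = g_{j}^{\,1-\eta_{lj}}\,(1 - s_{j})^{\eta_{lj}},
```

where π_l is the proportion of examinees in latent class l, g_j and s_j are the guessing and slip parameters of item j, and η_lj is the ideal (conjunctive) response of class l to item j.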

 

R2.C3. Twenty replications is not very much: Do you have some indications that this number is enough to draw solid conclusions? Please give them in the text, or consider increasing the number of replications (preferred option).

 

Response: Please note that, as indicated in the notes of Tables 2 and 3, the standard deviations in both studies are very low, which indicates that the results are consistent across just 20 replications. The MCMC method is computationally demanding, which was the reason for keeping the number of replications low. Nonetheless, we understand the reviewer's concern and have increased the number of replications to 100 in the simulation study. We have updated the text, tables, and figures accordingly. With respect to the empirical study, we have chosen to include the distribution over the 20 samples in a new figure (see Figure 3). As can be seen, the conclusions drawn in the manuscript cannot be expected to change by increasing the number of samples, so we have chosen to keep this number, noting the implication of this decision in the Discussion section (Page 13): “Finally, the number of samples extracted in the empirical study was relatively small (i.e., 20). Although examination of the distribution of the results obtained (see Figure 3) shows that the conclusions drawn are stable, the number of replications could be increased for greater precision in order to interpret the averages more carefully. In relation to the previous point, in order to do this, it would be convenient to examine ways to speed up the MCMC estimation.”
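As a small illustration of the kind of stability check behind this point, the Monte Carlo standard error of an average across replications can be computed directly from the per-replication results. The vector below is a placeholder, not the actual study output.

```r
# Sketch: Monte Carlo standard error of a mean outcome (e.g., classification accuracy)
# across R replications. 'acc' is a placeholder vector of per-replication results.
acc <- c(0.84, 0.86, 0.83, 0.85, 0.87, 0.84, 0.85, 0.86, 0.83, 0.85,
         0.84, 0.86, 0.85, 0.84, 0.86, 0.85, 0.83, 0.86, 0.85, 0.84)
mc_se <- sd(acc) / sqrt(length(acc))   # small SE suggests the average is stable
c(mean = mean(acc), se = mc_se)
```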

 

R2.C4. Page 5: “In a real situation, prior distributions should be established by the researcher considering the available evidence…”: I agree but at this point (or later in the paper) you might want to warn the reader as well, as in the case of small sample sizes, the prior can have important effects on the posterior.

 

Response: We agree with the reviewer on the importance of emphasizing this. We have included the following text (Page 6): “Also note that with small sample sizes, the chosen prior can have important effects on the posterior.”
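A simple conjugate illustration of this point, with hypothetical numbers: when only a handful of observations inform a slip-type probability, the posterior mean can shift noticeably depending on the Beta prior chosen.

```r
# Sketch: effect of the prior on the posterior of a slip-type probability with few data.
# With a Beta(a, b) prior and y "slips" out of n mastery cases, the posterior is
# Beta(a + y, b + n - y). Numbers below are hypothetical.
n <- 15; y <- 3
post_mean <- function(a, b) (a + y) / (a + b + n)
post_mean(1, 1)    # ~0.24 under a flat Beta(1, 1) prior
post_mean(1, 19)   # ~0.11 under an informative prior favoring small slip values
```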

 

R2.C5. I am wondering about the role of the starting values. In latent class analysis, one commonly considers multiple starting sets and then takes the best result (highest likelihood). Otherwise you might end up in a local optimum using MMLE. How is this dealt with in the present study? From the current manuscript it can't be ruled out that the suboptimal performance of MMLE is (partly) due to some solutions having ended up in a local optimum. So you may want to address this issue.

 

Response: Thank you for pointing this out. We understand the reviewer's concern. In this case we chose to use the default option in the R package GDINA, whereby three sets of starting values are considered and the best set of initial values is selected based on the observed log-likelihood. We expect that applied researchers will generally use this default option. However, we have checked that increasing the number of sets of starting values to 200 did not change the results. We have clarified this in the text (Page 4): “As specified by default in the package, three sets of starting values were generated and the best set according to the observed log-likelihood was used. This is done to avoid the problem of local optima with MMLE.”
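The generic multiple-random-starts pattern behind this strategy is sketched below. Here fit_dina_em() is a hypothetical stand-in for whatever EM-based fitting routine is used (e.g., within the GDINA package), not an actual function of that package, and dat, Q, and J are assumed to exist in the workspace.

```r
# Sketch of the multiple-random-starts strategy against local optima with MMLE-EM.
# fit_dina_em() is a hypothetical placeholder returning an object with a $logLik element.
n_starts <- 3
fits <- lapply(seq_len(n_starts), function(i) {
  init <- list(g = runif(J, 0.05, 0.35), s = runif(J, 0.05, 0.35))  # random initial values
  fit_dina_em(dat, Q, init = init)
})
logliks <- sapply(fits, function(f) f$logLik)
best_fit <- fits[[which.max(logliks)]]   # keep the solution with the highest log-likelihood
```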

 

We thank again the reviewer for his/her positive comment and careful review, which helped improve the manuscript.

 

Round 2

Reviewer 1 Report

Although rather terse (my fault), my recommendations have been perfectly understood and implemented. The paper matches the purpose of this special issue quite well and I recommend publishing in the present form.
