Parameter Estimation of KST-IRT Model under Local Dependence
Round 1
Reviewer 1 Report
This is an interesting paper that discusses the connection between knowledge space theory (KST) and item response theory (IRT) models from a computational perspective. I have the following general remarks:
- I tried to understand the technical details of the introduction, but some points remain unclear to me. For instance, the equations at the bottom of p. 7 are derivatives for the model given in equations (10) and (11), but they contain, for instance, a parameter beta that does not appear in these former equations. Please clarify these derivations or point to the literature if these equations are derived elsewhere.
- Please provide the R functions used in this study. You can either share the functions via a data repository or as an R package.
- Please provide some details on the computation times on your hardware. It is hinted at several points that both approaches (MML-EM and EM-with-Gibbs) differ in their computation times.
- Please provide some measure of the accuracy of the bias of the parameter estimation techniques, e.g., a confidence interval for the values in Tables 2 to 4. Please also state directly in these tables what these numbers mean. To my understanding, these values are based on only 50 simulations per condition (stated on p. 11 as Nrep), so it might be difficult to interpret them clearly.
- Although I like the idea of simulation study 2 (generating data under one model and estimating another to check the robustness of an approach), I am not sure whether these results can be summarized numerically in the form of Tables 3 and 4. After all, the meaning of the individual parameters differs between the data generation model and the estimated model, and we cannot expect them to be close to each other. I suggest giving the typical range of the estimated parameters under each condition instead.
- Finally, I suspect there are clear patterns that occur when an IRT model is applied to a dataset where a KST model would be more suitable, and vice versa. Ideally, one should be able to detect such patterns. What do you recommend with regard to checking the model fit?
Overall, I recommend the publication of the paper once these points have been addressed.
Author Response
Thank you for providing your valuable comments. We appreciate the reviewer's input as it will significantly enhance the quality of our manuscript and make it more relevant to the article's readership. Based on the reviewer's feedback, we have carefully incorporated the suggested comments into the manuscript. Please do not hesitate to inform us if further points require clarification or additional input. We are committed to addressing any remaining concerns or questions to ensure the overall quality of our work.
The reviewer's comments and the authors' responses are numbered below. The reviewer's comments are in black font, and the authors' responses are in red.
Point 1: I tried to understand the technical details of the introduction, but some points remain unclear to me. For instance, the equations at the bottom of p. 7 are derivatives for the model given in equations (10) and (11), but they contain, for instance, a parameter beta that does not appear in these former equations. Please clarify these derivations or point to the literature if these equations are derived elsewhere.
Response 1: Thank you for pointing out the lack of clarity in the notation. At the bottom of page 7, we presented the first derivatives of equation (10) with respect to each item parameter, where β, η, and b denote, respectively, the careless-error and lucky-guess probabilities from the BLIM and the difficulty parameters from the Rasch model. The β and η used on page 7 correspond to the terms first introduced on page 4 (lines 168-169), and the difficulty parameter b corresponds to the IRT model presented on page 3 (line 114).
We agree that the KST-IRT models can be presented more clearly. To make the connection more apparent, we restated the definitions of the three KST-IRT parameters alongside equations (10) and (11) on page 7, with references to the equations where they were first introduced. The derivation follows the guidelines of Bock and Aitkin (1981) and Bock and Lieberman (1970), both cited in the manuscript.
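For readers following the derivation, a minimal sketch of the simplest case may help (written in Python for illustration only; the study itself used R). It assumes the power set 2^Q, i.e., local independence, so that each item's membership in the knowledge state follows the Rasch probability P_q(θ) = 1 / (1 + exp(-(θ - b_q))) and the probability of a correct response reduces to (1 - β_q)P_q(θ) + η_q(1 - P_q(θ)). This is not necessarily the exact form of equations (10) and (11) in the manuscript, and the function names are hypothetical; the analytic derivatives with respect to b, β, and η are checked against central finite differences.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Rasch probability that item q belongs to the knowledge state."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def irf_power_set(theta: float, b: float, beta: float, eta: float) -> float:
    """P(correct | theta) with BLIM errors, assuming the power-set structure.

    beta: careless-error probability; eta: lucky-guess probability.
    """
    p = rasch_prob(theta, b)
    return (1.0 - beta) * p + eta * (1.0 - p)


def irf_gradient(theta: float, b: float, beta: float, eta: float) -> dict:
    """Analytic partial derivatives of irf_power_set w.r.t. b, beta, eta."""
    p = rasch_prob(theta, b)
    return {
        "b": -(1.0 - beta - eta) * p * (1.0 - p),  # dP/db = -P(1 - P)
        "beta": -p,
        "eta": 1.0 - p,
    }


def numeric_gradient(theta, b, beta, eta, h=1e-6):
    """Central finite differences, used only to check the analytic result."""
    args = {"b": b, "beta": beta, "eta": eta}
    grad = {}
    for name in args:
        up = dict(args, **{name: args[name] + h})
        down = dict(args, **{name: args[name] - h})
        grad[name] = (irf_power_set(theta, **up)
                      - irf_power_set(theta, **down)) / (2 * h)
    return grad


if __name__ == "__main__":
    analytic = irf_gradient(0.5, b=-0.3, beta=0.1, eta=0.2)
    numeric = numeric_gradient(0.5, b=-0.3, beta=0.1, eta=0.2)
    for key in analytic:
        print(key, round(analytic[key], 6), round(numeric[key], 6))
```

Under other knowledge structures, such as the graded chain, the membership probabilities are no longer independent across items, so these closed forms would have to be replaced by sums over the states of the structure.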
Point 2: Please provide the R functions used in this study. You can either share the functions via a data repository or as an R package.
Response 2: We appreciate the reviewer's valuable feedback regarding the R functions used in this research. Following the reviewer's suggestion, we have provided the specific R function used directly in the manuscript. However, no existing R package is currently available for estimating the KST-IRT model parameters. Because our simulation study used a limited number of items and focused on two specific knowledge structures, the current R implementation of the estimation is tailored to these settings.
To address this limitation, we have included additional details about the estimation method and the estimation process in the Method and Discussion sections. This should provide readers with a better understanding of the estimation process and its limitations, given the scope of this study.
Point 3: Please provide some details on the computation times on your hardware. It is hinted at several points that both approaches (MML-EM and EM-with-Gibbs) differ in their computation times.
Response 3: Thank you for suggesting that the computation times can be indicated more clearly. The details of the computation time are now added in the simulation section.
Point 4: Please provide some measure of the accuracy of the bias of the parameter estimation techniques, e.g., a confidence interval for the values in Tables 2 to 4. Please also state directly in these tables what these numbers mean. To my understanding, these values are based on only 50 simulations per condition (stated on p. 11 as Nrep), so it might be difficult to interpret them clearly.
Response 4: Thank you for your comment and request for additional information regarding the accuracy of the parameter estimation techniques and the interpretation of the values in the new Tables 1 to 3 (old Tables 2 to 4). The values in parentheses in the tables are the square root of the MSE (RMSE) for all conditions. In our case, the MSE is the sum of the estimate's sample variance and the squared bias. We agree that the RMSE alone does not distinguish the bias from the spread of the simulated estimates.
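The decomposition stated above can be checked numerically. The sketch below (hypothetical function name, illustrative only) computes the bias, empirical standard error, MSE, and RMSE of a set of replicate estimates. It assumes the variance is computed with the 1/N divisor, under which MSE = variance + bias² holds exactly; with the N - 1 sample-variance divisor the identity holds only approximately.

```python
import math


def decompose_mse(estimates, true_value):
    """Return mean, bias, empirical SE, MSE and RMSE of replicate estimates.

    The variance uses the 1/N divisor so that MSE = variance + bias**2
    holds exactly.
    """
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - true_value
    variance = sum((x - mean_est) ** 2 for x in estimates) / n
    mse = sum((x - true_value) ** 2 for x in estimates) / n
    return {
        "mean": mean_est,
        "bias": bias,
        "se": math.sqrt(variance),
        "mse": mse,
        "rmse": math.sqrt(mse),
    }


if __name__ == "__main__":
    # Hypothetical replicate estimates of one parameter (true value 0.10).
    est = [0.12, 0.08, 0.15, 0.10, 0.11]
    out = decompose_mse(est, true_value=0.10)
    print(out)
    # Both routes to the MSE agree up to floating-point error.
    print(math.isclose(out["mse"], out["se"] ** 2 + out["bias"] ** 2))
```

Reporting the bias and SE columns next to the RMSE makes it visible whether a large RMSE stems from systematic bias or from sampling spread.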
Lastly, we would like to address your observation that the values in the tables are based on 50 simulations per condition, potentially raising concerns about interpretability. We agree that this limitation needs to be acknowledged, and we have now explicitly mentioned the number of simulations conducted per condition in the relevant section of the manuscript.
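One way to attach the uncertainty the reviewer asked for to each bias value is a Monte Carlo standard error (MCSE). The sketch below is a minimal illustration with a hypothetical function name: MCSE(bias) = s / sqrt(Nrep), where s is the sample standard deviation of the replicate estimates, and the interval uses a normal approximation (with Nrep = 50 the corresponding t quantile is about 2.01 rather than 1.96). Because the MCSE shrinks with sqrt(Nrep), quadrupling the number of replications halves it.

```python
import math
import random
import statistics


def bias_with_ci(estimates, true_value, z=1.96):
    """Bias of replicate estimates, its Monte Carlo SE, and a normal-approx CI.

    MCSE(bias) = s / sqrt(n_rep), where s is the sample SD (n_rep - 1 divisor).
    """
    n_rep = len(estimates)
    bias = statistics.mean(estimates) - true_value
    mcse = statistics.stdev(estimates) / math.sqrt(n_rep)
    return bias, mcse, (bias - z * mcse, bias + z * mcse)


if __name__ == "__main__":
    # Hypothetical: 50 replicate estimates of a parameter whose true value is 0.1.
    rng = random.Random(2023)
    est = [rng.gauss(0.1, 0.05) for _ in range(50)]
    bias, mcse, (lo, hi) = bias_with_ci(est, true_value=0.1)
    print(f"bias={bias:.4f}  MCSE={mcse:.4f}  95% CI=({lo:.4f}, {hi:.4f})")
```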
Point 5: Although I like the idea of simulation study 2 (generating data under one model and estimating another to check the robustness of an approach), I am not sure whether these results can be summarized numerically in the form of Tables 3 and 4. After all, the meaning of the individual parameters differs between the data generation model and the estimated model, and we cannot expect them to be close to each other. I suggest giving the typical range of the estimated parameters under each condition instead.
Response 5: We appreciate the reviewer's valuable comment on the use of Tables 3 and 4, which are now Tables 2 and 3 after revision. We wish to confirm whether the reviewer intended to comment on simulation study 1, as the corresponding tables pertain to simulation study 1. Our response below assumes that the comment concerns simulation study 1.
We acknowledge that the estimated parameters of the misspecified estimation models may not closely approximate the parameter values of the model used for data generation. The role of the difficulty parameters in the KST-IRT model can seem different as they are used to calculate the state probabilities. However, the guessing and slipping parameters serve the same purpose in IRT and KST-IRT models, thus making them comparable in meaning. When local independence is assumed such that the knowledge structure corresponds to the power set, the difficulty parameter should also have the same purpose and meaning in the IRT and the KST-IRT models (Noventa et al., 2021).
The main goal of the manuscript is to compare parameter recovery when the model used for estimation is misspecified relative to the model used for data generation. This misspecification is what leads to different estimates. The authors intended to highlight this contrast, with and without misspecification, by comparing the new Table 1 (old Table 2) with Tables 2 and 3 (old Tables 3 and 4); the new Table 1 shows the typical estimates and their RMSEs when the models are correctly specified.
We appreciate your insightful observation and have ensured that the manuscript clearly conveys the purpose of the presented tables.
Point 6: Finally, I suspect there are clear patterns that occur when an IRT model is applied to a dataset where a KST model would be more suitable, and vice versa. Ideally, one should be able to detect such patterns. What do you recommend with regard to checking the model fit?
Response 6: Thank you for raising this crucial point regarding model fit and its implications for IRT and KST models. Clear patterns indicating when one model is more suitable than the other can be difficult to detect and may not always manifest in straightforward ways. As demonstrated in our simulation studies, estimating with a misspecified model can lead to poorer performance, but such estimates might still be acceptable to some extent, provided that the misspecification is recognized.
To assess model fit, a likelihood-based approach is a suitable starting point, as likelihood-based methods are commonly used in both IRT and KST model estimation. However, different knowledge structures can yield different model fits, especially under KST or KST-IRT models, and the choice of state probability function in the KST-IRT model can also alter the fit.
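As a hedged sketch of such a likelihood-based comparison: models built on different knowledge structures need not be nested, so information criteria such as AIC and BIC are one common way to compare their maximized marginal log-likelihoods. The helper names below are hypothetical, and the parameter counts passed in are assumptions that must match each model's actual parameterization (for example, n difficulties for the RM versus n difficulties plus n careless-error and n lucky-guess parameters for a KST-IRT model with n items).

```python
import math


def information_criteria(log_lik: float, n_params: int, n_obs: int) -> dict:
    """AIC and BIC from a maximized (marginal) log-likelihood."""
    return {
        "aic": -2.0 * log_lik + 2.0 * n_params,
        "bic": -2.0 * log_lik + n_params * math.log(n_obs),
    }


def compare_models(fits: dict, n_obs: int, criterion: str = "bic") -> list:
    """Rank fitted models, given as name -> (log_lik, n_params), best first."""
    scored = {
        name: information_criteria(ll, k, n_obs)[criterion]
        for name, (ll, k) in fits.items()
    }
    return sorted(scored, key=scored.get)
```

Lower values are preferred under both criteria; BIC penalizes the additional error parameters of a KST-IRT model more heavily as the number of respondents grows.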
Once again, we appreciate the reviewer's feedback on this point, and we have added the above discussion in the Discussion section as our future directions.
Reviewer 2 Report
Feedback for the manuscript entitled “Parameter Estimation of KST-IRT Model under Local Dependence”
This study proposed a KST-IRT model that incorporates knowledge space theory (KST) into item response theory (IRT) models. As mentioned in the abstract, this study explores the estimation of the parameters involved in KST-IRT models (i.e., using two estimation methods, marginal maximum likelihood and Gibbs sampling) and compares the estimates of the KST-IRT models with those of the traditional combination of the Rasch model and local independence. The paper is well structured overall. The proposed models and estimation methods are well elaborated, and the two simulation studies address the two main research goals.
Below are my suggestions and concerns that I hope can help the authors improve the quality of the manuscript.
1. In the simulation studies, the authors introduce five models: RM, C+SLM, C+LKS, 2^Q+SLM, and 2^Q+LKS. Why did the authors use “C” and “2^Q” in the model names? It would be helpful if the authors could elaborate.
2. Why did the authors use 50 replications in the simulation studies? Is there any rationale behind the choice of this number? Are 50 replications enough to obtain stable parameter estimates?
3. The authors mentioned that they used the ltm package to obtain the difficulty parameters. What R package did they use to obtain the estimates of the guessing and slipping parameters?
4. In simulation study 1 (page 11), the authors mention the power set 2^Q but do not define Q. What is Q? Does the number of items in Q matter in this simulation?
5. It seems that the authors could eliminate Table 1; the same information is presented in simulation study 2 without a table.
6. The authors presented Table 2 on page 12 but did not discuss its content in the text (i.e., the first paragraph on page 12).
7. The tables contain several symbols whose definitions are not provided, such as b (difficulty), eta (guessing), and beta (slipping).
8. In Table 4, there are abbreviations (i.e., DG and FM). The authors define these two abbreviations in the title of Table 5. It would be better to define them in Table 4, where they first appear.
9. In the title of Table 4, I am not sure what the full mis-specified structure K meant.
10. In the discussion, I was wondering whether existing literature (e.g., on estimation approaches) could be brought in to discuss the findings of this study.
11. Are there any limitations related to the two simulation studies? Are there any future research directions?
Author Response
Thank you for providing your valuable comments. We appreciate the reviewer's input as it will significantly enhance the quality of our manuscript and make it more relevant to the article's readership. Based on the reviewer's feedback, we have carefully incorporated the suggested comments into the manuscript. Please do not hesitate to inform us if further points require clarification or additional input. We are committed to addressing any remaining concerns or questions to ensure the overall quality of our work.
The reviewer's comments and the authors' responses are numbered below. The reviewer's comments are in black font, and the authors' responses are in red.
Point 1: In the simulation studies, the authors introduce five models: RM, C+SLM, C+LKS, 2^Q+SLM, and 2^Q+LKS. Why did the authors use "C" and "2^Q" in the model names? It would be helpful if the authors could elaborate.
Response 1: Thank you for requesting further clarification. The calligraphic "𝒞" and "2^Q" represent, respectively, the graded chain and the power set knowledge structures, following traditional KST notation. The knowledge structures are included in the model names because KST-IRT models are formulated differently depending on the specified knowledge structure. We have now added a description of each model on page 11 to elaborate on the distinction between the five models.
We agree that the model names could be explained more clearly to readers, and we added a brief explanation of how the model names were chosen above the bullet points on page 11.
Point 2: Why did the authors use 50 replications in the simulation studies? Is there any rationale behind the choice of this number? Are 50 replications enough to obtain stable parameter estimates?
Response 2: Thank you for inquiring about the number of replications in the simulation study. We chose 50 replications based on established practice in related studies that implement the approach of Bock and Aitkin (1981), such as Park et al. (2015), which also addressed parameter estimation using MML. After experimenting with smaller and larger numbers of replications, we found that 50 replications offered robust and consistent estimation. We acknowledge the importance of providing a clear rationale for this choice, and we have now added an explanation of the selection of 50 replications to the manuscript to better inform readers about our simulation design.
Point 3: The authors mentioned that they used the ltm package to obtain the difficulty parameters. What R package did they use to obtain the estimates of the guessing and slipping parameters?
Response 3: As the reviewer stated, we used the ltm package when the assumed model is the Rasch model. For KST-IRT parameter estimation, no package implementing the approach introduced in the manuscript is yet available. When the assumed model is a KST-IRT model, Section 3 presents the derivation of the components needed to estimate the guessing, slipping, and difficulty parameters.
We agree with the reviewer that readers can be better informed about how the estimates were obtained. We added a description on page 11 clarifying how the model can be estimated in R, and we extended the discussion of future directions regarding the estimation function.
Point 4: In simulation study 1 (page 11), the authors mention the power set 2^Q but do not define Q. What is Q? Does the number of items in Q matter in this simulation?
Response 4: We appreciate your observation regarding the notation and definition of Q used in the simulation section. In simulation study 1, Q denotes "the full domain" of items, as introduced in the KST background on page 4, line 135. The power set 2^Q is the collection of all subsets of Q and therefore contains 2^|Q| knowledge states, so the number of items determines the size of the structure. To enhance clarity, we have repeated the definition of Q and stated the number of items in the simulation section.
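To illustrate the Bock-Aitkin idea on which the estimation builds, the sketch below evaluates the marginal log-likelihood by replacing the integral over θ with a weighted sum over quadrature nodes; an EM algorithm would then alternate between posterior node weights (E-step) and item parameter updates (M-step), which are omitted here. It assumes the power-set structure, an equally spaced grid with normalized standard-normal weights (many implementations use Gauss-Hermite nodes instead), and hypothetical function names. It is not the authors' R implementation.

```python
import math


def normal_quadrature(n_points=41, bound=6.0):
    """Equally spaced nodes on [-bound, bound] with normalized N(0, 1) weights."""
    step = 2.0 * bound / (n_points - 1)
    nodes = [-bound + i * step for i in range(n_points)]
    dens = [math.exp(-0.5 * x * x) for x in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]


def item_prob(theta, b, beta, eta):
    """(1 - beta) * P_q(theta) + eta * (1 - P_q(theta)), power-set case."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return (1.0 - beta) * p + eta * (1.0 - p)


def marginal_log_likelihood(patterns, b, beta, eta, n_points=41):
    """Sum over 0/1 response patterns of log sum_k w_k prod_q P(x_q | theta_k)."""
    nodes, weights = normal_quadrature(n_points)
    total = 0.0
    for x in patterns:
        lik = 0.0
        for theta, w in zip(nodes, weights):
            prod = 1.0
            for q, x_q in enumerate(x):
                p = item_prob(theta, b[q], beta[q], eta[q])
                prod *= p if x_q == 1 else 1.0 - p
            lik += w * prod
        total += math.log(lik)
    return total
```

Under a graded chain the responses are not conditionally independent given θ alone, so the inner product over items would be replaced by a sum over knowledge states K of P(K | θ) times the product over items of P(x_q | K).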
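To make the dependence on the number of items concrete, the sketch below enumerates both knowledge structures used in the simulations for a hypothetical item set. The power set has 2^|Q| states, whereas the graded chain has |Q| + 1, which is one reason the number of items matters for computation. The item labels and function names are hypothetical, and the chain is assumed to add items in the listed order.

```python
from itertools import combinations


def power_set(items):
    """All subsets of Q: the knowledge structure 2^Q."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def graded_chain(items):
    """Nested states {} ⊂ {q1} ⊂ {q1, q2} ⊂ ... ⊂ Q, in the given item order."""
    items = list(items)
    return [frozenset(items[:k]) for k in range(len(items) + 1)]


if __name__ == "__main__":
    Q = ["q1", "q2", "q3", "q4"]
    print(len(power_set(Q)), len(graded_chain(Q)))  # 16 5
```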
Point 5: It seems that the authors could eliminate Table 1; the same information is presented in simulation study 2 without a table.
Response 5: Thank you for the valuable suggestion to improve the clarity and conciseness of the manuscript. We have considered your feedback and removed Table 1. Instead, we have incorporated a concise statement presenting the true parameters in simulation study 1 as well.
Point 6: The authors presented Table 2 on page 12 but did not discuss its content in the text (i.e., the first paragraph on page 12).
Response 6: Thank you for noticing the lack of discussion regarding Table 2. We agree that the text on the new Table 1 (old Table 2) only described the table columns. We have now added further discussion of the content of Table 1, comparing the estimates and their performance with respect to bias and RMSE.
Point 7: The tables contain several symbols whose definitions are not provided, such as b (difficulty), eta (guessing), and beta (slipping).
Response 7: We appreciate the reviewer's observation regarding the presentation of the Greek letters. As the reviewer mentioned, the parameter b denotes the difficulty parameter from the RM, and η and β denote the lucky-guess and careless-error parameters discussed on page 5. We have repeated the definitions of the notation used in the tables in the captions of all tables.
Point 8: In Table 4, there are abbreviations (i.e., DG and FM). The authors define these two abbreviations in the title of Table 5. It would be better to define them in Table 4, where they first appear.
Response 8: Thank you for pointing out the notation in Table 4. We have now added the definitions of these two abbreviations to the new Tables 2 and 3 (old Tables 3 and 4) as well.
Point 9: In the title of Table 4, I am not sure what the full misspecified structure K meant.
Response 9: We appreciate the request to clarify the column label of Table 4. The misspecified structure K in the new Tables 2 and 3 (old Tables 3 and 4) means that different knowledge structures were assumed for data generation and for parameter estimation. For instance, in the first row, the response data were generated assuming the power set, and the estimation assumed the graded chain; the second row is the reverse, where data generation assumed the graded chain and estimation assumed the power set. We agree that the notion of misspecification can be explained better, and we added a statement in the simulation section elaborating on the misspecification of the structure.
Point 10: In the discussion, I was wondering whether existing literature (e.g., on estimation approaches) could be brought in to discuss the findings of this study.
Response 10: For the estimation method, MML-EM, this research closely follows Bock and Lieberman (1970) and Bock and Aitkin (1981); for the KST-IRT models, it builds on Stefanutti (2006) and Noventa et al. (2021), as discussed in the manuscript. The contribution of this research lies, first, in presenting a parameter estimation method for the new KST-IRT model and, second, in comparing the estimates under misspecified models. Given the short history of the KST-IRT model, the literature on estimation approaches for it is quite limited. We have added a statement on this underdevelopment of estimation approaches to the Discussion section.
Point 11: Are there any limitations related to the two simulation studies? Are there any future research directions?
Response 11: Thank you for raising an important question about the simulation studies and their limitations. As noted in the Discussion section, the main limitation of the presented estimation method is that the computational burden increases quickly with the number of items, especially when the assumed model is a KST-IRT model. Another limitation is that this research restricted the choice of knowledge structure to the two most extreme cases: the power set and the graded chain. These two structures were well suited to the aims of this manuscript; however, the estimation method needs to be generalized to more complex knowledge structures to investigate estimation performance in greater detail. Future work will broaden the range of target knowledge structures so that the estimation method applies to arbitrary knowledge structures. We have added these limitations to the Discussion section as well.