Article
Peer-Review Record

Novel Feature-Based Difficulty Prediction Method for Mathematics Items Using XGBoost-Based SHAP Model

Mathematics 2024, 12(10), 1455; https://doi.org/10.3390/math12101455
by Xifan Yi 1, Jianing Sun 1,* and Xiaopeng Wu 2
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 5 April 2024 / Revised: 25 April 2024 / Accepted: 5 May 2024 / Published: 8 May 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1. The authors propose a feature-based difficulty prediction method for mathematics items using an XGBoost-based SHAP model.


2. In the Introduction, the authors clearly show the differences between the known models for predicting difficulty based on the characteristics of mathematics test items and the model they propose, thus making the novelty of their model explicit.
3. The proposed model is based on a rigorous framework for evaluating the difficulty of mathematics questions by listing several characteristics necessary for this task, based on the relevant literature and the characteristics of mathematics questions. They take into account the demands of mathematics in terms of cognition and ability and in the construction of the mathematical knowledge system. The research data were then subjected to feature extraction, followed by data analysis of the extracted features.


4. The methodology is divided into three stages: (1) development of the model, in which the authors used and improved the XGBoost model; (2) parameterization of the model, which they tuned using the grid search method; and (3) interpretation of the model, in which they integrated the SHAP model based on XGBoost. These processes are consistent with the bases of the model and were carried out by the authors with acceptable scientific rigor. In the Results section, the authors compare their model with two other models and show the advantages of the proposed one over the others. The clear methodological processes and the results shown lend credibility, authenticity, and confidence to the model proposed in the research. This predictive model can contribute to improving the assessment of students' performance in mathematics, owing to its greater precision and confidence in predicting item difficulty levels.

Author Response

For research article

Response to Reviewer 1 Comments

1. Summary

 

 

Thank you for your time and effort in reviewing our manuscript and for your very positive feedback. We have addressed your comments in the revisions highlighted below. Meanwhile, we have found several places where English could be further improved throughout the manuscript. These changes can be found in yellow for your review. We appreciate the opportunity to improve our manuscript with your guidance.

2. Questions for General Evaluation

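Does the introduction provide sufficient background and include all relevant references?
Reviewer’s Evaluation: Yes. Response and Revisions: Thanks.

Are all the cited references relevant to the research?
Reviewer’s Evaluation: Yes. Response and Revisions: Thanks.

Is the research design appropriate?
Reviewer’s Evaluation: Yes. Response and Revisions: Thanks.

Are the methods adequately described?
Reviewer’s Evaluation: Yes. Response and Revisions: Thanks.

Are the results clearly presented?
Reviewer’s Evaluation: Yes. Response and Revisions: Thanks.

Are the conclusions supported by the results?
Reviewer’s Evaluation: Yes. Response and Revisions: Thanks.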

3. Point-by-point response to Comments and Suggestions for Authors

Comments 1: In general terms, it seems to me that the article is excellent and meets the necessary conditions to be published.

Response 1: Thank you for your positive feedback.

Comments 2: It deals with a topic of interest to the community of mathematicians and mathematics educators, since the processes of evaluating student performance worldwide require increasingly better-developed tests.

Response 2: We appreciate your acknowledgment of the relevance of our topic.

Comments 3: In particular, the article presents a novel feature-based approach for assessing the difficulty of mathematical test items. Therefore, the objective of the article is very clear.

Response 3: Thank you for recognizing the clarity and novelty of our approach.

Comments 4: The background is presented in a way that shows the differences between the existing models for feature-based difficulty prediction and the model that the authors propose, which is what makes the latter novel.

Response 4: We are glad you found the background effective in validating the novelty of our model. 

Comments 5: The proposed method seems novel and interesting to me and contributes to progress in Feature-based Difficulty Prediction.

Response 5: Your comments are greatly appreciated.

 

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript is well written and clearly outlines the problem, the framework, the data, and the conclusions. There are a couple of instances where I was unclear of which tables I was to refer to as a reader:

Page 5 line 212 refers to Table 3 twice (Table 3 and Table 3)

Page 7 line 219 refers to Table 3 twice "Refer to Table 3 and Table 3."

Additionally, I was unsure about the levels in the CLF. You stated that Bloom's Taxonomy was divided "into four levels of increasing complexity" yet only 3 levels were stated. I would also be interested to know why 4 levels and how they mapped onto Bloom's.

In Figure 2, I was not quite clear what the different colours in the PCA represented.

Overall, a strong paper and an interesting and applicable way to predict the difficulty of test items.

Author Response

For research article

Response to Reviewer 2 Comments

1. Summary

 

 

Thank you for the time and effort you have invested in reviewing our manuscript. We are grateful for your insightful comments and have addressed each point in the detailed responses below. We have made the appropriate modifications as suggested and have highlighted these changes in red font in the manuscript for review. Meanwhile, we have found several places where English could be further improved throughout the manuscript. These changes can be found in yellow for your review. Your feedback has been invaluable in enhancing the quality of our paper. We hope our revisions meet your expectations and look forward to your further suggestions.

2. Questions for General Evaluation

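Does the introduction provide sufficient background and include all relevant references?
Reviewer’s Evaluation: Yes. Response and Revisions: Thanks.

Are all the cited references relevant to the research?
Reviewer’s Evaluation: Yes. Response and Revisions: Thanks.

Is the research design appropriate?
Reviewer’s Evaluation: Yes. Response and Revisions: Thanks.

Are the methods adequately described?
Reviewer’s Evaluation: Yes. Response and Revisions: Thanks.

Are the results clearly presented?
Reviewer’s Evaluation: Yes. Response and Revisions: Thanks.

Are the conclusions supported by the results?
Reviewer’s Evaluation: Yes. Response and Revisions: Thanks.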

3. Point-by-point response to Comments and Suggestions for Authors

Comments 1: Page 5 line 212 refers to Table 3 twice (Table 3 and Table 3)

Response 1: Thanks for pointing out the typo. We have revised to keep only one Table 3 in line 255.

Comments 2: Page 7 line 219 refers to Table 3 twice "Refer to Table 3 and Table 3."

Response 2: Thanks for pointing out the typo. We have revised to keep only one Table 3 in line 262.

Comments 3: Additionally, I was unsure about the levels in the CLF. You stated that Bloom's Taxonomy was divided "into four levels of increasing complexity" yet only 3 levels were stated. I would also be interested to know why 4 levels and how they mapped onto Bloom's.

Response 3:

Thank you for pointing out the discrepancy in the levels of the Cognitive Load Framework as related to Bloom's Taxonomy in Table 1. We have revised the description of the CLF and its levels of complexity to ensure accuracy and alignment with Bloom's Taxonomy in Table 1 as the following:

“Based on Bloom's cognitive domain and the analysis of behavioral verbs in the three-dimensional objectives of the Chinese mathematics curriculum standards, we divide the cognitive levels into three levels of increasing complexity: Cognitive Level A (Understanding, 1 point), Cognitive Level B (Comprehension, 2 points), and Cognitive Level C (Mastery, 3 points). Next, we evaluate the cognitive level for each knowledge point in a question separately, sum them up to obtain the cognitive level score for that item, and then categorize them into four grades based on their respective ranges: [0,7], (7,11], (11,16], and (16,22].”
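The scoring rule quoted above can be illustrated with a minimal sketch. It assumes that each knowledge point in an item is labelled with one of the letters A, B or C; the function names and the numbering of the four grades as 1 to 4 are hypothetical and are not taken from the manuscript.

```python
from bisect import bisect_left

# Points per cognitive level, as stated in the revised text.
LEVEL_POINTS = {"A": 1, "B": 2, "C": 3}

# Upper bounds of the four ranges [0,7], (7,11], (11,16], (16,22].
GRADE_UPPER_BOUNDS = [7, 11, 16, 22]


def clf_score(levels):
    """Sum the cognitive-level points over all knowledge points in an item."""
    return sum(LEVEL_POINTS[level] for level in levels)


def clf_grade(score):
    """Map a summed score to a grade 1..4 (the grade numbering is an assumption)."""
    if not 0 <= score <= GRADE_UPPER_BOUNDS[-1]:
        raise ValueError(f"score {score} is outside [0, 22]")
    # bisect_left returns the index of the first upper bound >= score,
    # so right-closed intervals such as (7, 11] are handled correctly.
    return bisect_left(GRADE_UPPER_BOUNDS, score) + 1


# Hypothetical item with four knowledge points.
item_levels = ["A", "B", "C", "C"]
item_score = clf_score(item_levels)   # 1 + 2 + 3 + 3 = 9
item_grade = clf_grade(item_score)    # 9 falls in (7, 11], i.e. grade 2
```

Under these assumptions an item with levels A, B, C and C scores 9 points and falls in the second range, (7,11].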

Comments 4: In Figure 2, I was not quite clear what the different colours in the PCA represented.

Response 4: Thank you for your question about the color representation in Figure 2. To clarify, Table 2 shows the abnormal features during the PCA dimensionality reduction process. Blue points represent abnormal features and magenta points represent normal features. We have added the caption to Figure 2. We have further described the abnormal features in Table 2 in lines 270-277 as follows:

“Figure 2 depicts the distribution of abnormal features in PCA dimensionality reduction, with blue data points representing these abnormal features. Abnormal features are defined as those that differ from or deviate from the normal pattern during the dimensionality reduction process. They may affect the PCA results, causing changes in the principal variance or distorting the data distribution. Consequently, it is of paramount importance to monitor the distribution of abnormal features during PCA dimensionality reduction, evaluate their influence on the results, and implement appropriate corrective measures, such as the removal of outliers.”

4. Response to Comments on the Quality of English Language

Point 1: English language fine. No issues detected

Response 1: Thank you for your positive feedback on this aspect of our manuscript. We have found several places where English could be further improved throughout the manuscript. These changes can be found in yellow for your review.

5. Additional clarifications

We have improved our English writing and format throughout the manuscript. These changes are highlighted in yellow in the manuscript.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Author

Congratulations on submitting your article to this journal. We have been trusted to read your article. As a result of reading, we have written several notes to improve your article. Please see the article that has been sent to us. We wish you success in the next stage.

best regards

Comments for author File: Comments.pdf

Comments on the Quality of English Language

In general, the English is good, but in terms of the structure of the article, it needs to be rewritten, especially in relation to the formulation of research questions.

 

Author Response

For research article

Response to Reviewer 3 Comments

1. Summary

 

 

Thank you for your time and effort in reviewing our manuscript and for your very positive feedback. Your insightful feedback has been instrumental in enhancing the quality of our work. We have made the appropriate modifications as suggested and have highlighted these changes in red font in the manuscript for review. Meanwhile, we have found several places where English could be further improved throughout the manuscript. These changes can be found in yellow for your review. We appreciate the opportunity to improve our manuscript with your guidance.

2. Questions for General Evaluation

Does the introduction provide sufficient background and include all relevant references?
Reviewer’s Evaluation: Can be improved. Response and Revisions: Revised and elaborated in the point-by-point response below.

Are all the cited references relevant to the research?
Reviewer’s Evaluation: Can be improved. Response and Revisions: Revised and elaborated in the point-by-point response below.

Is the research design appropriate?
Reviewer’s Evaluation: Can be improved. Response and Revisions: Revised and elaborated in the point-by-point response below.

Are the methods adequately described?
Reviewer’s Evaluation: Can be improved. Response and Revisions: Revised and elaborated in the point-by-point response below.

Are the results clearly presented?
Reviewer’s Evaluation: Yes. Response and Revisions: Thanks.

Are the conclusions supported by the results?
Reviewer’s Evaluation: Yes. Response and Revisions: Thanks.

3. Point-by-point response to Comments and Suggestions for Authors

Comments 1: I suggest that the abstract follow the structure of a scientific work resulting from research, including: background of the problem, research methods, research samples, data collection methods, data analysis methods, results obtained, and limitations or inferences from the results obtained, even though these have been generally stated in this abstract.

Response 1: Thanks for the suggestion. We have rewritten the abstract to include the suggested structure. Please see lines 1 to 22.

Comments 2: The statement that "text-based feature extraction" is less effective for mathematics and physics subjects must be supported by the results of research that has been carried out, so that this claim is stronger and more reliable. We suggest including some previous research results related to this claim.

Response 2: We have included four papers in lines 46 and 134 to support the statement that text-based feature extraction is less effective for mathematics and physics. We have also highlighted in the manuscript the reason for this ineffectiveness, as follows, in lines 42-46 and lines 131-134:

“However, for disciplines such as mathematics and physics, which involve logic, symbolic language and a systematic knowledge system, text-based feature extraction may not fully capture the unique features of these subjects. As a result, it may not effectively reveal students’ logic and reasoning abilities [19–22].”

“However, in mathematics and physics, the difficulty of test items is influenced by the amount of computation, the knowledge background and the correlation between knowledge items, which cannot be determined by lexical and syntactic features alone [19–22].”

Comments 3: Even though the structure of the article has been explained, it is also necessary to write the research questions (Q1, Q2,...etc) sequentially according to the stages of research implementation and also according to the results obtained. This is important so that readers can easily follow the author's flow of ideas.

Response 3: We have outlined the research questions sequentially in lines 85-108 as follows:

“In summary, the three main research questions of this study are as follows:

Q1. Feature Extraction Methodology of Mathematics Items: What methodologies can be deployed to identify the critical features that determine the difficulty of mathematics items and to conduct an effective analysis of the data?

Q2. XGBoost-based SHAP Model Application: How does the XGBoost-based SHAP model function in predicting item difficulty, and what is the role of each identified feature in contributing to this prediction?

Q3. How many knowledge units can the subjective questions of the Chinese college entrance examination be classified into? How do the eight features influence the difficulty of test items in different knowledge units?”

 

We have also rewritten the paper organization structure in lines 95-108 to reflect the specific methods and contents associated with the above research questions.

 

“The paper is organized as follows: In the initial section of this study, we present the background and content of the research. In the second section, we provide a comprehensive overview of current methods used to assess exam difficulty and delve into the practical application of both the XGBoost and SHAP models in different domains. The third section delves into the specifics of feature extraction rules and data analysis to address the first research question. Following this, the fourth section meticulously outlines the process of model construction, including the incorporation of relevant formulas and parameter optimization efforts, thus addressing the second research question. Then, in the fifth section, we delve deeper into the analysis by evaluating the individual contributions of the features, thereby increasing the transparency of the model and further addressing the second research question. In addition, we classify the knowledge units of the Chinese college entrance examination and study the distribution of each knowledge unit across the eight features, which addresses the third research question. In the last section, we discuss possible future directions and limitations of our model.”

Comments 4: It may also be necessary to indicate where in this diagram the XGBoost and SHAP models are positioned.

Response 4: We have revised Fig. 1 to show the position of XGBoost. We have also included a description of the position of XGBoost in machine learning in lines 135-142 as follows:

“In summary, traditional methods, statistical methods and machine learning methods have achieved high accuracy in predicting the difficulty of exam items (Figure 1). This study employed the XGBoost model in machine learning, which has been demonstrated to exhibit high accuracy and generalizability.

However, the XGBoost model was not designed with interpretability in mind and is therefore a "black-box" model. Thus, in this study we further use SHAP to increase the interpretability of XGBoost results. We could then reveal what features are primary contributors to the exam difficulty.”

SHAP is not included in Fig. 1 because it is not a method for predicting the difficulty of test items. It is employed in this study to elucidate the results of XGBoost, thereby rendering our model less opaque and, in turn, informing educational practice. The role of SHAP in this study is elucidated in greater detail in lines 172 through 183.

Comments 5: We propose to include several previous research results related to the integration of "the XGBoost-based SHAP" strategy. This is important as proof that this integration strategy has been used and is reliable. It is important to avoid one way of trial and error.

Response 5: The integration of XGBoost and SHAP has been used in traffic accident feature detection and has demonstrated enhanced accuracy and interpretability. We elaborated on the reasons for the integration of XGBoost and SHAP and used previous studies as examples to explain the rationale in lines 184-193:

“Based on the analysis of XGBoost model and SHAP method, we can find that XGBoost model has better generalization ability than other machine learning methods. It can deal with the overfitting problem of the model more effectively to achieve faster learning and faster model exploration. Therefore, it is a very good model for classification and nonlinear problem solving in related fields. In terms of model interpretation, the SHAP method can rank the importance of the features that affect the model and help researchers better understand the factors that affect the model. Although no researchers have combined the XGBoost and SHAP model in the field of mathematics education, the combination of these two models has been applied to real-time accident detection and feature analysis in the aspect of traffic safety factors and traffic accident feature detection.”

We also described how the integration of XGBoost and SHAP could improve the accuracy and interpretability in lines 194-203:

“In summary, this paper proposes a SHAP model based on XGBoost, which is a kind of machine learning model. This model combines and improves the two previous models to accurately evaluate the complexity of mathematics test questions. In fact, the XGBoost-based SHAP strategy used in this paper is effective in improving the interpretability of the model as well as the accuracy of test difficulty prediction, while taking into account the generalization performance of the model. Firstly, it improves interpretability by revealing the contribution of each feature to the prediction results. In addition, this integration also allows a ranking of feature importance to be derived, which helps to identify influential features. This is valuable for understanding the impact of data and features, as well as for feature selection, and improves model ”
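The idea of attributing a prediction to individual features and then ranking them can be illustrated with a minimal sketch that computes exact Shapley values by enumerating feature orderings. This is not the authors' pipeline: `toy_model` is a hypothetical stand-in for a trained difficulty predictor, the feature values and all-zero baseline are made up, and the brute-force enumeration is only feasible for a handful of features. Practical SHAP implementations for tree ensembles such as XGBoost rely on much faster tree-specific algorithms.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features that are absent from a coalition keep their baseline value.
    The cost grows factorially with the number of features.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]              # add feature i to the coalition
            value = predict(current)
            totals[i] += value - previous  # marginal contribution of feature i
            previous = value
    return [total / len(orderings) for total in totals]


def toy_model(features):
    """Hypothetical difficulty predictor with one interaction term."""
    a, b, c = features
    return 0.2 * a + 0.1 * b + 0.05 * a * c


x = [3.0, 2.0, 4.0]          # made-up feature values for one item
baseline = [0.0, 0.0, 0.0]   # made-up reference input
phi = shapley_values(toy_model, x, baseline)

# Rank features by the magnitude of their contribution.
ranking = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
```

The contributions in `phi` add up to the difference between the prediction at `x` and the prediction at the baseline, which is the additive property that makes a per-feature ranking meaningful.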

Comment 6: Explain the background to selecting the eight features: what basis, references or considerations were used to select them. State whether previous researchers have used them or not, and whether the selection is based on limitations in previous research.

Response 6: We have distinguished between the features that follow the guidelines of the National Center for Education Statistics (NCES) and the features introduced in this study to improve representativeness. We revised lines 218-234 to reflect this distinction:

“In order to address the first research question of this study, a novel framework for the evaluation of the difficulty of test items is proposed, which will overcome the aforementioned limitations. We identified eight features that significantly influence the difficulty level of Chinese high school math exam items, namely Parameter Level Features (PLF), Reasoning Level Features (RLF), Thinking Mode Features (TMF), Calculate Rank Features (CRF), Background Information Features (BIF), Character Count Features (CCF), Knowledge Content Features (KCF), and Cognitive Level Features (CLF). The four selected features, PLF, RLF, CRF and BIF, are based on the framework of NCES. However, the specific extraction rules for these features have been defined differently from those proposed by NCES, as shown in Table 1. Furthermore, this study considers TMF, CCF, CLF, and KCF to be important features that affect the difficulty of math tests. TMF reflects the mathematical thinking mode needed to solve the problem. CCF reflects the amount of information and complexity of the test. CLF represents the cognitive depth required by the test. KCF involves the specific mathematical knowledge content and quantity involved in the test, as shown in Table 1. These features directly affect the way students understand and respond to the questions, and thus they are crucial in assessing the difficulty of the questions.”

Comment 7: I am sure that most readers already know the source of this formula, but there are also readers who have never known the origin of this equation. Therefore, we suggest that every equation contained in this article or paper be written down from the source.

Response 7: We have included the references for the calculation of the difficulty coefficient (line 251), the decision tree functions (line 298), and the Taylor expansion used in the XGBoost approximation (line 313).

Comment 8: (1) Our suggestion is that the results of this study be shown sequentially in accordance with the research objectives that have been formulated (Q1, Q2, ...etc) in the introduction.

(2) Our suggestion is that the discussion of results of this study be shown sequentially in accordance with the research objectives that have been formulated (Q1, Q2, ...etc) in the introduction.

Response 8: 

(1) We have included how the discussion in Sections 5.2 and 5.3 match research questions 2 and 3, respectively, in the first paragraphs of each section.

Specifically, in Section 5.2, we added in line 431 that “In the preceding sections, in order to address the initial research question of this study, we presented a novel framework for evaluating the difficulty of test items (Table 1). In order to address the second research question of this study, this section offers an objective interpretation of the results, ranking their importance based on eight features.”

In Section 5.3, we added in line 455 that “In order to address the third research question of this study, this section categorises the subjective items in CNCEE mathematics into nine distinct knowledge units based on the Chinese curriculum standards [72], and then analyzes the distribution of each knowledge unit across the eight features.”

(2) In Section 6 Discussion part, we also described how the analysis connects to research questions by adding the following:

Section 6.1 line 504: “This section will present a detailed discussion of the first and second research objectives of this study. ”

Section 6.2 line 574: “This section addresses the third research objective of this study; the results are shown in Figure 7.”

Comment 9: It is also necessary to add what inferences this study offers for the world of education and learning, especially in the fields of mathematics and physics.

Response 9: We have outlined potential applications of this study in mathematics education across the Discussion part. Specifically,

Line 402: “Through this analysis, we reveal notable differences in the distribution of the eight features across different units of the textbook, which can assist teachers in placing different emphasis on different units during the instructional process. These findings have the potential to inform classroom practice by providing tools that are better aligned with and responsive to students’ learning needs.”

Line 564: “Educators can utilize the factors that have been identified to enhance the design of test papers and regulate the overall difficulty of test papers in order to better align with the diverse learning needs of students. Furthermore, this approach could facilitate the development of more targeted teaching strategies that would ultimately improve learning outcomes in mathematics education.”

Line 612: “This analysis has revealed significant disparities in the distribution of the eight features across various units of the textbook. Overall, this study presents considerable potential for the field of mathematics education. It can equip teachers with more effective instructional support, enabling them to navigate the differences in teaching between distinct knowledge units. Moreover, it aids in comprehending the focal points and challenges of different knowledge units, assisting teachers in better understanding and addressing students’ learning needs.”

4. Response to Comments on the Quality of English Language

Point 1: In general, the English is good, but in terms of the structure of the article, it needs to be rewritten, especially in relation to the formulation of research questions.

Response 1: Thanks for pointing out our English and structural issues. We have found several places where English could be further improved throughout the manuscript. These changes can be found in yellow for your review. We have rewritten the research questions in lines 85-108 and the discussion in Sections 5.2 and 5.3 to elaborate on how the results address the corresponding research questions.

5. Additional clarifications

We have improved our English writing and format throughout the manuscript. These changes are highlighted in yellow in the manuscript.

 

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you for the edits and the responses to my suggestions. The paper flows well and is clearer than before.
