Article
Peer-Review Record

Customized Rule-Based Model to Identify At-Risk Students and Propose Rational Remedial Actions

Big Data Cogn. Comput. 2021, 5(4), 71; https://doi.org/10.3390/bdcc5040071
by Balqis Albreiki 1,2, Tetiana Habuza 1,2, Zaid Shuqfa 1, Mohamed Adel Serhani 1,*, Nazar Zaki 1,2 and Saad Harous 3
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 3 October 2021 / Revised: 13 November 2021 / Accepted: 24 November 2021 / Published: 29 November 2021
(This article belongs to the Special Issue Educational Data Mining and Technology)

Round 1

Reviewer 1 Report

To improve the quality of this work, some comments are given below.

1. The Introduction should highlight the motivation for proposing this customized rule-based model for identifying at-risk students and briefly introduce the issue. The authors are suggested to refine the Introduction section.

2. The authors are suggested to add a literature review section. They can move some paragraphs from the Introduction section to this section.

3. In addition, the authors should provide sufficient literature review to illustrate the development of applying data mining models to recognize at-risk students in higher education institutions.

4. In sub-section 2.1, the authors should provide a clear introduction of the input and output variables used.

5. In sub-section 2.2, the authors need to clarify how the data are cleaned and prepared, e.g., normalization.

6. In sub-section 2.3, the authors should clarify what kind of analysis this step performs. From Figure 1, this step only implements correlation analysis.

7. In the methodology section, it is not clear how to build a customized rule-based model. Please improve it. It would also be better to provide a flow chart to illustrate the implementation procedure and describe the methods and equations used.

8. From Figures 4–5, it is still not clear how to build this customized rule-based model. In addition, from the results shown in both figures, did the authors only use the student ID and the performance on exams and homework?

9. Section 4 is too short to highlight the contribution of this work. Please enrich it.

10. An additional discussion section or subsection needs to be included.

Author Response

To improve the quality of this work, some comments are given below.

Reply: We would like to thank the reviewer for taking the time to review our paper and providing us with constructive feedback that helped us improve both the organization and the quality of the paper.

Point 1.1: The Introduction should highlight the motivation for proposing this customized rule-based model for identifying at-risk students and briefly introduce the issue. The authors are suggested to refine the Introduction section.

Reply: Thank you for your suggestion. The Introduction section has been revised with respect to the comments; please see lines 14–80. All changes are highlighted in red.

Point 1.2: The authors are suggested to add a literature review section. They can move some paragraphs from the Introduction section to this section.

Reply: Thank you for your feedback. A comprehensive literature review section is now provided, from line 81 to line 226. We revised this section according to your suggestion.

Point 1.3: In addition, the authors should provide sufficient literature review to illustrate the development of applying data mining models to recognize at-risk students in higher education institutions.

Reply: Your comment is highly appreciated. An entire literature review section is provided, as stated above, from line 81 to line 226. The research gap is also identified in the Introduction section, from line 51 to line 62, and is highlighted in red in the revised manuscript.

Point 1.4: In sub-section 2.1, the authors should provide a clear introduction of the input and output variables used.

Reply: Thank you for this valuable comment. We clearly specified all the input and output variables in sub-section 3.1 (please refer to pages 5–6, lines 248–252).

Point 1.5: In sub-section 2.2, the authors need to clarify how the data are cleaned and prepared, e.g., normalization.

Reply: Thank you for this valid comment. We updated the manuscript by adding an explanation of how the data were preprocessed and normalized (please refer to page 6, lines 268–271).

Point 1.6: In sub-section 2.3, the authors should clarify what kind of analysis this step performs. From Figure 1, this step only implements correlation analysis.

Reply: Your comment is highly appreciated. As mentioned in sub-section 2.3, we started with an exploratory analysis of the data before moving to the designed model. We added a comparative analysis of the student groups (Table 2). We then explored the bivariate relationships between the features using correlation analysis and distribution plots. All these steps were conducted to explore the data and assess the potential predictive power of the proposed model (please refer to page 6, lines 273–277, and page 7, Table 2).

Point 1.7: In the methodology section, it is not clear how to build a customized rule-based model. Please improve it. It would also be better to provide a flow chart to illustrate the implementation procedure and describe the methods and equations used.

Reply: Thank you for highlighting this concern. We enriched the methodology section and added a description of how the proposed rule-based model can be built from scratch.

Point 1.8: From Figures 4–5, it is still not clear how to build this customized rule-based model. In addition, from the results shown in both figures, did the authors only use the student ID and the performance on exams and homework?

Reply: Thank you for your comment. As our dataset consists of students' performance at the checkpoints only (assignments, tests, homework), the proposed model uses the checkpoint information cumulatively and calculates the value of the output variable RF (risk flag).
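For illustration, the following is a minimal Python sketch, not the authors' actual implementation, of how a cumulative, weighted checkpoint check against a grade threshold could be computed. The function name, the example weights and grades, and the default threshold are assumptions made for this sketch only.

```python
# Minimal sketch (not the authors' implementation): a weighted cumulative
# check of checkpoint grades against a threshold fraction of the achievable
# grade so far. All names, weights, and grades are hypothetical.

def risk_flag(earned, maximum, weights, threshold=0.7):
    """Return True if the weighted cumulative performance at the checkpoints
    graded so far falls below the threshold fraction of the achievable grade."""
    achieved = sum(w * e for w, e in zip(weights, earned))
    achievable = sum(w * m for w, m in zip(weights, maximum))
    return achieved < threshold * achievable

# Example: two homework assignments and one quiz graded so far.
print(risk_flag(earned=[6, 4, 10], maximum=[10, 10, 20], weights=[1, 1, 2]))  # True
```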

Point 1.9: Section 4 is too short to highlight the contribution of this work. Please enrich it.

Reply: Thank you for this comment. Section 4 has been revised as the new Section 5; please refer to the changes highlighted in red in the revised version of the manuscript.

Point 1.10: An additional discussion section or subsection needs to be included.

Reply: Thank you for this suggestion. We have added a discussion section where we discuss the results of our model evaluation with regard to the correlation between the number of remedial actions and the grade distribution.

Reviewer 2 Report

The materials presented in the article can be published and are of interest. In further research, the authors should expand the list of analyzed factors and consider not only the academic achievements of students.

Author Response

The materials presented in the article can be published and are of interest. In further research, the authors should expand the list of analyzed factors and consider not only the academic achievements of students.

Reply: Thank you for your comment, which will help improve the quality of the paper. We fully agree with you; we plan to extend this work further, add additional features to the dataset such as the student's prior knowledge (high school grades, historical data, social data, etc.), and study how these factors affect students' performance.

Reviewer 3 Report

1. The title suits the context of the special issue well.

2. The manuscript needs proofreading. Some of the sentences seem to be the opposite of or irrelevant to what the authors wish to express, such as:

  • In the abstract: reducing student retention rate; should it be reducing student "dropout" rate?
  • Line 179: role-baed should be rule-based?

3. Some obvious grammar errors, for example:

  • Line 15: "Student's success" should be "Students'" success
  • Line 19: "peers family" should be "peers'" family
  • etc.

For 2 and 3 I suggest a thorough proofread. I also suggest that you run the manuscript through grammar checking software.

4. Lines 207–210: the passage is supposed to describe Table 1, but the symbols used in the passage are quite different from those used in Table 1. For example, g should be grade but was never mentioned in the passage. Symbols used in the passage, such as HW and Q, do not appear in Table 1.

5. The authors highlighted several concepts in the title and abstract, but they were not addressed adequately in the manuscript, namely:

  • "Prediction" seems to be one of the core concepts of this study. Starting from LINE 181, "The proposed solution will allow universities to evaluate and predict the student’s performance (especially at-risk) and develop plans to review and enhance the alignment between planned, delivered, and experienced curriculum". (again a grammar error "student's") Also, quite a few of the related works reviewed involved prediction. However, there seems to be no ground-truth information in this study to verify whether the score calculated by this system can predict a student's outcome in this class. The score can serve as a warning system for the teacher, but there was no proof regarding whether the score was a good predictor of the ground truth.
  • A lot of the parameters in the system are specified by the teacher, and it seems that it is up to the teacher to set the parameters so that the system can "predict" student outcomes better. In this sense, the system's goal is to provide teachers with a toolkit to design the evaluation flow, not to generate accurate predictions automatically. If this is the study's goal, the research question and related works need to be modified extensively to reflect it. In addition, some evidence needs to be provided to demonstrate the system's performance. Currently there is an example in the study of how the system can be used, but not of how it performed. Since many parameters are specified (customized) by a teacher, an objective performance indicator (such as predictive accuracy) is out of the question. In this case, some kind of case study or satisfaction survey can be used to prove the system's usefulness.
  • The term "Remedial Actions" is in the title and all over the manuscript. However, there are no specific examples of the remedial actions and how they correspond to the score. The term seems to have a vital role in this manuscript but is not elaborated on.

6. Line 337: "However, the results still lack the efficiency in terms of time and high false prediction. In this study, we propose a customized rule-based model that offers timely identification for at-risk students. It responds to the student’s performance in the assessment component as it happens. Also, it visualizes the at-risk students to support the educator in recognizing these students and offer timely intervention." The current system did not address the issue of high false prediction, and manually specified parameters by the teacher mean that the accuracy will depend on the teacher who uses the system. The system serves as a warning system for the instructors and provides a useful framework for evaluating students. Still, it does not help an instructor learn the optimal parameters.

7. Overall:

  • The system does not generate optimal parameters for the users from ground truth. Thus, the value of this study is to provide a reasonable framework and tools for instructors to customize.
  • The authors need to prove its value by getting good qualitative or quantitative feedback from the instructors, or by objective indicators (for example, better grades from the classes that use this system).
  • As it is structured, the manuscript needs to be modified and extended to address the above two issues.

Author Response

  1. The title suits the context of the special issue well.

Reply: Thank you for taking the time to review our paper and providing us with constructive feedback that helped us improve the quality of the paper.

  2. The manuscript needs proofreading. Some of the sentences seem to be the opposite of or irrelevant to what the authors wish to express, such as:

Reply: We appreciate your comment highlighting these spelling mistakes; we fixed them in the revised version of the paper.

  • In the abstract: reducing student retention rate; should it be reducing student "dropout" rate?

Reply: Updated on page 1, line 1, and highlighted in red in the revised version of the manuscript.

 

  • Line 179: role-baed should be rule-based?

Reply: Updated on page 2, line 63, and highlighted in red in the revised version of the manuscript.

  3. Some obvious grammar errors, for example:
  • Line 15: "Student's success" should be "Students'" success

Reply: Updated on page 14, line 407, and highlighted in red in the revised version of the manuscript.

 

  • Line 19: "peers family" should be "peers'" family
  • etc.

Reply: Updated on page 1, line 35, and highlighted in red in the revised version of the manuscript.

For 2 and 3 I suggest a thorough proofread. I also suggest that you run the manuscript through grammar checking software.

Reply: Thank you for your valuable comments. We conducted a few iterations of proofreading and ran the manuscript through Grammarly Premium; all typos and English pitfalls have been revised and fixed.

  4. Lines 207–210: the passage is supposed to describe Table 1, but the symbols used in the passage are quite different from those used in Table 1. For example, g should be grade but was never mentioned in the passage. Symbols used in the passage, such as HW and Q, do not appear in Table 1.

Reply: Thank you for pointing us to this issue. We updated the manuscript according to your suggestion.

  5. The authors highlighted several concepts in the title and abstract, but they were not addressed adequately in the manuscript, namely:
  • "Prediction" seems to be one of the core concepts of this study. Starting from LINE 181, "The proposed solution will allow universities to evaluate and predict the student’s performance (especially at-risk) and develop plans to review and enhance the alignment between planned, delivered, and experienced curriculum". (again a grammar error "student's") Also, quite a few of the related works reviewed involved prediction. However, there seems to be no ground-truth information in this study to verify whether the score calculated by this system can predict a student's outcome in this class. The score can serve as a warning system for the teacher, but there was no proof regarding whether the score was a good predictor of the ground truth.

 

Reply: Thank you for your thoughtful suggestions and insights. Regarding the good-predictor point, we added a new section to the manuscript, Section 5 (Discussion and Future Work), where we validate the usage of the proposed framework by assessing the distribution of the total grade values in relation to the number of remedial actions. Figure 8 shows that a greater number of remedial actions corresponds to lower total grades. The linear relationship between the number of remedial actions and the total grade was also assessed with the Pearson correlation coefficient. The calculated value of −0.803 is statistically significant (p = 1.54e−50). Therefore, the proposed customized model may be used as an effective warning system to identify at-risk students at early stages.
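As an illustration of this validation step, a minimal sketch follows, assuming the per-student counts of remedial actions and total grades are available as two arrays; the numbers below are invented for illustration and are not the study's data.

```python
# Minimal sketch of the reported validation step: Pearson correlation between
# the number of invoked remedial actions and the total grade. The arrays below
# are invented for illustration and are not the study's data.
from scipy.stats import pearsonr

remedial_actions = [0, 0, 1, 2, 3, 4, 5, 6]
total_grade = [95, 88, 80, 72, 65, 58, 50, 41]

r, p = pearsonr(remedial_actions, total_grade)
print(f"Pearson r = {r:.3f}, p = {p:.2e}")  # a strong negative correlation is expected
```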

 

  • A lot of the parameters in the system are specified by the teacher, and it seems that it is up to the teacher to set the parameters so that the system can "predict" student outcomes better. In this sense, the system's goal is to provide teachers with a toolkit to design the evaluation flow, not to generate accurate predictions automatically. If this is the study's goal, the research question and related works need to be modified extensively to reflect it. In addition, some evidence needs to be provided to demonstrate the system's performance. Currently there is an example in the study of how the system can be used, but not of how it performed. Since many parameters are specified (customized) by a teacher, an objective performance indicator (such as predictive accuracy) is out of the question. In this case, some kind of case study or satisfaction survey can be used to prove the system's usefulness.

 

Reply: We agree with the reviewer; therefore, future directions have now been added to the manuscript.

The system suggests remedial actions based on the severity of the case and the time at which the students are flagged. Moreover, we intend to collect qualitative or quantitative feedback from instructors about the usefulness of the system, focusing on ensuring that this work provides a useful solution for evaluating students. The system will be deployed with general checkpoints and will be easy for any instructor to use. We will make use of reinforcement learning to optimize the system according to the teachers' feedback in terms of design and optimal parameters. This is an efficient way to ensure continuous improvement of the model based on past experience.

 

  • The term "Remedial Actions" is in the title and all over the manuscript. However, there are no specific examples of the remedial actions and how they correspond to the score. The term seems to have a vital role in this manuscript but is not elaborated on.

 

Reply: We highly appreciate your comment. Please note that by proposing a customized model we intend to make the model adaptable to different settings. For instance, the Programming Lab and Object-Oriented Programming courses have different sets of remedial actions that have to be invoked: the first requires more practical tasks to improve coding skills, while the second should be based on blended tasks involving programming methodology and practice.

As for the second part of your question, it is very difficult to link the list of remedial actions to the risk flag value. We did not elaborate on this in the paper, and we will address this issue as a future extension. For the moment, our model suggests a method to identify students at risk; the choice of remedial actions is left to the instructor who teaches the course.

  6. Line 337: "However, the results still lack the efficiency in terms of time and high false prediction. In this study, we propose a customized rule-based model that offers timely identification for at-risk students. It responds to the student’s performance in the assessment component as it happens. Also, it visualizes the at-risk students to support the educator in recognizing these students and offer timely intervention." The current system did not address the issue of high false prediction, and manually specified parameters by the teacher mean that the accuracy will depend on the teacher who uses the system. The system serves as a warning system for the instructors and provides a useful framework for evaluating students. Still, it does not help an instructor learn the optimal parameters.

Reply: Thank you for your comment. We agree that our model does not provide a way to find the optimal parameters that minimize the failure rate. However, to address this problem we would need access to students' performance as it happens. This study is retrospective, so we did not have the chance to evaluate the impact of the parameters on the students' final marks. This is one of the future directions we plan to work on: we intend to test our system in a real educational environment and propose an algorithm to find the optimal parameters of the model.

  7. Overall:
  • The system does not generate optimal parameters for the users from ground truth. Thus, the value of this study is to provide a reasonable framework and tools for instructors to customize.
  • The authors need to prove its value by getting good qualitative or quantitative feedback from the instructors, or by objective indicators (for example, better grades from the classes that use this system).
  • As it is structured, the manuscript needs to be modified and extended to address the above two issues.

 

Reply: Thank you for your valuable feedback. We tried to address all the suggestions and revise our work to be consistent and meaningful. These two main issues were taken into consideration: we validated the linear relationship between the number of remedial actions and the total grade, assessed with the Pearson correlation coefficient. The calculated value of −0.803 is statistically significant (p = 1.54e−50). Therefore, the proposed customized model can be used as an effective warning system to identify at-risk students at early stages. Regarding feedback, we fully agree with this proposition, and we plan to conduct a case study or satisfaction survey to prove the system's usefulness. Due to time limitations, we plan to extend this work further by adding new objectives, such as building ML predictive models based on reinforcement learning, and the instructors' feedback will be considered as well.

Round 2

Reviewer 1 Report

All of my comments have been considered in this version.

Author Response

All of my comments have been considered in this version.

Thank you for your comments and for accepting our responses.

Reviewer 3 Report

1. Writing

  • Some sentences seem out of place, such as: "Because cyber warfare can target anyone who is feeling unattended and ruin the peace and image of the country" (line 18). It is okay if it is a citation from another study, just a bit out of place for me.
  • The abstract can be improved to include some changes in this revision.

2. Main text

  • (line 305) "The non-parametric version of ANOVA, the Kruskal-Wallis test, also partially supports our hypothesis, showing that the population medians of gender groups are not equal (p < 0.05) for the final exam grade." I did not see any hypothesis stated in this study. You may want to revise this sentence or add hypothesis statements along with a theoretical justification, but I do not think this is the authors' intent.
  • (Table 1) It seems that Table 1 is the result of data pre-processing, after the handling of missing data, and I suggest moving the table and the related description to the latter part of pre-processing. Also, symbols such as i, which I can see denote the i-th checkpoint, were never mentioned in the manuscript; in Table 1 there are only m and n. The symbols in Table 1 and their relationship to the symbols mentioned in the text need to be linked better to reduce confusion. In the current form of the manuscript, a reader will need to read through to line 330 before they can try to guess what those symbols mean.
  • Section 5 was added to justify the model's usefulness. If I understand it correctly, the number of RFs and invoked RAs depends on a series of parameters defined by the instructor. In short, different parameter settings can lead to different results. A large number of parameter combinations would be needed to make the experiment more convincing. After all, if the parameters chosen were bad, then the result will be bad.
  • Also, for Section 5, the system really does not predict things. On the other hand, it can serve as a tool for an instructor to tailor to their needs. Therefore, the authors are still suggested to use other quantitative or qualitative ways to prove the system's usefulness.

Author Response

  1. Writing
  • Some sentences seem out of place, such as: "Because cyber warfare can target anyone who is feeling unattended and ruin the peace and image of the country" (line 18). It is okay if it is a citation from another study, just a bit out of place for me.

Reply: Thank you for your feedback; your comment is highly appreciated. This sentence has been deleted, and further updates have been made in response to the other comments below. Please refer to the updated text in the abstract and the introduction; the new changes are highlighted in blue.

 

  • The abstract can be improved to include some changes in this revision.

 

Reply: Thank you for this comment. The abstract has now been updated and the changes are highlighted in blue.

 

  2. Main text
  • (line 305) "The non-parametric version of ANOVA, the Kruskal-Wallis test, also partially supports our hypothesis, showing that the population medians of gender groups are not equal (p < 0.05) for the final exam grade." I did not see any hypothesis stated in this study. You may want to revise this sentence or add hypothesis statements along with a theoretical justification, but I do not think this is the authors' intent.

 

Reply: We highly appreciate your comment. We intended to check whether there are differences in performance between the sexes, as mentioned in line 301 ("Presumably, male and female students have different performance levels concerning the type of assessment"). For this reason, we observed the grade distributions of female and male students. Additionally, we applied the Kruskal-Wallis test to the hypothesis that the final grades of male and female students come from the same distribution, i.e., that the medians of male and female students' final grades are equal. The results of the test show that there are significant differences between the sex groups (p < 0.05). We rewrote the mentioned paragraph to make it clearer to the reader in lines 317–321, page 8.
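A minimal sketch of this test follows, assuming the final-exam grades are split into two groups by sex; the grades below are invented for illustration only.

```python
# Minimal sketch of the Kruskal-Wallis test described above, assuming the
# final-exam grades are split into two groups by sex; the grades are invented.
from scipy.stats import kruskal

female_final = [78, 82, 90, 85, 74, 88]
male_final = [65, 70, 72, 60, 75, 68]

stat, p = kruskal(female_final, male_final)
print(f"H = {stat:.2f}, p = {p:.3f}")
# p < 0.05 would indicate that the group medians differ significantly.
```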

  • (Table 1) It seems that Table 1 is the result of data pre-processing, after the handling of missing data, and I suggest moving the table and the related description to the latter part of pre-processing. Also, symbols such as i, which I can see denote the i-th checkpoint, were never mentioned in the manuscript; in Table 1 there are only m and n. The symbols in Table 1 and their relationship to the symbols mentioned in the text need to be linked better to reduce confusion. In the current form of the manuscript, a reader will need to read through to line 330 before they can try to guess what those symbols mean.

Reply: Thank you for this valid comment; it is highly appreciated. We moved Table 1 to the pre-processing section. We also clearly specified the meaning of the m and n variables used in Table 1, which shows the typical structure of the data. As for the symbol "i", please note that it is used only as an index in mathematical notation throughout the manuscript. Lines 274–282 on page 6 have been updated according to your comment to make these notations easier and clearer for the reader to follow.

 

  • Section 5 was added to justify the model's usefulness. If I understand it correctly, the number of RFs and invoked RAs depends on a series of parameters defined by the instructor. In short, different parameter settings can lead to different results. A large number of parameter combinations would be needed to make the experiment more convincing. After all, if the parameters chosen were bad, then the result will be bad.

 

Reply: Thank you; we fully agree with this valuable comment, which is highly appreciated. As highlighted in the paper, we propose a customized warning system to identify at-risk students as early as possible. The proposed model has several hyperparameters that can be tuned, such as the threshold value (70% of the grade) and the weight vector W (weights of the checkpoints). The choice of parameters allows instructors to create a system that is more or less sensitive to changes in students' performance. It would be good to find the optimal set of hyperparameters for the model; however, this can only be done in a real setting and must be adapted to the course. In our case we are working with retrospective data, so it is not possible. We highlighted this issue as a future direction.
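To illustrate the kind of sensitivity the reply describes, here is a minimal sketch that sweeps the threshold hyperparameter and counts how many students would be flagged at each setting; the flagging rule, the checkpoint weights, and the student data are hypothetical illustrations, not values from the study.

```python
# Minimal sketch of a sensitivity check over the threshold hyperparameter.
# The flagging rule, weights, and student data below are hypothetical.

def flagged(earned, maximum, weights, threshold):
    achieved = sum(w * e for w, e in zip(weights, earned))
    achievable = sum(w * m for w, m in zip(weights, maximum))
    return achieved < threshold * achievable

students = {
    "s1": ([6, 4, 10], [10, 10, 20]),
    "s2": ([9, 8, 18], [10, 10, 20]),
    "s3": ([7, 6, 12], [10, 10, 20]),
}
weights = [1, 1, 2]

for threshold in (0.5, 0.6, 0.7, 0.8):
    count = sum(flagged(e, m, weights, threshold) for e, m in students.values())
    print(f"threshold={threshold:.1f}: {count} of {len(students)} students flagged")
```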

 

  • Also, for Section 5, the system really does not predict things. On the other hand, it can serve as a tool for an instructor to tailor to their needs. Therefore, the authors are still suggested to use other quantitative or qualitative ways to prove the system's usefulness.

 

Reply: Thank you again for your comment. We fully agree with it; however, it is exceedingly difficult to prove the system's usefulness without using it in a real-time setting. This is also highlighted as a future direction. In the revised version of the manuscript, we use quantitative analysis and statistical methods to validate the usefulness of the proposed framework. First, we assess the distribution of the total grade values vis-à-vis the number of remedial actions invoked (based on the proposed hyperparameter values). Figure 8 shows the tendency that a greater number of remedial actions corresponds to lower total grades. Second, we assessed the linear relationship between the number of remedial actions and the total grade with the Pearson correlation coefficient. The calculated value of −0.803 is statistically significant (p = 1.54e−50). Therefore, the proposed customized model can be used as an effective warning system to identify at-risk students at early stages.

Round 3

Reviewer 3 Report

Most issues raised have been addressed.
The current system has moderate value in assisting an instructor.
More objective assessments can really boost the value of this study.
