Predicting Student Performance Using Clickstream Data and Machine Learning
Round 1
Reviewer 1 Report
The title as well as the introduction raised expectations about your manuscript and research. The topic you are addressing would be a relevant addition to existing literature. Thank you for this valuable contribution. I will structure my feedback in (a) general remarks (comments that apply to the entire manuscript) and (b) specific remarks (feedback at the sentence and/or word level). The specific remarks can include a quote from your original manuscript to refer to a specific section. The specific remarks refer to page and row(s) in the form page.row(s); e.g., 1.15/16 refers to page 1, rows 15 and 16.
General remarks:
The overall manuscript is neat and concisely written, with relevant information for existing literature. It was a pleasant read, with academic writing of sufficient quality. Keep up the good work. This journal welcomes work of this quality.
Specific remarks:
p.1.5 Replace “this research” with “this study”.
p.1.12/13 This is rather general. Can you be more specific? You use words such as “important” and “significant insights”, which create expectations, but the statement in the abstract remains empty.
p.1.20 The quote requires a page number. The word “strongly” is redundant.
p.1.28/29 This is repetition. Rephrase. Why is it so significant?
p.1.32/32 I do not find the example suitable here. I assume pass/fail prediction is considered the easiest form of predicting student behaviour? As a result, it does not fully represent the potential of LA for predicting student behaviours.
p.2.73–75 Structure it with (a), (b), (c) to guide the reader more.
p.4.190 The hyphen needs to be replaced by an em dash.
p.Tables I would not use capital letters in the column titles. Moreover, the colours in the tables do not conform to the reference format. Hence, look for a different way of emphasizing those statistics.
p.References Sometimes you use a hyphen to separate the page numbers, whereas at other times you use an en dash. Keep it consistent (and correct, i.e., use an en dash). Moreover, in some references you put “pp.” before the page numbers, and in others you do not.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
The authors of this manuscript examine the potential of clickstream data to predict student performance. Student performance prediction is a sub-topic of Learning Analytics (LA) and Educational Data Mining (EDM). With the ideas proposed in this manuscript, the authors aim to identify at-risk students so that appropriate learning support can be provided to them in time. In their experiments, multiple predictive models were trained on aggregated click data and analyzed using different machine learning algorithms (LR, k-NN, RF, GBT, 1D-CNN and LSTM).
Strengths:
1. A large sample of student data is used (N=5341).
2. Solid quantitative research work is carried out and presented.
3. The manuscript is well written.
4. A number of machine learning algorithms are used.
5. Recent and adequate references are used.
6. The combination of ideas from different disciplines (machine learning, data mining and education) is very interesting.
7. The provided implications for learning and teaching are realistic and useful.
Weaknesses:
1. The contributions of the manuscript are not provided early on.
2. The organization of the manuscript is not provided at all.
3. The various machine learning algorithms are not described at all. Only references to each of them are provided.
Overall, this is good scholarly work. I would recommend the authors address the issues in the weaknesses.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 3 Report
In this manuscript, the authors present work on the predictive analysis of student performance.
Recently, many researchers have used usage data collected from learning management systems to predict student performance.
This paper explores the potential of clickstream data for such tasks.
A sample of 5,341 students and their clickstream behavior data from the Open University Learning Analytics Dataset (OULAD) was used.
In this study, the raw clickstream data are transformed by integrating the time and activity dimensions of students' click actions. Two feature sets are extracted, and experiments are conducted to compare deep learning algorithms (including LSTM and 1D-CNN) with traditional machine learning approaches across the different feature sets.
The results show that the LSTM outperformed the other approaches on several evaluation metrics, achieving up to 90.25 percent accuracy and precision.
Importantly, the essential features identified by the best feature set in the LSTM model provide significant insights for improving teaching and learning.
Strengths:
The paper is well written. Section 2 gives a very good overview of the research context.
In Section 4, the results are presented very well. I appreciate Tables 3, 4, and 5, where repeated testing is done to ensure that the results are meaningful and not due to chance.
The discussions support the scientific slant of the paper, and the conclusions support what the authors are saying.
Weaknesses:
There are very few.
An interesting question is why you chose this benchmark dataset.
Why were transformer-based models not used? (These are well suited to pattern recognition.)
Finally, I recommend adding this work to the background (https://www.mdpi.com/1999-5903/14/1/10). It defines the nature of statistical learners and makes basic comparisons with human learning.
Author Response
Please see the attachment.
Author Response File: Author Response.docx