1. Introduction
Online education platforms, such as Intelligent Tutoring Systems (ITSs) and Massive Open Online Courses (MOOCs), have developed rapidly in recent years. The volume of student–exercise interaction data on these platforms is growing, encompassing various disciplines [1,2]. To better collect and exploit these data, intelligent education systems are built to facilitate personalized learning and intelligent exercise recommendation [3,4]. However, effectively developing and utilizing these data has become a bottleneck in advancing intelligent education systems. This study primarily focuses on the technologies used to leverage these data.
In intelligent education systems, student–exercise interaction data refer to the sequences of students' historical exercise records. These data include information about the exercises that students have previously worked on, as well as their performance and response scores on those exercises. Analyzing these data benefits applications such as cognitive diagnosis for assessing students' cognitive ability [5], providing early warnings to students [6], dynamically adapting learning content to students' abilities [7], and recommending suitable exercises to students [8]. In these applications, knowledge tracing (KT) is a fundamental source of technical support.
Knowledge tracing predicts students' future exercise performance by monitoring and analyzing their evolving knowledge states during the exercise-solving process [9]. For example, Figure 1 shows a student practicing English reading comprehension. Given the student's historical exercise-solving process shown in Figure 1, the knowledge tracing task predicts the student's performance on the next exercise.
The principles of knowledge tracing rest on the observation that, in student–exercise interaction data, students with similar proficiency in specific knowledge points tend to perform similarly on the corresponding exercises. Additionally, an individual student performs consistently when practicing similar exercises. By analyzing student–exercise interaction data, knowledge tracing can estimate students' mastery of the knowledge points involved in the exercises. Based on this mastery information and the correlations between exercises, knowledge tracing can predict students' performance on future exercises, intelligently recommend exercises, and provide personalized learning suggestions.
Existing knowledge tracing methods can be divided into two categories according to the type of information they use: student–student connections or exercise–exercise connections. For instance, factor analysis methods [10,11,12] utilize similarity relationships between students to trace their knowledge states, while deep learning methods [13,14,15,16,17] exploit relationships between exercises by embedding knowledge point information. These deep learning methods perform poorly on open-domain datasets, although they have achieved good results on public datasets, because most public datasets are closed-domain: the knowledge points implied in the exercises have already occurred in the training set. However, in real-world applications, newly added exercises contain new knowledge points without annotations, such as new English reading comprehension articles.
To our knowledge, no study focuses on these newly added exercises, but the related cold start problem has been studied [18,19]. The cold start problem refers to the difficulty of prediction caused by too little training data when a system is first established. The open-domain dataset problem focuses instead on situations where the test data are unlabeled and not contained within the training set. In addition, although the cold start problem also involves predicting performance on new exercises, existing methods target mathematical practice datasets, which are unsuitable for open-domain English reading comprehension.
In summary, the research goals of this study are as follows:
To solve the problem of knowledge tracing in open-domain data. In applications, new exercises are constantly added to the exercise bank. Newly added exercises have no labeled knowledge point information, so predicting students' performance on new exercises is difficult. One goal of this study is to address this problem.
In this study, we also aim to solve the problem of knowledge tracing in English reading comprehension data. Compared with public mathematics education data, English reading comprehension data contain long exercise contexts and multiple questions per context. Determining how to use these features to improve knowledge tracing performance is another goal of this study.
Another aim of this study is to take advantage of students' answer information. Existing methods only consider students' performance, that is, whether students answered an exercise correctly or incorrectly, while students' answer information also benefits the knowledge tracing task. Incorporating answer information can distinguish exercise records with different reasons for mistakes. Determining how to use students' answer information to improve knowledge tracing models is therefore the supporting research objective of this study.
3. Materials and Methods
This section first gives the formal definition of knowledge tracing in reading comprehension and then explains the proposed Exercise Semantic embedding for Knowledge Tracing (ESKT), which contains two stages, as illustrated in Figure 2: Stage I, fine-tuning a Pre-trained Language Model (PLM), and Stage II, Knowledge Tracing with an Answer Encoder and Multiple Questions Attention Mechanism (KTAM). In Figure 2, the left side shows the process of fine-tuning the PLM on an external dataset, and the right side explains KTAM. KTAM consists of three encoders (the exercise encoder, the response score encoder, and the answer encoder) along with a single decoder; these components internally use the Multiple Questions Attention Mechanism. Finally, this section introduces strategies for training with auxiliary tasks.
The knowledge tracing task in this study is built upon the Transformer [24] architecture. Previous studies, such as SAINT [13] and AKT [14], have demonstrated that Transformer-based knowledge tracing models outperform those relying on Bayesian networks [21,22] or recurrent neural networks [16,17]. Drawing inspiration from state-of-the-art knowledge tracing algorithms such as SAINT and AKT, the ESKT framework is proposed to address this research's goals. The specific connections between ESKT and our research goals are as follows:
To solve the problem of knowledge tracing in open-domain data, ESKT incorporates a pre-trained language model to generate exercise semantic embeddings in Stage I. Even when encountering a new exercise, the exercise can be represented by its context.
To solve the problem of knowledge tracing in English reading comprehension data, the encoder and decoder in ESKT incorporate the proposed Multiple Questions Attention Mechanism, which leverages multiple questions within a single exercise context in Stage II. Additionally, the long exercise context is embedded in Stage I.
To take advantage of students' answer information, the Answer Encoder in ESKT encodes students' answer information in Stage II, which enriches the input features and improves knowledge tracing performance.
3.1. Task Definition of Knowledge Tracing
Let $S$ be the student set, in which each student $s \in S$ is represented by their basic information. Let $E$ denote the English multiple-choice reading comprehension exercise set. An exercise $e \in E$ is a tuple $(c, q, o, t, x, a^{*})$, where $c$ is the context of the reading comprehension exercise, $q$ is the question text, $o$ stands for the four option texts of $q$, $t$ is the question type of $q$, $x$ denotes the exercise–question index, and $a^{*}$ is the true option. An exercise record sequence $R$ of a student $s$ is defined as $R = \{(e_1, a_1, r_1), \ldots, (e_n, a_n, r_n)\}$, where $e_i$ is an exercise solved by student $s$ at time step $i$, $a_i$ is the choice of student $s$ in solving $e_i$, and $r_i$ is the student's response score; $r_i = 1$ if $a_i = a^{*}$, otherwise $r_i = 0$.
Given a student $s$ and their exercise record sequence $R$, knowledge tracing predicts the student's response score on the next exercise.
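The task definition above can be sketched as plain data structures. The class and field names below are illustrative, not taken from the paper; only the tuple components and the response score rule come from the definition.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Exercise:
    context: str          # reading comprehension context c
    question: str         # question text q
    options: List[str]    # the four option texts o
    question_type: str    # question type t
    eq_index: int         # exercise–question index
    true_option: int      # index of the correct option

@dataclass
class ExerciseRecord:
    exercise: Exercise
    choice: int           # the option the student selected

    @property
    def response_score(self) -> int:
        # response score: 1 if the chosen option is correct, otherwise 0
        return 1 if self.choice == self.exercise.true_option else 0

ex = Exercise("A short passage ...", "What does the author imply?",
              ["A", "B", "C", "D"], "inference", 0, true_option=2)
assert ExerciseRecord(ex, choice=2).response_score == 1
assert ExerciseRecord(ex, choice=1).response_score == 0
```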
3.3. Stage II: Knowledge Tracing with Answer Encoder and Multiple Questions Attention Mechanism (KTAM)
Given a student's exercise record sequence and the next exercise, KTAM predicts the student's response score on that exercise in Stage II. There are two steps: embedding and encode–decode.
Embedding: Taking the fine-tuned model, we extract the features of the $i$-th exercise record in a student's exercise record sequence, as shown in Equations (5)–(7), which produce the embeddings of the exercise, the student's response score, and the student's answer option, respectively. For the exercise embedding, the option texts are permuted so that the first position holds the correct option of the exercise and the remaining positions hold the incorrect options; the text embeddings are obtained by Equation (3). The response score embedding uses a trainable matrix. For the answer embedding, the options are permuted so that the option the student selected when practicing the exercise is placed first and the other options follow. Throughout, ⊕ denotes vector concatenation.
Encode–decode: In the encoders, the obtained exercise, response score, and answer embeddings are encoded by Equations (8)–(10) into $d$-dimensional hidden vectors, produced by the exercise encoder, response score encoder, and answer encoder, respectively. Each encoder internally uses the multiple questions attention mechanism, defined by Equation (11), whose inputs are the key, query, and value of the attention, where '⊕' is a vector concatenation operation and the attention term denotes a general attention function. The upper triangular mask attention, defined by Equation (12), blocks information from future time steps.
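The two masks described here, the upper triangular (causal) mask and the multiple questions restriction, can be sketched in NumPy. Function and argument names are our own, and this generic scaled dot-product form only approximates the paper's Equations (11) and (12).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def masked_attention(q, k, v, context_ids, multi_question=False):
    # Scaled dot-product attention with an upper triangular (causal) mask:
    # position i may only attend to positions j <= i.
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.tril(np.ones((n, n), dtype=bool))
    if multi_question:
        # Multiple questions restriction: additionally require that i and j
        # belong to the same reading context.
        mask &= context_ids[:, None] == context_ids[None, :]
    scores = np.where(mask, scores, -1e9)  # blocked weights vanish after softmax
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(4, 8))
ctx = np.array([0, 0, 1, 1])  # two questions per reading context
out = masked_attention(q, k, v, ctx, multi_question=True)
# the first position can only attend to itself, so its output equals v[0]
assert np.allclose(out[0], v[0])
```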
In the decoder, the answer option and response score hidden vectors are fused to obtain the student answer hidden vectors, which are $2d$-dimensional. These are then decoded, together with the exercise hidden vectors, into a hidden vector representing the knowledge state of student $s$ at time $i$, which is used to predict the future response score, as shown in Equations (13)–(15). The predicted response score ranges from 0 to 1 and estimates the probability of student $s$ answering exercise $e$ correctly; the attention is defined in Equations (11) and (12), and the final mapping is a fully connected network.
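The decoder's final fusion and prediction step can be illustrated as follows, under assumed dimensions; the weights are random stand-ins rather than trained parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d = 8
h_answer = rng.normal(size=d)   # answer-option hidden vector
h_score = rng.normal(size=d)    # response-score hidden vector

# Fuse the two hidden vectors by concatenation (⊕) into a 2d-dimensional
# vector, then project it to a response score prediction in (0, 1).
h_fused = np.concatenate([h_answer, h_score])
w = rng.normal(size=2 * d)      # stand-in for the fully connected network
r_hat = sigmoid(w @ h_fused)
assert h_fused.shape == (2 * d,) and 0.0 < r_hat < 1.0
```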
4. Experiment Results and Analysis
To verify the performance of the proposed Exercise Semantic embedding for Knowledge Tracing (ESKT), ESKT was compared with the SOTA method AKT on the Zhixue dataset. To verify the effectiveness of the PLM, Answer Encoder, and Multiple Questions Attention Mechanism, ablation experiments were conducted. To examine the adaptability of ESKT, exercise semantic embedding was applied to different downstream knowledge tracing models.
In addition, this section introduces the Zhixue dataset and the experimental settings, and discusses the model's attention scores and the hyperparameter settings.
4.1. Experiment Setup
Zhixue dataset. Table 1 shows statistics on the Zhixue dataset. The Zhixue dataset is a large open-domain English reading comprehension dataset that can be used for knowledge tracing. Its characteristics are as follows:
Open Domain. In the Zhixue dataset, the test set contains two types of open domain data. (1) New exercise data. A new exercise is an exercise that is performed by students in the test set but does not appear in the training set. (2) New exercise sequence data. A new exercise sequence is an exercise sequence that does not appear in the training set.
Special exercise data structure. The Zhixue dataset comprises educational data on English reading comprehension tasks, whereas most public datasets cover educational tasks from the math field. Mathematics data include knowledge point or concept labels, and general methods use these concepts to distinguish exercises, while English reading comprehension data de-emphasize the knowledge points or concepts of an exercise. The Zhixue dataset includes context, question, and option texts, which support the embedding of the exercise.
The entire dataset was first partitioned into training and test sets in a 7:3 ratio according to the students' schools and grades. This partition is in line with practical applications, where the students in the test data often come from a new school or grade. The two resulting sets are therefore heterogeneously distributed, and the test set contains data that do not appear in the training set. From the training set, we randomly took 30% of the data to create the validation set; the validation and training sets are identically distributed, and the validation set was used for early stopping and parameter tuning during training. After the model was trained, it was evaluated on the test set.
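The split procedure above can be sketched as follows. Record fields and the function name are hypothetical; only the group-level 7:3 split by school and grade, and the 30% validation hold-out, follow the description.

```python
import random

def split_dataset(records, seed=0):
    # Group records by (school, grade) and send whole groups to train or
    # test (7:3), so test students come from unseen schools/grades; then
    # hold out 30% of the training records as an i.i.d. validation set.
    rng = random.Random(seed)
    groups = sorted({(r["school"], r["grade"]) for r in records})
    rng.shuffle(groups)
    cut = int(len(groups) * 0.7)
    train_groups = set(groups[:cut])
    train = [r for r in records if (r["school"], r["grade"]) in train_groups]
    test = [r for r in records if (r["school"], r["grade"]) not in train_groups]
    rng.shuffle(train)
    n_val = int(len(train) * 0.3)
    return train[n_val:], train[:n_val], test

records = [{"school": s, "grade": g} for s in range(10)
           for g in range(3) for _ in range(4)]
train, val, test = split_dataset(records)
assert len(train) + len(val) + len(test) == len(records)
```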
Table 1. Statistics on the Zhixue dataset.

| Statistics | Value |
|---|---|
| Exercise records | 5,096,545 |
| Students | 266,487 |
| Exercise questions | 281,748 |
| Avg. exercise records per student | 19.12 |
Table 2. Statistics on Zhixue and public datasets.

| Dataset | Exercise Records | Sequences | Questions |
|---|---|---|---|
| Statics2011 | 194,947 | 333 | 1224 |
| ASSISTments2009 | 346,860 | 4217 | 26,688 |
| Bridge2006 | 3,679,119 | 1146 | 207,856 |
| NIPS34 | 1,382,727 | 4918 | 978 |
| Zhixue | 5,096,545 | 266,487 | 281,748 |
To our knowledge, there is no public English reading comprehension knowledge tracing dataset, and the existing math datasets do not meet our research goals, so our experiments could only be conducted on our commercial Zhixue dataset.
Baseline methods. This paper uses the SOTA methods SAINT and AKT [13,14] as the baseline methods, in which the exercise–question index is used to embed the exercise, as in general methods [32].
Evaluation Metric. Following general knowledge tracing methods [32], we use Accuracy (ACC) and Area Under the Curve (AUC) to evaluate the model. To evaluate performance on an open-domain dataset, focusing on new exercises and new exercise sequences, we further introduce Sequence Cold AUC (SCAUC), Sequence Cold ACC (SCACC), Exercise Cold AUC (ECAUC), and Exercise Cold ACC (ECACC), as shown in Equations (22)–(25), where $\theta$ is the threshold for judging the prediction result, $\mathrm{rank}(\cdot)$ is the ordinal function, $M$ is the positive sample number, and $N$ is the negative sample number.
AUC and ACC are calculated over all exercise prediction results in the test set, while ECACC (ECAUC) only calculates the ACC (AUC) of new exercises in the test set, and SCACC (SCAUC) calculates the ACC (AUC) of new exercise sequences in the test set. We extract all exercises of a student's exercise record sequence to form an exercise sequence. An exercise sequence that does not appear in the training set is called a new exercise sequence, and an exercise never seen in the training set is called a new exercise.
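As a minimal illustration, the rank-based AUC (the Mann–Whitney form implied by the ordinal function, $M$, and $N$ above) and the threshold ACC can be computed as follows; the function names are ours and ties are not handled.

```python
def auc(scores, labels):
    # Mann–Whitney form: sum the ordinal ranks of the positive samples,
    # where m and n are the positive and negative sample counts.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r  # ties are not handled in this sketch
    m = sum(labels)
    n = len(labels) - m
    pos_rank_sum = sum(ranks[i] for i, y in enumerate(labels) if y == 1)
    return (pos_rank_sum - m * (m + 1) / 2) / (m * n)

def acc(scores, labels, threshold=0.5):
    # predictions at or above the threshold are taken as "correct answer"
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
assert auc(scores, labels) == 0.75  # one positive–negative pair is inverted
assert acc(scores, labels) == 0.5
```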
Training. During PLM fine-tuning, we fine-tune Longformer on the external RACE dataset [42]. Exercise texts are truncated to 2048 tokens if they exceed this length, and shorter texts are padded with the special [PAD] token to ensure a uniform input length within each batch. For the RACE dataset, an exercise's context, question text, and option texts typically total 300 to 600 words, and after tokenization the maximum token length does not exceed 2048, so truncation is never triggered on RACE. The [PAD] token is semantically meaningless and is used only so that a batch of inputs shares the same length. We used the BertAdam optimizer to fine-tune Longformer. We ran three epochs on a single NVIDIA Tesla V100 GPU (32 GB of video memory; NVIDIA Corporation, Santa Clara, CA, USA) to fine-tune Longformer for the multiple-choice reading comprehension task.
During knowledge tracing, we used the Adam optimizer to train a series of experiments with AKT and SAINT as the basic frameworks, with a batch size of 64 students. The number of duplicate blocks in each encoder and decoder was set to four, and the number of attention heads in the Transformer was set to eight. To complete the knowledge tracing tasks, we ran up to 100 epochs on a single NVIDIA Tesla V100 GPU; training the model took 12 h.
4.5. Attention Visualization
The attention score matrices within the ESKT+T model are visualized in Figure 3, illustrating how they evolve across different layers. Figure 3 displays the attention scores of ESKT+T for the last block in the exercise encoder, response score encoder, answer encoder, and decoder. Each image represents the attention score averaged over eight heads. The top four images show the multiple questions mask attention scores in the exercise encoder, answer encoder, response score encoder, and decoder, which only capture information from questions sharing the same exercise context. The bottom images display the upper triangular mask attention scores, which capture information from historical exercise records.
The attention score matrix is a square matrix that serves as an intermediate value within the model's three encoders and one decoder, reflecting the influence weights within the input sequence.
Taking the exercise encoder's attention score matrix as an example, we assume its input is the sequence of exercise embeddings, where the $i$-th element is the embedding of exercise $e_i$. The element at position $(i, j)$ in this matrix represents the influence weight of exercise $e_j$ on exercise $e_i$ in the student's exercise sequence. To prevent future exercises from affecting the current exercise $e_i$, the $i$-th row of the matrix is masked such that $e_i$ is only influenced by previous exercises. This masked portion is depicted as the colorless part of the matrix in Figure 3. For exercises preceding $e_i$, those with greater impact on $e_i$ are represented by colors closer to the top of the color bar (yellow), while those with lesser impact are represented by colors closer to the bottom (black). Since ESKT employs an eight-head attention mechanism, the final block of the exercise encoder generates eight attention score matrices of the same size. Averaging these matrices and visualizing the result yields the attention score matrix of the exercise encoder, shown in the lower left part of Figure 3.
The answer encoder and response score encoder take the answer and response score embeddings as inputs, respectively, and their attention score matrices are structured similarly to those of the exercise encoder. The decoder's input is the combined output of the three encoders, and its attention matrix reflects the interaction weights between the student's exercise sequence, answer information, and response scores. These matrices are generated in the same way as described for the exercise encoder.
The four attention score matrices at the top of Figure 3 are multi-question mask matrices. The uncolored parts signify blocked future information, while the black areas indicate weights reset to 0 for exercises that do not share a reading context. This means the current exercise is influenced solely by historical sub-questions of the same context and remains unaffected by future exercises or those from different contexts.
Author Contributions
Conceptualization, Z.C.; methodology, Z.C.; software, Z.C.; validation, Z.C.; formal analysis, Z.C. and J.L.; investigation, Z.C.; resources, Z.C.; data curation, Z.C.; writing—original draft preparation, Z.C.; writing—review and editing, Z.C. and J.L.; visualization, Z.C.; supervision, J.L.; project administration, J.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.
Acknowledgments
Thanks to Jinlong Li for providing valuable suggestions for revising the language of this article.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Salama, R.; Hinton, T. Online higher education: Current landscape and future trends. J. Furth. High. Educ. 2023, 47, 913–924. [Google Scholar] [CrossRef]
- Ulum, H. The effects of online education on academic success: A meta-analysis study. Educ. Inf. Technol. 2022, 27, 429–450. [Google Scholar] [CrossRef] [PubMed]
- Pei, P.; Raga, R.C., Jr.; Abisado, M. Enhanced personalized learning exercise question recommendation model based on knowledge tracing. Int. J. Adv. Intell. Inform. 2024, 10, 13–26. [Google Scholar] [CrossRef]
- Terzieva, V.; Ivanova, T.; Todorova, K. Personalized Learning in an Intelligent Educational System. In Novel & Intelligent Digital Systems, Proceedings of the 2nd International Conference (NiDS 2022), Athens, Greece, 29–30 September 2022; Springer: Cham, Switzerland, 2022; pp. 13–23. [Google Scholar] [CrossRef]
- Liu, Y.; Zhang, T.; Wang, X.; Yu, G.; Li, T. New development of cognitive diagnosis models. Front. Comput. Sci. 2023, 17, 171604. [Google Scholar] [CrossRef]
- Xu, F.; Li, Z.; Yue, J.; Qu, S. A systematic review of educational data mining. In Intelligent Computing, Proceedings of the 2021 Computing Conference, Virtual, 15–16 July 2021; Springer: Cham, Switzerland, 2021; Volume 2, pp. 764–780. [Google Scholar] [CrossRef]
- Khosravi, H.; Sadiq, S.; Gasevic, D. Development and adoption of an adaptive learning system: Reflections and lessons learned. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, Portland, OR, USA, 11–14 March 2020; pp. 58–64. [Google Scholar] [CrossRef]
- Huang, Z.; Liu, Q.; Zhai, C.; Yin, Y.; Chen, E.; Gao, W.; Hu, G. Exploring multi-objective exercise recommendations in online education systems. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1261–1270. [Google Scholar] [CrossRef]
- Shen, S.; Liu, Q.; Huang, Z.; Zheng, Y.; Yin, M.; Wang, M.; Chen, E. A survey of knowledge tracing: Models, variants, and applications. IEEE Trans. Learn. Technol. 2024, 17, 1898–1919. [Google Scholar] [CrossRef]
- Cen, H.; Koedinger, K.; Junker, B. Learning factors analysis—A general method for cognitive model evaluation and improvement. In Intelligent Tutoring Systems, Proceedings of the International Conference on Intelligent Tutoring Systems, Jhongli, Taiwan, 26–30 June 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 164–175. [Google Scholar] [CrossRef]
- Lavoué, E.; Monterrat, B.; Desmarais, M.; George, S. Adaptive gamification for learning environments. IEEE Trans. Learn. Technol. 2018, 12, 16–28. [Google Scholar] [CrossRef]
- Thai-Nghe, N.; Drumond, L.; Horváth, T.; Krohn-Grimberghe, A.; Nanopoulos, A.; Schmidt-Thieme, L. Factorization techniques for predicting student performance. In Educational Recommender Systems and Technologies: Practices and Challenges; IGI Global: Hershey, PA, USA, 2012; pp. 129–153. [Google Scholar] [CrossRef]
- Choi, Y.; Lee, Y.; Cho, J.; Baek, J.; Kim, B.; Cha, Y.; Shin, D.; Bae, C.; Heo, J. Towards an appropriate query, key, and value computation for knowledge tracing. In Proceedings of the Seventh ACM Conference on Learning @ Scale, Virtual, 12–14 August 2020; pp. 341–344. [Google Scholar] [CrossRef]
- Ghosh, A.; Heffernan, N.; Lan, A.S. Context-aware attentive knowledge tracing. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; pp. 2330–2339. [Google Scholar] [CrossRef]
- Pandey, S.; Karypis, G. A self-attentive model for knowledge tracing. arXiv 2019, arXiv:1907.06837. Available online: https://arxiv.org/pdf/1907.06837 (accessed on 8 March 2025).
- Piech, C.; Bassen, J.; Huang, J.; Ganguli, S.; Sahami, M.; Guibas, L.J.; Sohl-Dickstein, J. Deep knowledge tracing. Adv. Neural Inf. Process. Syst. 2015, 28. Available online: https://proceedings.neurips.cc/paper/2015/file/bac9162b47c56fc8a4d2a519803d51b3-Paper.pdf (accessed on 8 March 2025).
- Yeung, C.K.; Yeung, D.Y. Addressing two problems in deep knowledge tracing via prediction-consistent regularization. In Proceedings of the Fifth Annual ACM Conference on Learning @ Scale, London, UK, 26–28 June 2018; pp. 1–10. [Google Scholar] [CrossRef]
- Liu, Q.; Huang, Z.; Yin, Y.; Chen, E.; Xiong, H.; Su, Y.; Hu, G. Ekt: Exercise-aware knowledge tracing for student performance prediction. IEEE Trans. Knowl. Data Eng. 2019, 33, 100–115. [Google Scholar] [CrossRef]
- Su, Y.; Liu, Q.; Liu, Q.; Huang, Z.; Yin, Y.; Chen, E.; Ding, C.; Wei, S.; Hu, G. Exercise-enhanced sequential modeling for student performance prediction. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar] [CrossRef]
- Corbett, A.T.; Anderson, J.R. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Model. User-Adapt. Interact. 1994, 4, 253–278. [Google Scholar] [CrossRef]
- Käser, T.; Klingler, S.; Schwing, A.G.; Gross, M. Dynamic Bayesian networks for student modeling. IEEE Trans. Learn. Technol. 2017, 10, 450–462. [Google Scholar] [CrossRef]
- Yudelson, M.V.; Koedinger, K.R.; Gordon, G.J. Individualized bayesian knowledge tracing models. In Artificial Intelligence in Education: Proceedings of the 16th International Conference, AIED 2013, Memphis, TN, USA, 9–13 July 2013; Proceedings 16; Springer: Berlin/Heidelberg, Germany, 2013; pp. 171–180. [Google Scholar] [CrossRef]
- Zhang, J.; Shi, X.; King, I.; Yeung, D.Y. Dynamic key-value memory networks for knowledge tracing. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 765–774. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Available online: https://dl.acm.org/doi/10.5555/3295222.3295349 (accessed on 8 March 2025).
- Shin, D.; Shim, Y.; Yu, H.; Lee, S.; Kim, B.; Choi, Y. Saint+: Integrating temporal features for ednet correctness prediction. In Proceedings of the LAK21: 11th International Learning Analytics and Knowledge Conference, Irvine, CA, USA, 12–16 April 2021; pp. 490–496. [Google Scholar] [CrossRef]
- Lee, U.; Park, Y.; Kim, Y.; Choi, S.; Kim, H. Monacobert: Monotonic attention based convbert for knowledge tracing. In Generative Intelligence and Intelligent Tutoring Systems: Proceedings of the International Conference on Intelligent Tutoring Systems; Springer: Cham, Switzerland, 2024; pp. 107–123. [Google Scholar] [CrossRef]
- Jiang, Z.H.; Yu, W.; Zhou, D.; Chen, Y.; Feng, J.; Yan, S. Convbert: Improving bert with span-based dynamic convolution. Adv. Neural Inf. Process. Syst. 2020, 33, 12837–12848. Available online: https://arxiv.org/pdf/2008.02496 (accessed on 8 March 2025).
- Liu, Y.; Yang, Y.; Chen, X.; Shen, J.; Zhang, H.; Yu, Y. Improving knowledge tracing via pre-training question embeddings. arXiv 2020, arXiv:2012.05031. Available online: https://arxiv.org/pdf/2012.05031 (accessed on 8 March 2025).
- Ghosh, A.; Raspat, J.; Lan, A. Option tracing: Beyond correctness analysis in knowledge tracing. In Artificial Intelligence in Education, Proceedings of the International Conference, Utrecht, The Netherlands, 14–18 June 2021; Springer: Cham, Switzerland, 2021; pp. 137–149. [Google Scholar] [CrossRef]
- Huang, T.; Liang, M.; Yang, H.; Li, Z.; Yu, T.; Hu, S. Context-Aware Knowledge Tracing Integrated with the Exercise Representation and Association in Mathematics. In Proceedings of the 14th International Conference on Educational Data Mining (EDM 2021), Virtual, 29 June–2 July 2021; Available online: https://files.eric.ed.gov/fulltext/ED615528.pdf (accessed on 8 March 2025).
- Liu, Z.; Liu, Q.; Chen, J.; Huang, S.; Luo, W. simpleKT: A simple but tough-to-beat baseline for knowledge tracing. arXiv 2023, arXiv:2302.06881. Available online: https://arxiv.org/pdf/2302.06881 (accessed on 8 March 2025).
- Liu, Z.; Liu, Q.; Chen, J.; Huang, S.; Tang, J.; Luo, W. pyKT: A python library to benchmark deep learning based knowledge tracing models. Adv. Neural Inf. Process. Syst. 2022, 35, 18542–18555. Available online: https://papers.nips.cc/paper_files/paper/2022/file/75ca2b23d9794f02a92449af65a57556-Paper-Datasets_and_Benchmarks.pdf (accessed on 8 March 2025).
- Li, H.; Yu, J.; Ouyang, Y.; Liu, Z.; Rong, W.; Li, J.; Xiong, Z. Explainable few-shot knowledge tracing. arXiv 2024, arXiv:2405.14391. Available online: https://arxiv.org/pdf/2405.14391 (accessed on 8 March 2025).
- Neshaei, S.P.; Davis, R.L.; Hazimeh, A.; Lazarevski, B.; Dillenbourg, P.; Käser, T. Towards Modeling Learner Performance with Large Language Models. arXiv 2024, arXiv:2403.14661. Available online: https://arxiv.org/pdf/2403.14661 (accessed on 8 March 2025).
- Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The long-document transformer. arXiv 2020, arXiv:2004.05150. Available online: https://arxiv.org/pdf/2004.05150 (accessed on 8 March 2025).
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. Available online: https://arxiv.org/pdf/1810.04805 (accessed on 8 March 2025).
- Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. Albert: A lite bert for self-supervised learning of language representations. arXiv 2019, arXiv:1909.11942. Available online: https://arxiv.org/pdf/1909.11942 (accessed on 8 March 2025).
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A robustly optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692. Available online: https://arxiv.org/pdf/1907.11692 (accessed on 8 March 2025).
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. Xlnet: Generalized autoregressive pretraining for language understanding. Adv. Neural Inf. Process. Syst. 2019, 32. Available online: https://proceedings.neurips.cc/paper/2019/file/dc6a7e655d7e5840e66733e9ee67cc69-Paper.pdf (accessed on 8 March 2025).
- Zhu, P.; Zhang, Z.; Zhao, H.; Li, X. DUMA: Reading comprehension with transposition thinking. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 30, 269–279. Available online: https://arxiv.org/pdf/2001.09415 (accessed on 8 March 2025). [CrossRef]
- Zhang, S.; Zhao, H.; Wu, Y.; Zhang, Z.; Zhou, X.; Zhou, X. Dual co-matching network for multi-choice reading comprehension. arXiv 2019, arXiv:1901.09381. Available online: https://arxiv.org/pdf/1901.09381 (accessed on 8 March 2025). [CrossRef]
- Lai, G.; Xie, Q.; Liu, H.; Yang, Y.; Hovy, E. Race: Large-scale reading comprehension dataset from examinations. arXiv 2017, arXiv:1704.04683. Available online: https://aclanthology.org/D17-1082/ (accessed on 8 March 2025).
- Liu, Z.; Liu, Q.; Chen, J.; Huang, S.; Gao, B.; Luo, W.; Weng, J. Enhancing deep knowledge tracing with auxiliary tasks. In Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 4178–4187. [Google Scholar] [CrossRef]
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).