Communication

DKT-LCIRT: A Deep Knowledge Tracking Model Integrating Learning Capability and Item Response Theory

1 School of Computer and Information Engineering, Jiangxi Agriculture University, Nanchang 330045, China
2 School of Vocational Teachers, Jiangxi Agriculture University, Nanchang 330045, China
3 School of Information Engineering, Jiangxi Vocational College of Mechanical & Electrical Technology, Nanchang 330045, China
4 Department of Computer, Mathematical and Physical Sciences, Sul Ross State University, Alpine, TX 79830, USA
* Authors to whom correspondence should be addressed.
Electronics 2022, 11(20), 3364; https://doi.org/10.3390/electronics11203364
Submission received: 23 August 2022 / Revised: 28 September 2022 / Accepted: 17 October 2022 / Published: 18 October 2022
(This article belongs to the Section Artificial Intelligence)

Abstract

In intelligent education, knowledge tracking is a critical research topic. Deep learning-based knowledge tracking models achieve better predictive performance than traditional knowledge tracking models, but they are less interpretable and often ignore intrinsic differences among students (e.g., learning capability and guessing capability), so their predictions lack personalization. To better reflect individual differences among students while also enhancing interpretability, this paper proposes a Deep Knowledge Tracking model integrating Learning Capability and Item Response Theory (DKT-LCIRT). The model dynamically recalculates each student's learning capability at every time interval and assigns the student to a group of students with similar learning capability, which improves predictive performance. The model further introduces item response theory to enhance interpretability. Extensive experiments on four publicly available datasets showed that the DKT-LCIRT model improved AUC by about 3% and ACC by about 2% compared with the baseline models. These results confirm that DKT-LCIRT outperforms classical knowledge tracking models in predictive performance, better reflects individual differences among students, and provides a more meaningful interpretation of its predictions.

1. Introduction

As artificial intelligence technology has advanced, online education platforms such as Massive Open Online Courses (MOOC) and Intelligent Tutoring Systems (ITS) have matured considerably. Through these systems, students can take high-quality courses at any time and in any place and improve their learning efficiency; however, online education systems are widely used mainly in large cities and are not yet common in many small and medium-sized cities. Because the COVID-19 pandemic made offline teaching difficult in 2020, more and more parents and students have accepted online education as schools closed but learning continued, and these systems have accumulated large amounts of online education data [1]. A further major problem in online education is how to give students personalized guidance, that is, learning plans suited to each student's knowledge state that improve their learning efficiency; knowledge tracking is the key to solving this problem [2]. Knowledge tracking is the task of tracking a student's knowledge state from the sequence of their previous interactions and predicting whether they will answer the upcoming exercise correctly. An interaction is one step of the exchange between the student and the online education system: the system receives the result of the student's answer, updates the student's knowledge state accordingly, and recommends an appropriate exercise; the student answers that exercise and thereby gives the system new feedback, and so on. A student's knowledge state represents their mastery of knowledge points, and by keeping track of it, the system can better understand the student's knowledge level and provide appropriate learning plans [3].
Research on knowledge tracking dates back to the late 1970s, and over the following four decades a large number of knowledge tracking models [4] have been proposed, including models based on item response theory, Bayesian networks, and deep learning.
A student's probability of correctly answering an exercise depends mainly on their degree of mastery of the knowledge points that the exercise contains. Students gradually improve their knowledge through continuous learning [5], and how quickly their mastery improves is influenced by their learning capability. Learning capability differs from student to student and can change at any time during the learning process. Dividing students into groups with similar learning capability based on their previous performance, in order to give each group more personalized instruction, has been the subject of much research in education [6,7]. Although deep knowledge tracking models outperform traditional knowledge tracking models in predictive performance, basic deep knowledge tracking models have two shortcomings: (1) they often ignore intrinsic differences among students, such as learning capability and guessing capability, and implicitly treat all students as having the same intrinsic capability, so their predictions lack personalization; and (2) they are poorly interpretable because of the black-box nature of deep learning.
To overcome these two shortcomings, this paper proposes DKT-LCIRT, a deep knowledge tracking model that integrates learning capability and item response theory. DKT-LCIRT improves on the Dynamic Key-Value Memory Network (DKVMN) model [8,9] by adding learning capability features to the input layer of the neural network [10,11] and introducing item response theory at its output layer [12]. Feature engineering is a crucial part of deep learning models: features determine the upper limit of a model's performance, and constructing new features can mine more of the information contained in the data [13,14] and make the data more expressive [15,16,17]. The DKT-LCIRT model extracts latent features from students' historical interaction sequences [18,19], uses a deep neural network to track each student's knowledge state, and outputs a student capability parameter and an exercise difficulty parameter. Finally, it predicts the probability that the student answers the exercise correctly using item response theory.
The primary contributions of this paper are as follows: (1) By adding learning capability features, dynamically calculating students' learning capability at each time interval, and assigning students to groups with similar learning capability, the DKT-LCIRT model better reflects individual differences among students and improves predictive performance. (2) By introducing item response theory, the DKT-LCIRT model provides meaningful estimates of students' capability levels and of exercise difficulty levels, which improves interpretability. (3) The study validates the practical effect of combining learning capability features with item response theory, filling a gap in the existing literature.
The rest of this paper is organized as follows: Section 2 reviews relevant research on knowledge tracking. Section 3 details the implementation of the DKT-LCIRT model. Section 4 describes the experimental study. Section 5 concludes the paper and discusses future research directions.

2. Related Work

Knowledge tracking can be viewed as a supervised sequence learning problem. Suppose an online education system has E students and F exercises, and we are given a student's sequence of historical interactions with the system, $u = \{(q_1, a_1), (q_2, a_2), \ldots, (q_t, a_t)\}$, where $q_t$ is the exercise the student answered at time step t and $a_t$ indicates whether the answer was correct: typically $a_t = 1$ for a correct answer and $a_t = 0$ for an incorrect one. Knowledge tracking predicts the probability that the student correctly answers the upcoming exercise $q_{t+1}$ based on this interaction sequence. The three typical types of knowledge tracking models are reviewed below.
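As a minimal sketch of this supervised framing, each prefix of a student's interaction sequence becomes an input, and the next interaction supplies the exercise to predict and its 0/1 label. The helper name `next_step_examples` is hypothetical, and exercises are represented by plain integer ids for illustration.

```python
def next_step_examples(interactions: list[tuple[int, int]]):
    """Turn one student's sequence [(q_1, a_1), ..., (q_t, a_t)] into supervised
    examples: the history up to step k is the input, and (q_{k+1}, a_{k+1})
    gives the exercise to predict and its correctness label."""
    return [
        (interactions[:k], interactions[k][0], interactions[k][1])
        for k in range(1, len(interactions))
    ]


seq = [(3, 1), (7, 0), (3, 1)]  # hypothetical (exercise id, correctness) pairs
for history, next_q, label in next_step_examples(seq):
    print(history, "-> predict exercise", next_q, "label", label)
```

A student with a single interaction yields no examples, since there is no later answer to predict.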

2.1. Item Response Theory

Item Response Theory (IRT) [20], applied in testing environments since the 1950s, outputs the probability that a student correctly answers exercise j on a test based on the student's capability level $\theta$ and the exercise's difficulty level $\beta_j$: the higher the student's capability, the higher this probability, and the higher the exercise's difficulty, the lower it is. Because IRT was designed for testing, it assumes that students' capability levels do not change during the test, that is, that their knowledge states are static, which is reasonable in a testing environment. In knowledge tracking, however, a student's knowledge state changes constantly, so IRT cannot be applied directly to knowledge tracking tasks.
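To make this monotonic relationship concrete, the sketch below implements a one-parameter (Rasch-style) logistic item response function, $P = \sigma(\theta - \beta_j)$. The text does not say which IRT variant reference [20] denotes; the one-parameter form is assumed here because it has the same $\sigma(\cdot\,\theta - \beta)$ structure used later in Equation (13). The name `irt_probability` is hypothetical.

```python
import math


def irt_probability(theta: float, beta: float) -> float:
    """One-parameter logistic IRT (assumed variant): probability that a student
    with capability `theta` answers an exercise of difficulty `beta` correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


print(irt_probability(0.0, 0.0))  # capability equals difficulty -> 0.5
print(irt_probability(1.0, 0.0) > irt_probability(0.0, 0.0))  # higher capability helps
print(irt_probability(0.0, 1.0) < irt_probability(0.0, 0.0))  # higher difficulty hurts
```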
To address this challenge, researchers have combined IRT with deep learning, as in the DIRT model proposed by Cheng et al. [21] and the NeuralCD model proposed by Wang et al. [22,23]. Both models use deep neural networks to extract complex information from the data and track students' knowledge states while retaining the interpretability of IRT.

2.2. Bayesian Knowledge Tracking

Bayesian Knowledge Tracking (BKT) was proposed by Corbett and Anderson in the 1990s [24] and is probably the first model to relax the assumption of a static knowledge state, an assumption that is unreasonable in a learning environment. BKT represents a student's knowledge state as a set of binary variables, each indicating whether the student has mastered a single knowledge point, and updates each variable with a Hidden Markov Model (HMM) whenever the student answers an exercise. To estimate a student's mastery of knowledge points from their historical interaction sequence, BKT requires four parameters: the prior probability $p(L)$, the probability that the student has mastered the knowledge point at the beginning; the learning probability $p(T)$, the probability of transitioning from the non-mastered to the mastered state; the guessing probability $p(G)$, the probability of answering correctly without having mastered the knowledge point; and the slipping probability $p(S)$, the probability of answering incorrectly despite having mastered it. Because BKT treats each knowledge point as either mastered or unmastered and models each one separately, it ignores intermediate levels of mastery and does not consider relationships between knowledge points.
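The four-parameter update described above can be sketched for a single knowledge point as follows: the mastery estimate is first conditioned on the observed answer with Bayes' rule (using $p(G)$ and $p(S)$) and then advanced by the learning transition $p(T)$. This is a minimal sketch of the standard BKT forward update; the class and function names are hypothetical, and the parameter values are arbitrary illustrations rather than estimates from any dataset.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    p_init: float   # p(L): prior probability of mastery
    p_learn: float  # p(T): probability of moving from non-mastery to mastery
    p_guess: float  # p(G): correct answer despite non-mastery
    p_slip: float   # p(S): incorrect answer despite mastery


def predict_correct(p_mastery: float, prm: BKTParams) -> float:
    """Probability that the next answer on this knowledge point is correct."""
    return p_mastery * (1 - prm.p_slip) + (1 - p_mastery) * prm.p_guess


def bkt_update(p_mastery: float, correct: bool, prm: BKTParams) -> float:
    """Bayesian posterior given the observed answer, then the learning transition."""
    if correct:
        num = p_mastery * (1 - prm.p_slip)
        den = num + (1 - p_mastery) * prm.p_guess
    else:
        num = p_mastery * prm.p_slip
        den = num + (1 - p_mastery) * (1 - prm.p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * prm.p_learn


prm = BKTParams(p_init=0.3, p_learn=0.1, p_guess=0.2, p_slip=0.1)
p = prm.p_init
for answer in [True, True, False, True]:
    print(round(predict_correct(p, prm), 3))  # prediction before seeing the answer
    p = bkt_update(p, answer, prm)
```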
Over the years, many variants of BKT have been proposed. Baker et al. [25] enhanced model predictive performance by introducing student slip and guess parameters. Yudelson et al. [26] investigated personalization of student learning rate parameters and found better predictive performance.

2.3. Deep Knowledge Tracking

Like BKT, Deep Knowledge Tracking (DKT) [27] processes the student's historical interaction sequence, but it exploits neural networks to remove BKT's limitations of treating knowledge points separately and assuming binary states. In DKT, each interaction tuple $(q_t, a_t)$ is first converted into an input vector by one-hot encoding and passed to the hidden layer, where a Long Short-Term Memory (LSTM) network [28] generates a hidden state that, in principle, summarizes all past information; this hidden state can therefore be interpreted as the latent knowledge state produced by the student's learning so far. Finally, the output layer maps the hidden state to an output vector giving the probability of correctly answering each exercise.
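The one-hot input encoding can be sketched as below. A common convention, assumed here because the text does not specify the layout, uses a vector of length 2F for F exercises: one half marks incorrect answers and the other half marks correct ones, so each $(q_t, a_t)$ pair sets exactly one position. The function name and 0-based exercise ids are hypothetical.

```python
def one_hot_interaction(q: int, a: int, num_exercises: int) -> list[int]:
    """Encode an interaction (q, a) as a one-hot vector of length 2 * num_exercises.

    Exercise ids are assumed to be 0-based. Incorrect answers (a = 0) occupy the
    first half of the vector and correct answers (a = 1) the second half.
    """
    if not 0 <= q < num_exercises or a not in (0, 1):
        raise ValueError("q must be a valid exercise id and a must be 0 or 1")
    x = [0] * (2 * num_exercises)
    x[q + a * num_exercises] = 1
    return x


print(one_hot_interaction(q=2, a=1, num_exercises=4))  # [0, 0, 0, 0, 0, 0, 1, 0]
```

Keeping correctness in the index lets the network distinguish "exercise 2 answered correctly" from "exercise 2 answered incorrectly" without a separate input.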
Basic deep knowledge tracking models suffer from poor interpretability, difficulty capturing long-term dependencies, and a limited set of learning features. To address interpretability, Zhang et al. proposed the DKVMN model [29], which draws on the Memory-Augmented Neural Network (MANN) [30] and models students' knowledge mastery process with two memory matrices, a key matrix and a value matrix; DKVMN can capture the relationships between different knowledge points while tracking their mastery states. Liu et al. proposed the EKT model [31], which tracks students' knowledge states with a Bidirectional Long Short-Term Memory (Bi-LSTM) network and an attention module and encodes exercise embeddings from the exercises' text, so that the embeddings carry the textual features of the exercises. To address long-term dependencies, Choi et al. proposed the SAINT model [32], based on the Transformer [33], which uses an encoder–decoder structure of stacked attention layers: the exercise embeddings are fed to the encoder, and the encoder output together with the interaction embeddings is fed to the decoder. To address the shortage of learning features, Wang et al. proposed the DHKT model [34], which models hierarchical relationships between exercises through the relationships between exercises and knowledge points. Nakagawa et al. proposed the GKT model [35], which represents the relationships between knowledge points as a directed graph and thereby turns knowledge tracking into a time-series node-level classification task for a Graph Neural Network (GNN). Yang et al. proposed the GIKT model [36], which uses a Graph Convolutional Network (GCN) to incorporate question–skill relevance through embedding propagation. Shen et al. proposed the CKT model [37], which uses a Convolutional Neural Network (CNN) to capture learning rate features from students' interaction histories.

3. Our Proposed DKT-LCIRT Scheme

Basic deep knowledge tracking models can effectively track students' knowledge states, but their predictions lack personalization because they often ignore intrinsic differences among students (e.g., learning capability and guessing capability), and they are poorly interpretable. This paper therefore proposes a Deep Knowledge Tracking model integrating Learning Capability and Item Response Theory (DKT-LCIRT). The model tracks students' knowledge states with added learning capability features: it dynamically calculates each student's learning capability at each time interval and assigns the student to a group with similar learning capability [38,39]. This relaxes the assumptions that all students share the same learning capability and that this capability stays constant over time, individualizing the model for each student and thereby improving its predictive performance. The model also introduces item response theory to provide meaningful estimates of students' capability levels and of exercise difficulty levels, enhancing its interpretability.

3.1. Learning Capability Features

Students' learning capability features are extracted in two steps: each student's learning capability is calculated from all of their historical performance up to the start of the next time interval, and students are then dynamically assigned to groups with similar learning capability using the k-means clustering algorithm [40,41].

3.1.1. Time Interval Division

A time interval is a segment of the interaction sequence containing a fixed number of responses [42], and each student's learning capability is recalculated after every time interval. As shown in Figure 1, an interaction sequence containing 23 answers was divided into five time intervals of five answers each, with the final interval holding the remaining three answers.
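Assuming that the last interval simply holds whatever responses remain (the text does not say how a partial final interval is treated), the division in Figure 1 can be reproduced with a short helper; `split_into_intervals` is a hypothetical name. The experiments in Section 4.2 use an interval size of 20 rather than 5.

```python
def split_into_intervals(sequence: list, interval_size: int) -> list[list]:
    """Split an interaction sequence into consecutive time intervals of
    `interval_size` responses; the final interval may be shorter (assumption)."""
    if interval_size <= 0:
        raise ValueError("interval_size must be positive")
    return [sequence[i:i + interval_size] for i in range(0, len(sequence), interval_size)]


answers = list(range(23))  # stand-in for 23 (exercise, answer) tuples
intervals = split_into_intervals(answers, 5)
print(len(intervals), [len(block) for block in intervals])  # 5 [5, 5, 5, 5, 3]
```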

3.1.2. Learning Capability Calculation

Students’ learning capability is coded as a vector with the number of elements equal to the number of knowledge points. The difference between the correct and incorrect rates of each knowledge point from students’ previous interactions is converted into elements of the learning capability vector. Students’ learning capability is calculated by Equations (1)–(4):
$$\mathrm{Correct}(s_m)_{1:z} = \sum_{t=1}^{z} \frac{\left(s_m^{i} == 1\right)}{N_m^{i}} \tag{1}$$
$$\mathrm{Incorrect}(s_m)_{1:z} = \sum_{t=1}^{z} \frac{\left(s_m^{i} == 0\right)}{N_m^{i}} \tag{2}$$
$$R(s_m)_{1:z} = \mathrm{Correct}(s_m)_{1:z} - \mathrm{Incorrect}(s_m)_{1:z} \tag{3}$$
$$d_{1:z}^{i} = \left(R(s_1)_{1:z}, R(s_2)_{1:z}, \ldots, R(s_m)_{1:z}\right) \tag{4}$$
where $\mathrm{Correct}(s_m)_{1:z}$ is the proportion of student i's answers on knowledge point $s_m$ during time intervals 1 to z that were correct, $\mathrm{Incorrect}(s_m)_{1:z}$ is the proportion that were incorrect, $R(s_m)_{1:z}$ is the difference between the two and measures the student's performance on $s_m$, $d_{1:z}^{i}$ is student i's learning capability vector, and $N_m^{i}$ is the total number of times student i answered questions on knowledge point $s_m$.
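A minimal sketch of Equations (1)–(4) for one student follows. It assumes each response is tagged with a single knowledge point (0-based id) and gives a never-attempted knowledge point the value 0, a choice the text does not specify; `learning_capability` is a hypothetical name.

```python
from collections import defaultdict


def learning_capability(history: list[tuple[int, int]], num_skills: int) -> list[float]:
    """Learning capability vector d^i_{1:z} per Equations (1)-(4).

    `history` holds (skill_id, answer) pairs for every response the student gave
    in time intervals 1..z, with answer 1 = correct and 0 = incorrect. Skills the
    student has never attempted get R = 0 (an assumption; the text does not say).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for skill, answer in history:
        total[skill] += 1
        correct[skill] += answer
    vector = []
    for m in range(num_skills):
        n = total[m]
        if n == 0:
            vector.append(0.0)
            continue
        correct_rate = correct[m] / n                 # Equation (1)
        incorrect_rate = (n - correct[m]) / n         # Equation (2)
        vector.append(correct_rate - incorrect_rate)  # Equation (3)
    return vector                                     # Equation (4)


hist = [(0, 1), (0, 1), (0, 0), (1, 0), (1, 0), (2, 1)]
print(learning_capability(hist, num_skills=4))
```

Each element lies in [−1, 1]: 1 means every answer on that knowledge point was correct, −1 means every answer was wrong.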

3.1.3. K-Means Clustering Grouping

At each time interval z, students are assigned to groups $c_z$ with similar learning capability by applying k-means clustering to their calculated learning capability vectors; $c_z$ is based on all historical performance before time interval z, i.e., on $d_{1:z-1}^{i}$. In the k-means training phase, the centroids of the k student groups are identified; once determined, these centroids remain fixed throughout the subsequent grouping process. Each student is assigned to the nearest student group by Equation (5):
$$\mathrm{Cluster}(i, z) = \underset{C}{\arg\min} \sum_{c=1}^{k} \sum_{d_{1:z-1}^{i} \in C_c} \left\lVert d_{1:z-1}^{i} - \mu_c \right\rVert^{2} \tag{5}$$
where $\mathrm{Cluster}(i, z)$ is student i's group at time interval z, k is the number of learning capability groups, $d_{1:z-1}^{i}$ is student i's learning capability over time intervals 1 to z − 1, and $\mu_c$ is the centroid of group c. The process of grouping students by learning capability is shown in Figure 2.

3.2. Framework of the DKT-LCIRT Model

The framework of the DKT-LCIRT model is shown in Figure 3. The model reads from and writes to two memory matrices: the key memory matrix, which is static and stores the latent knowledge points of the exercises, and the value memory matrix, which is dynamic and stores the student's mastery of those knowledge points. The model works in three main steps: obtaining the correlation weights, predicting the probability of a correct answer to the exercise, and updating the student's knowledge state.

3.2.1. Acquisition of Correlation Weights

Let $q_t$ denote the exercise answered by the student at time step t, and suppose that the Q exercises involve N knowledge points. Each exercise $q_t$ has a correlation weight vector $w_t \in \mathbb{R}^{N}$ that describes how strongly $q_t$ is related to each knowledge point. First, the embedding vector $k_t \in \mathbb{R}^{d_k}$ of exercise $q_t$ is looked up in the exercise embedding matrix $A \in \mathbb{R}^{Q \times d_k}$, where $d_k$ is the size of a key memory slot. The embedding $k_t$ is then concatenated with the learning capability group $c_z$. Finally, the inner product of the concatenated vector $[k_t, c_z]$ with each row of the key memory matrix $M^k \in \mathbb{R}^{N \times d_k}$ is passed through a softmax to obtain the correlation weight $w_t$, as shown in Equation (6):
$$w_t(i) = \mathrm{Softmax}\left([k_t, c_z]^{T} M^k(i)\right) \tag{6}$$
where $M^k(i)$ is the i-th row vector of $M^k$.
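Equation (6) can be sketched in plain Python as follows. The text does not state how the group $c_z$ is encoded or how the width of $[k_t, c_z]$ is reconciled with the $d_k$-wide rows of $M^k$, so this sketch assumes a one-hot group encoding and key memory rows as wide as the concatenated query; all names and numbers are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def correlation_weights(query: list[float], key_memory: list[list[float]]) -> list[float]:
    """Equation (6): softmax over the inner products of the query [k_t, c_z]
    with every row M^k(i) of the key memory matrix."""
    for row in key_memory:
        if len(row) != len(query):
            raise ValueError("each key memory row must match the query width")
    scores = [sum(q * m for q, m in zip(query, row)) for row in key_memory]
    return softmax(scores)


k_t = [0.2, -0.1, 0.4]   # exercise embedding (hypothetical values)
c_z = [0.0, 1.0]         # ASSUMPTION: one-hot encoding of group 1 out of 2
query = k_t + c_z        # list concatenation gives [k_t, c_z]
key_memory = [           # N = 3 slots, each as wide as the query (assumption)
    [0.1, 0.0, 0.3, 0.5, -0.2],
    [0.0, 0.2, -0.1, 0.0, 0.4],
    [0.3, 0.3, 0.3, 0.1, 0.1],
]
w_t = correlation_weights(query, key_memory)
print([round(w, 3) for w in w_t], round(sum(w_t), 6))
```

Because the group bits enter the inner product, two students answering the same exercise can attend to different knowledge-point slots if they belong to different learning capability groups.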

3.2.2. Prediction of the Probability of Correct Answers to the Exercises

The probability of answering the exercise correctly is predicted as follows. First, the correlation weights $w_t$ are used to read the knowledge states stored in the value memory matrix, producing a read vector $r_t$ that represents the student's mastery of the current exercise $q_t$, as shown in Equation (7):
$$r_t = \sum_{i=1}^{N} w_t(i)\, M_t^{v}(i) \tag{7}$$
Then, the read vector $r_t$ is concatenated with the previous response $v_{t-1}$ and the exercise difficulty $pd_t$. In this paper, exercise difficulty is measured on 10 levels [43]; difficulty is a property of the exercise itself and does not depend on the knowledge points the exercise contains [44]. The difficulty of an exercise is calculated by Equations (8) and (9):
$$pd(j) = \begin{cases} \delta(j, 10), & \text{if } |N_j| \ge 4 \\ pd, & \text{otherwise} \end{cases} \tag{8}$$
$$\delta(j, 10) = \left\lfloor \frac{\sum_{i \in N_j} \left(a_{ij} == 0\right)}{|N_j|} \cdot 10 \right\rfloor \tag{9}$$
where $N_j$ is the set of students who answered exercise j, $(a_{ij} == 0)$ indicates that student i's first answer to exercise j was wrong, $\delta(j, 10)$ maps the error rate of exercise j onto 10 difficulty levels, and $pd(j)$ is the difficulty of exercise j. The constant $pd$ is the default difficulty level assigned to exercises with too few responses to estimate reliably: exercises answered by fewer than four students receive $pd = 5$.
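Equations (8) and (9) can be sketched for a single exercise as follows. The floor-based discretisation is an assumption about how the error rate is mapped onto levels (with it, an exercise that every student gets wrong lands on level 10), and the helper name is hypothetical. Integer arithmetic is used so that, for example, 3 wrong answers out of 10 land exactly on level 3 without floating-point rounding.

```python
def difficulty_level(first_answers: list[int], min_students: int = 4, default_level: int = 5) -> int:
    """Equations (8)-(9): map the first-attempt error rate of one exercise
    onto difficulty levels.

    `first_answers` holds each student's first answer to the exercise
    (1 = correct, 0 = wrong). Exercises answered by fewer than `min_students`
    students fall back to `default_level` (pd = 5 in the text).
    """
    n = len(first_answers)
    if n < min_students:
        return default_level
    wrong = sum(1 for a in first_answers if a == 0)
    return (wrong * 10) // n  # floor(error_rate * 10), computed exactly


print(difficulty_level([1, 0, 1, 1, 0, 1, 1, 1, 0, 1]))  # 3 wrong of 10 -> 3
print(difficulty_level([0, 0, 1]))                       # only 3 students -> 5
```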
The concatenated vector $[v_{t-1}, r_t, pd_t]$ is fed into the hidden layer, which outputs the student's latent knowledge state $h_t$, as shown in Equation (10):
$$h_t = \tanh\left(W_{hx}[v_{t-1}, r_t, pd_t] + W_{hh} h_{t-1} + b_h\right) \tag{10}$$
where $W_{hx}$ is the input weight matrix, $W_{hh}$ is the recurrent weight matrix, $b_h$ is the bias vector, and the hidden layer uses the hyperbolic tangent activation function.
Next, the student's latent knowledge state $h_t$ and the exercise embedding $k_t$ are fed into two separate single-layer fully connected neural networks, which output the student capability level and the exercise difficulty level required by item response theory. Reflecting their roles, these are called the student capability network and the exercise difficulty network. Both use the hyperbolic tangent activation function, so their outputs lie in (−1, 1). The student capability $\theta_{tj}$ and the exercise difficulty $\beta_j$ are calculated by Equations (11) and (12):
$$\theta_{tj} = \tanh\left(W_{\theta} h_t + b_{\theta}\right) \tag{11}$$
$$\beta_j = \tanh\left(W_{\beta} k_t + b_{\beta}\right) \tag{12}$$
where $\theta_{tj}$ is the student's capability to answer exercise j at time step t and $\beta_j$ is the difficulty of exercise j.
Finally, the student capability $\theta_{tj}$ and the exercise difficulty $\beta_j$ are fed into the item response function to predict the probability of correctly answering exercise j, as shown in Equation (13):
$$p_t^{c_z} = \sigma\left(3.0\,\theta_{tj} - \beta_j\right) \tag{13}$$
where the output of the student capability network is multiplied by 3.0 [45] so that the predicted probability can cover almost the entire range (0, 1). Without this scaling, since both $\theta_{tj}$ and $\beta_j$ lie in (−1, 1), the predicted probability of a correct answer could be at most $\sigma(1 - (-1)) = \sigma(2) \approx 0.881$; with it, the upper bound rises to $\sigma(3 - (-1)) = \sigma(4) \approx 0.982$.
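The output stage in Equations (11)–(13) can be sketched as follows, treating $\theta_{tj}$ and $\beta_j$ as scalars produced by one tanh unit each (an assumption about the output width; the weights in the test are arbitrary). The last two lines reproduce the 0.881 ceiling without scaling and the corresponding ceiling with the 3.0 factor.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense_tanh(weights: list[float], inputs: list[float], bias: float) -> float:
    """One output unit of a single-layer fully connected network with tanh
    activation, as in Equations (11) and (12); the result lies in (-1, 1)."""
    return math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)


def predict_response(h_t, k_t, w_theta, b_theta, w_beta, b_beta, scale=3.0):
    """Equation (13): sigma(scale * theta - beta)."""
    theta = dense_tanh(w_theta, h_t, b_theta)  # student capability network
    beta = dense_tanh(w_beta, k_t, b_beta)     # exercise difficulty network
    return sigmoid(scale * theta - beta)


# Upper bound on the predicted probability without and with the 3.0 scaling.
print(round(sigmoid(1 - (-1)), 3))        # 0.881
print(round(sigmoid(3.0 * 1 - (-1)), 3))  # 0.982
```

Only the capability term is scaled, so the prediction stays an IRT-style comparison of capability against difficulty while gaining a wider output range.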

3.2.3. Update of Students’ Knowledge State

After the student answers the exercise, their knowledge state changes, and the value memory matrix is updated using the input tuple $(q_t, a_t)$ and the correlation weights $w_t$. The embedding vector $v_t$, looked up in the exercise response embedding matrix B, represents the knowledge growth after answering exercise $q_t$. First, part of the stored knowledge is removed using the erase vector $e_t$, which represents the knowledge the student has forgotten. Then, the add vector $a_t$ writes the knowledge growth into the value memory matrix. The student's knowledge state is updated by Equations (14)–(17):
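$$e_t = \sigma\left(W_e v_t + b_e\right) \tag{14}$$
$$a_t = \tanh\left(W_a v_t + b_a\right) \tag{15}$$
$$\tilde{M}_{t+1}^{v}(i) = M_t^{v}(i) \odot \left(\mathbf{1} - w_t(i)\, e_t\right)^{T} \tag{16}$$
$$M_{t+1}^{v}(i) = \tilde{M}_{t+1}^{v}(i) + w_t(i)\, a_t^{T} \tag{17}$$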
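The read operation in Equation (7) and the erase-then-add write in Equations (14)–(17) can be sketched with plain Python lists standing in for the memory matrices. The weight shapes and all values are hypothetical, and the element-wise reading of Equation (16) follows the DKVMN-style erase operation that the model builds on.

```python
import math


def matvec(W: list[list[float]], x: list[float], b: list[float]) -> list[float]:
    """Affine map W x + b for a row-major matrix W."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def erase_add_vectors(v_t, W_e, b_e, W_a, b_a):
    erase = [1.0 / (1.0 + math.exp(-z)) for z in matvec(W_e, v_t, b_e)]  # Equation (14)
    add = [math.tanh(z) for z in matvec(W_a, v_t, b_a)]                  # Equation (15)
    return erase, add


def read_memory(w_t: list[float], value_memory: list[list[float]]) -> list[float]:
    """Equation (7): r_t = sum_i w_t(i) * M^v_t(i)."""
    width = len(value_memory[0])
    return [sum(w * row[d] for w, row in zip(w_t, value_memory)) for d in range(width)]


def write_memory(w_t, value_memory, erase, add):
    """Equations (16)-(17): slot i is erased element-wise in proportion to
    w_t(i) * e_t, and then w_t(i) * a_t is added to it."""
    updated = []
    for w, row in zip(w_t, value_memory):
        erased = [m * (1.0 - w * e) for m, e in zip(row, erase)]  # Equation (16)
        updated.append([m + w * a for m, a in zip(erased, add)])  # Equation (17)
    return updated


M_v = [[0.5, 0.5], [0.2, -0.1]]  # N = 2 slots of width 2 (hypothetical)
w_t = [0.75, 0.25]
print(read_memory(w_t, M_v))
erase_vec, add_vec = erase_add_vectors([1.0, 0.0], [[0.5, 0.0], [0.0, 0.5]], [0.0, 0.0],
                                       [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(write_memory(w_t, M_v, erase_vec, add_vec))
```

A slot with zero correlation weight is left untouched, which is how the model confines each update to the knowledge points related to the current exercise.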
For ease of understanding, we illustrate the DKT-LCIRT model in Algorithm 1.
Algorithm 1: The DKT-LCIRT model
Input: interaction sequence $u^i = \{(q_1, a_1), (q_2, a_2), \ldots, (q_T, a_T)\}$ of student i
Output: the probability $p_t^{c_z}$ of answering each exercise correctly
1: initialize the previous response $v$ and the exercise difficulty $pd$
2: for $t = 1, 2, \ldots, T$ do
3: obtain the learning capability group $c_z$ by Equations (1)–(5)
4: look up the exercise embedding $k_t$ and concatenate it with $c_z$
5: obtain the correlation weight vector $w_t$ by Equation (6)
6: obtain the read vector $r_t$ by Equation (7)
7: concatenate $r_t$ with $v_{t-1}$ and $pd_t$, and feed the result to the hidden layer
8: obtain the student's knowledge state $h_t$ by Equation (10)
9: feed $h_t$ to the student capability network and $k_t$ to the exercise difficulty network
10: output the student's capability $\theta_{tj}$ and the exercise difficulty $\beta_j$
11: obtain the probability $p_t^{c_z}$ of a correct answer by Equation (13)
12: update the student's knowledge state by Equations (14)–(17)
13: end for
14: return $\{p_t^{c_z}\}_{t=1}^{T}$

3.3. Optimization of the Model

The trainable parameters of the DKT-LCIRT model are the exercise embedding matrix A, the exercise response embedding matrix B, the key memory matrix $M^k$, and the weights and biases of the neural networks. The model is trained by minimizing the cross-entropy loss [46] between the predicted and true responses, shown in Equation (18):
$$\ell = -\sum_{t} \left( R_t \log p_t^{c_z} + (1 - R_t) \log\left(1 - p_t^{c_z}\right) \right) \tag{18}$$
where $p_t^{c_z}$ is the predicted probability and $R_t$ is the true response (1 for correct, 0 for incorrect).
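Equation (18) is the standard binary cross-entropy summed over time steps; a minimal sketch follows. The clipping constant `eps` is an implementation choice not taken from the text, and `bce_loss` is a hypothetical name.

```python
import math


def bce_loss(targets: list[int], probs: list[float], eps: float = 1e-7) -> float:
    """Equation (18): binary cross-entropy summed over time steps.

    Probabilities are clipped to [eps, 1 - eps] to avoid log(0); the clipping
    value is an implementation choice, not taken from the text."""
    total = 0.0
    for r, p in zip(targets, probs):
        p = min(max(p, eps), 1 - eps)
        total -= r * math.log(p) + (1 - r) * math.log(1 - p)
    return total


print(bce_loss([1, 0, 1], [0.9, 0.2, 0.6]))
```

Confident wrong predictions are penalised heavily, which is why the predicted range of Equation (13) matters during training.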

4. Performance Analysis

4.1. Datasets

To verify the effectiveness of the proposed DKT-LCIRT model, experiments were conducted on four publicly available online education datasets, described in Table 1. The datasets are large and differ considerably in their data distributions, which makes them well suited to the experiments; because three of them were collected from real learning environments, the results are expected to generalize to real-life scenarios, giving the evaluation good ecological validity.
ASSIST2009 [47]: This dataset, from the ASSISTments intelligent tutoring system, contains 325,637 interactions from 4151 students, covering 26,688 exercises and 110 knowledge points, with a 65.84% correct rate. The correct rate is the percentage of all interactions in the dataset whose answers are correct.
ASSIST2015 [48]: This dataset, also from the ASSISTments intelligent tutoring system, contains 683,801 interactions from 19,840 students, covering 100 knowledge points, with a 73.18% correct rate. Although it contains far more interactions than ASSIST2009, it also has far more students, so the average number of interactions per student is substantially lower (about 34, versus about 78 for ASSIST2009).
Synthetic [27]: The dataset simulates 100,000 interactions of 2000 virtual students, each answering the same 50 exercises, which cover five knowledge points, with a 58.83% correct rate.
Statics2011 [49]: The dataset, from a university course on engineering mechanics, contains 189,927 interactions from 333 students, covering 1223 exercises and 156 knowledge points, with a 76.54% correct rate.

4.2. Experimental Setup

The model was implemented in Python with the TensorFlow framework on a PC running Windows 10 with an Intel Core i5-5200U CPU. In the experiments, the time interval was set to 20 interactions, and students were clustered into 8 learning capability groups (k = 8). Predictions were evaluated with hv-block cross-validation on all datasets; hv-block cross-validation is consistent for general stationary observations [50]. The loss function was minimized with mini-batch stochastic gradient descent to speed up training, using a batch size of 32 and a learning rate of 0.01, and dropout was applied to prevent over-fitting.
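hv-block cross-validation [50] is designed for serially dependent data. A minimal index-level sketch of the usual formulation follows: for each observation, the test block is the 2v + 1 observations centred on it, and h further observations on each side of that block are removed from the training set. The text does not report the h and v values used in the experiments, so the values in the example are arbitrary, and `hv_block_splits` is a hypothetical helper.

```python
def hv_block_splits(n: int, v: int, h: int):
    """Yield (train_indices, test_indices) for hv-block cross-validation on a
    sequence of n time-ordered observations.

    For each observation i, the test block is the 2v + 1 observations centred
    on i (clipped at the ends of the sequence), and a further h observations on
    each side of the test block are left out of training so that training and
    test data are separated by a gap.
    """
    if v < 0 or h < 0:
        raise ValueError("v and h must be non-negative")
    for i in range(n):
        test = list(range(max(0, i - v), min(n, i + v + 1)))
        train = [j for j in range(n) if abs(j - i) > v + h]
        yield train, test


splits = list(hv_block_splits(10, v=1, h=1))
print(splits[5])  # ([0, 1, 2, 8, 9], [4, 5, 6])
```

The gap of h observations is what distinguishes hv-block from ordinary leave-block-out validation: it keeps training points that are strongly correlated with the test block out of the fit.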

4.3. Evaluation Index

In this paper, the predictive performance of each model is evaluated using the average ACC and average AUC. ACC (accuracy) is the proportion of all predictions that are correct, calculated by Equation (19):
$$ACC = \frac{TN + TP}{FN + TN + FP + TP} \tag{19}$$
where TN is the number of negative samples correctly predicted as negative, TP is the number of positive samples correctly predicted as positive, FN is the number of positive samples incorrectly predicted as negative, and FP is the number of negative samples incorrectly predicted as positive. AUC is the area under the ROC curve, which plots the true positive rate against the false positive rate; these rates are calculated by Equations (20) and (21):
$$FPR = \frac{FP}{FP + TN} \tag{20}$$
$$TPR = \frac{TP}{TP + FN} \tag{21}$$
where FPR is the horizontal coordinate of the ROC curve and TPR is the vertical coordinate. An AUC of 0.5 means the model predicts no better than random guessing, and higher AUC and ACC values indicate better predictive performance.
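Both metrics can be sketched in a few lines. ACC requires a decision threshold, which the text does not state, so 0.5 is assumed here. AUC is computed through its rank interpretation (the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counting one half), which equals the area under the ROC curve traced by Equations (20) and (21). The pairwise loop is quadratic and meant only for illustration; the function names are hypothetical.

```python
def accuracy(labels: list[int], probs: list[float], threshold: float = 0.5) -> float:
    """Equation (19): (TP + TN) / (TP + TN + FP + FN) after thresholding
    (the 0.5 threshold is an assumption)."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def auc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via its rank interpretation: the probability
    that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(accuracy(y, s), auc(y, s))  # 0.75 0.75
```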

4.4. Experimental Results and Analysis

4.4.1. Validity of the DKT-LCIRT Model

To demonstrate the effectiveness of the DKT-LCIRT model, it was compared with the DKT and DKVMN models. Table 2 reports the average AUC and ACC of the three models on the four publicly available datasets.
On the ASSIST2009 dataset, the DKT and DKVMN models achieve AUCs of 0.823 and 0.825, respectively, while the DKT-LCIRT model achieves 0.852, an improvement of about 3%; their ACCs are 0.768 and 0.771, while DKT-LCIRT achieves 0.785, an improvement of about 2%. On ASSIST2015, DKT and DKVMN achieve AUCs of 0.725 and 0.730 and DKT-LCIRT achieves 0.764, an improvement of about 4%; their ACCs are 0.735 and 0.736 and DKT-LCIRT achieves 0.749, an improvement of about 1%. On the Synthetic dataset, DKT and DKVMN achieve AUCs of 0.804 and 0.799 and DKT-LCIRT achieves 0.825, an improvement of about 2%; their ACCs are 0.752 and 0.754 and DKT-LCIRT achieves 0.775, an improvement of about 2%. On Statics2011, DKT and DKVMN achieve AUCs of 0.794 and 0.797 and DKT-LCIRT achieves 0.819, an improvement of about 2%; their ACCs are 0.751 and 0.754 and DKT-LCIRT achieves 0.773, an improvement of about 2%. These results confirm that DKT-LCIRT outperforms DKT and DKVMN on all four datasets, while DKT and DKVMN perform about equally well.
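The "about X%" figures do not state whether they are absolute (percentage-point) or relative gains, or which baseline they are measured against. The short calculation below re-uses only the numbers reported for Table 2 and makes both explicit against the stronger of DKT and DKVMN on each dataset and metric; the dictionary and function names are illustrative.

```python
# Reported (DKT, DKVMN, DKT-LCIRT) scores, copied from the results paragraph.
REPORTED = {
    "ASSIST2009": {"AUC": (0.823, 0.825, 0.852), "ACC": (0.768, 0.771, 0.785)},
    "ASSIST2015": {"AUC": (0.725, 0.730, 0.764), "ACC": (0.735, 0.736, 0.749)},
    "Synthetic": {"AUC": (0.804, 0.799, 0.825), "ACC": (0.752, 0.754, 0.775)},
    "Statics2011": {"AUC": (0.794, 0.797, 0.819), "ACC": (0.751, 0.754, 0.773)},
}


def gains_over_best_baseline(reported):
    """Return {(dataset, metric): (absolute_gain, relative_gain_percent)},
    measured against the stronger of DKT and DKVMN."""
    out = {}
    for dataset, metrics in reported.items():
        for metric, (dkt, dkvmn, ours) in metrics.items():
            best = max(dkt, dkvmn)
            out[(dataset, metric)] = (ours - best, 100.0 * (ours - best) / best)
    return out


for (dataset, metric), (absolute, relative) in gains_over_best_baseline(REPORTED).items():
    print(f"{dataset:12s} {metric}: +{absolute:.3f} absolute, +{relative:.1f}% relative")
```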

4.4.2. Validity of Learning Capability Features and Item Response Theory

To demonstrate the effectiveness of adding learning capability features and introducing item response theory, the DKT-LCIRT model was compared with the DKT-LC model, which only adds learning capability features, and the DKT-IRT model, which only introduces item response theory. Table 3 reports the average AUC and ACC of the three models on the four publicly available datasets.
Compared with the DKT-LC model, to which only the learning capability features were added, the DKT-LCIRT model has similar predictive performance but greater interpretability. Because both models use the learning capability features, their predictive performance is similar. The additional interpretability comes from item response theory, which gives the DKT-LCIRT model meaningful estimates of student capability levels and exercise difficulty levels. Compared with the DKT-IRT model, which only introduces item response theory, the DKT-LCIRT model has better predictive performance and the same interpretability. Because both models use item response theory, their interpretability is the same. The performance gain comes from the learning capability features that the DKT-LCIRT model uses to track students' knowledge states.
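The interpretability argument rests on how item response theory links capability and difficulty. As an illustration, the sketch below assumes the standard one-parameter logistic (Rasch) form [20]. Here θ_tj is the student's capability for exercise j at moment t, and β_j is the exercise's difficulty. This section does not show the paper's exact prediction formula, including any scaling constant, so this is an illustration rather than the model's definition. The function name is hypothetical.

```python
import math


def irt_correct_probability(theta, beta):
    """Rasch (1PL) model: P(correct) = 1 / (1 + exp(-(theta - beta))).

    theta: the student's capability for the exercise at moment t (theta_tj)
    beta:  the difficulty of exercise j (beta_j)
    """
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


# When capability equals difficulty, the student has an even chance.
print(irt_correct_probability(0.0, 0.0))  # 0.5
# A capability one unit above the difficulty gives about 0.731.
print(round(irt_correct_probability(1.0, 0.0), 3))
```

Both parameters live on the same scale, which is why the estimates are interpretable. A higher θ or a lower β raises the predicted probability of a correct answer, and the probability depends only on the difference θ − β.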

4.4.3. Avoiding Over-Fitting

A model that performs well on the training dataset but noticeably worse on the test dataset is over-fitting [51]. To demonstrate that the DKT-LCIRT model is better at avoiding over-fitting, this paper compares the training and validation AUC of the DKT, DKVMN, and DKT-LCIRT models during training on the four publicly available datasets. The comparison is shown in Figure 4.
As Figure 4 shows, the DKT model over-fits on all datasets: the gap between its training and validation AUC values gradually widens. The DKVMN model does not over-fit on the ASSIST2009, ASSIST2015, and Synthetic datasets. On the Statics2011 dataset, however, the gap between its training and validation AUC values gradually widens after 13 epochs. The DKT-LCIRT model is better at avoiding over-fitting and keeps its training and validation AUC values close on all datasets.
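The paper judges over-fitting visually from the curves in Figure 4. The same criterion, a train–validation AUC gap that keeps widening, can also be checked programmatically. The sketch below reports the first epoch at which the gap stays above a threshold for several consecutive epochs. The helper `overfitting_onset` and the default values `gap_threshold=0.02` and `patience=3` are hypothetical choices, not settings from the paper. The per-epoch AUC values in the example are invented for illustration and are not read from Figure 4.

```python
def overfitting_onset(train_auc, val_auc, gap_threshold=0.02, patience=3):
    """Return the first epoch (1-based) of a run of `patience` consecutive epochs
    in which training AUC exceeds validation AUC by more than `gap_threshold`,
    or None if no such run occurs."""
    run = 0
    for epoch, (tr, va) in enumerate(zip(train_auc, val_auc), start=1):
        if tr - va > gap_threshold:
            run += 1
            if run == patience:
                return epoch - patience + 1
        else:
            run = 0  # the gap closed again, so restart the count
    return None


# Invented curves: the gap opens at epoch 3 and keeps growing.
train = [0.80, 0.82, 0.85, 0.88, 0.90]
val = [0.79, 0.81, 0.82, 0.82, 0.81]
print(overfitting_onset(train, val))  # 3
```

Requiring several consecutive epochs avoids flagging a single noisy validation point as over-fitting.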

5. Conclusions and Future Work

This paper proposes DKT-LCIRT, a deep knowledge tracking model that integrates learning capability and item response theory. The model dynamically calculates students' learning capability at each time interval and assigns each student to a group with similar learning capability in order to track the student's knowledge state. The model then uses item response theory to estimate the probability that a student answers an exercise correctly. This design reflects individual differences among students, improves the predictive performance of the model, and makes the model more interpretable. Extensive experiments were carried out on four publicly available datasets. The results confirmed that the DKT-LCIRT model achieves higher predictive performance than the classical knowledge tracking models DKT and DKVMN. They also showed that its parameters have meaningful interpretations and that it avoids the over-fitting problem well. These results demonstrate the practicality and effectiveness of the DKT-LCIRT model.
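The grouping step can be sketched from the variable definitions, in the spirit of the dynamic student classification of [10]. The sketch below is written under stated assumptions. For each knowledge point s_m, it takes R(s_m) to be Correct(s_m) minus Incorrect(s_m), where both proportions are counted over the student's N_m attempts at s_m. It stacks these values into the learning capability vector d, and assigns the student to the group whose centroid μ_c is nearest, as in the K-means assignment step [41]. The centroids are passed in here, although in practice they would come from K-means fitted on training data. Assigning 0 to knowledge points the student has not yet attempted is a hypothetical choice. The function names are also hypothetical.

```python
import math


def learning_capability_vector(history, num_skills):
    """history: (skill_index, is_correct) responses of one student during
    time intervals 1..z. Returns d = [R(s_1), ..., R(s_M)], where
    R(s_m) = Correct(s_m) - Incorrect(s_m) over the N_m attempts at s_m."""
    correct = [0] * num_skills
    attempts = [0] * num_skills  # N_m for each knowledge point
    for skill, is_correct in history:
        attempts[skill] += 1
        if is_correct:
            correct[skill] += 1
    d = []
    for m in range(num_skills):
        if attempts[m] == 0:
            d.append(0.0)  # assumption: unseen knowledge points contribute 0
        else:
            c = correct[m] / attempts[m]  # Correct(s_m)
            d.append(c - (1 - c))         # Correct(s_m) - Incorrect(s_m)
    return d


def assign_cluster(d, centroids):
    """Nearest-centroid assignment: Cluster(i, z) = argmin_c ||d - mu_c||."""
    return min(range(len(centroids)), key=lambda c: math.dist(d, centroids[c]))


# Illustrative student: two of three attempts correct on s_0, one miss on s_1.
d = learning_capability_vector([(0, True), (0, True), (0, False), (1, False)], 3)
print(d)  # approximately [0.333, -1.0, 0.0]
print(assign_cluster(d, [[0.5, -0.5, 0.0], [-0.5, 0.5, 0.0]]))  # 0
```

Because d is recomputed from all responses up to interval z, a student's group can change from one time interval to the next. This is what makes the grouping dynamic.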
In the datasets used in this paper, each exercise covers only a few knowledge points. The DKT-LCIRT model can effectively track students' real knowledge states on such data, but its effectiveness for exercises that cover multiple knowledge points is unknown. In future research, the model can be further improved for knowledge tracking in which exercises cover relatively more knowledge points. Its robustness and generalization performance could be improved through adversarial training. It could also be combined with recommendation algorithms to achieve personalized exercise recommendations, providing adaptive learning support for students. Another future research direction is how to apply the model to offline learning in real-world settings to help achieve personalized learning path recommendations.

Author Contributions

Conceptualization, G.L. and J.S.; methodology, G.L., J.S. and Y.H.; validation, Y.Z., Y.W. and T.Y.; writing—original draft preparation, J.S. and G.L.; writing—review and editing, G.L., J.S., Y.H., Y.Z., Y.W., T.Y. and N.X.; supervision, Y.Z. and T.Y.; funding acquisition, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (#62041702); Ministry of Education, Humanities, and Social Sciences Project (#20YJA870010); Jiangxi Provincial Social Science Planning Project (#19TQ05); Key project of Education Science planning in Jiangxi Province (#19ZD024); Jiangxi University Humanities and Social Science Planning Project (#TQ20105); Basic Education Research Project of Jiangxi Province (#SZUNDZH2021-1143); and Jiangxi Province Degree and Postgraduate Education and Teaching Reform Research Project (#JXYJG-2020-075).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

Abbreviations of Professional Terms

| Abbreviation | Full Name |
|---|---|
| DKT-LCIRT | Deep Knowledge Tracking model integrating Learning Capability and Item Response Theory |
| MOOC | Massive Open Online Courses |
| ITS | Intelligent Tutoring Systems |
| DKVMN | Dynamic Key-Value Memory Network model |
| IRT | Item Response Theory |
| BKT | Bayesian Knowledge Tracking |
| HMM | Hidden Markov Model |
| DKT | Deep Knowledge Tracking |
| LSTM | Long Short-Term Memory |
| MANN | Memory Augmented Neural Network |
| Bi-LSTM | Bi-directional Long Short-Term Memory |
| GNN | Graph Neural Network |
| GCN | Graph Convolutional Network |
| CNN | Convolutional Neural Network |

Meaning of the Main Variables

| Variables | Meaning |
|---|---|
| $Correct(s_m)_{1:z}$ | the proportion of knowledge points $s_m$ answered correctly by student $i$ during time intervals 1 to $z$ |
| $Incorrect(s_m)_{1:z}$ | the proportion of knowledge points $s_m$ answered incorrectly by student $i$ during time intervals 1 to $z$ |
| $R(s_m)_{1:z}$ | the difference in students' performance on knowledge points $s_m$ |
| $d^i_{1:z}$ | student $i$'s learning capability vector at time intervals 1 to $z$ |
| $N^i_m$ | the total number of times student $i$ answered knowledge point $s_m$ |
| $Cluster(i, z)$ | student $i$'s group at time interval $z$ |
| $\mu_c$ | the centroid of group $c$ |
| $W_{hx}$ | the input weight matrix |
| $W_{hh}$ | the recurrent weight matrix |
| $b_h$ | the bias vector |
| $\theta_{tj}$ | the capability of the student to answer exercise $j$ at moment $t$ |
| $\beta_j$ | the difficulty of exercise $j$ |
| $p_t^{c_z}$ | the predicted value |
| $R_t$ | the true value |
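The weight matrices W_hx and W_hh and the bias b_h in the variable list match the recurrent update of the original DKT model [27], h_t = tanh(W_hx x_t + W_hh h_{t−1} + b_h). The sketch below shows one such step in plain Python. It also shows a mean binary cross-entropy between predicted values p and true values R_t, which is the usual DKT training objective. Both pieces are assumptions. This section does not show DKT-LCIRT's training loss. If the model uses an LSTM cell, as the abbreviation list suggests is possible, the gates add weights not shown here. The function names are hypothetical.

```python
import math


def rnn_step(x, h_prev, W_hx, W_hh, b_h):
    """One vanilla RNN step: h_t = tanh(W_hx x_t + W_hh h_{t-1} + b_h).

    Matrices are lists of rows; vectors are lists of floats.
    """
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_hx[k], x))         # input contribution
            + sum(w * hi for w, hi in zip(W_hh[k], h_prev))  # recurrent contribution
            + b_h[k]
        )
        for k in range(len(b_h))
    ]


def bce_loss(predicted, truth, eps=1e-12):
    """Mean binary cross-entropy between predicted values p and true values R_t."""
    total = 0.0
    for p, r in zip(predicted, truth):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= r * math.log(p) + (1 - r) * math.log(1 - p)
    return total / len(predicted)


# Tiny example: 2-dimensional input, 1-dimensional hidden state.
h1 = rnn_step([1.0, 0.0], [0.0], W_hx=[[0.5, 0.2]], W_hh=[[0.3]], b_h=[0.0])
print(h1)  # [tanh(0.5)], approximately [0.462]
print(round(bce_loss([0.5], [1]), 4))  # 0.6931, i.e. ln 2
```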

References

1. Cheng, H.; Xie, Z.; Shi, Y.; Xiong, N. Multi-step data prediction in wireless sensor networks based on one-dimensional CNN and bidirectional LSTM. IEEE Access 2019, 7, 117883–117896.
2. Chen, E.; Liu, Q.; Wang, S.; Huang, Z.; Su, Y.; Ding, P.; Ma, J.; Zhu, B. Key techniques and applications for Intelligent Education Oriented Adaptive Learning. CAAI Trans. Intell. Syst. 2021, 16, 886–898.
3. Guo, W.; Xiong, N.; Chao, H.C.; Hussain, S.; Chen, G. Design and analysis of self-adapted task scheduling strategies in wireless sensor networks. Sensors 2011, 11, 6533–6554.
4. Liu, T.; Chen, W.; Chang, L.; Gu, T. Research Advances in the Knowledge Tracing Based on Deep Learning. J. Comput. Res. Dev. 2022, 59, 81.
5. Li, R.; Yin, Y.; Dai, L.; Shen, S.; Lin, X.; Su, Y.; Chen, E. PST: Measuring Skill Proficiency in Programming Exercise Process via Programming Skill Tracing. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 2601–2606.
6. Trivedi, S.; Pardos, Z.A.; Heffernan, N.T. Clustering students to generate an ensemble to improve standard test score predictions. In International Conference on Artificial Intelligence in Education; Springer: Berlin/Heidelberg, Germany, 2011; pp. 377–384.
7. Lin, C.; He, Y.X.; Xiong, N. An energy-efficient dynamic power management in wireless sensor networks. In Proceedings of the 2006 Fifth International Symposium on Parallel and Distributed Computing, Timisoara, Romania, 6–9 July 2006; IEEE: Piscataway, NJ, USA; pp. 148–154.
8. Zong, X.; Tao, Z. Knowledge tracing model based on mastery speed. Comput. Eng. Appl. 2021, 57, 117–123.
9. Li, X.; Wei, S.; Zhang, X.; Du, Y.; Yu, G. LFKT: Deep knowledge tracing model with learning and forgetting behavior merging. J. Softw. 2021, 32, 818–830.
10. Minn, S.; Yu, Y.; Desmarais, M.C.; Zhu, F.; Vie, J.J. Deep knowledge tracing and dynamic student classification for knowledge tracing. In Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018; IEEE: Piscataway, NJ, USA; pp. 1182–1187.
11. Minn, S.; Desmarais, M.C.; Zhu, F.; Xiao, J.; Wang, J. Dynamic student classification on memory networks for knowledge tracing. In Pacific-Asia Conference on Knowledge Discovery and Data Mining; Springer: Cham, Switzerland, 2019; pp. 163–174.
12. Yeung, C.K. Deep-IRT: Make deep learning based knowledge tracing explainable using item response theory. arXiv 2019, arXiv:1904.11738.
13. Yao, Y.; Xiong, N.; Park, J.H.; Ma, L.; Liu, J. Privacy-preserving max/min query in two-tiered wireless sensor networks. Comput. Math. Appl. 2013, 65, 1318–1325.
14. Zhao, J.; Huang, J.; Xiong, N. An effective exponential-based trust and reputation evaluation system in wireless sensor networks. IEEE Access 2019, 7, 33859–33869.
15. Xiao, R.; Zheng, R.; Xiao, Y.; Zhang, Y.; Sun, B.; He, J. Deep Knowledge Tracking Based on Exercise Semantic Information. In International Symposium on Emerging Technologies for Education; Springer: Cham, Switzerland, 2021; pp. 278–289.
16. Wang, W.; Ma, H.; Zhao, Y.; Li, Z.; He, X. Tracking knowledge proficiency of students with calibrated Q-matrix. Expert Syst. Appl. 2022, 192, 116454.
17. Song, Z.; Huang, S.; Zhou, Y. A Deep Knowledge Tracking Model Integrating Difficulty Factors. In Proceedings of the 2nd International Conference on Computing and Data Science, Stanford, CA, USA, 28–30 January 2021; pp. 1–5.
18. Wu, C.; Ju, B.; Wu, Y.; Lin, X.; Xiong, N.; Xu, G.; Liang, X. UAV autonomous target search based on deep reinforcement learning in complex disaster scene. IEEE Access 2019, 7, 117227–117245.
19. Lu, Y.; Wu, S.; Fang, Z.; Xiong, N.; Yoon, S.; Park, D.S. Exploring finger vein based personal authentication for secure IoT. Future Gener. Comput. Syst. 2017, 77, 149–160.
20. Rasch, G. Studies in Mathematical Psychology: I. Probabilistic Models for some Intelligence and Attainment Tests; American Psychological Association: Washington, DC, USA, 1960.
21. Cheng, S.; Liu, Q.; Chen, E.; Huang, Z.; Huang, Z.; Chen, Y.; Hu, G. DIRT: Deep learning enhanced item response theory for cognitive diagnosis. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 2397–2400.
22. Wang, F.; Liu, Q.; Chen, E.; Huang, Z.; Chen, Y.; Yin, Y.; Wang, S. Neural cognitive diagnosis for intelligent education systems. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 6153–6161.
23. Wang, F.; Liu, Q.; Chen, E.; Huang, Z.; Yin, Y.; Wang, S.; Su, Y. NeuralCD: A General Framework for Cognitive Diagnosis. In IEEE Transactions on Knowledge and Data Engineering; IEEE: Piscataway, NJ, USA, 2022.
24. Corbett, A.T.; Anderson, J.R. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Model. User Adapt. Interact. 1994, 4, 253–278.
25. Baker, R.S.J.d.; Corbett, A.T.; Aleven, V. More accurate student modeling through contextual estimation of slip and guess probabilities in bayesian knowledge tracing. In Proceedings of the International Conference on Intelligent Tutoring Systems, Montreal, QC, Canada, 23–27 June 2008; Springer: Berlin/Heidelberg, Germany; pp. 406–415.
26. Yudelson, M.V.; Koedinger, K.R.; Gordon, G.J. Individualized bayesian knowledge tracing models. In Proceedings of the International Conference on Artificial Intelligence in Education, Memphis, TN, USA, 9–13 July 2013; Springer: Berlin/Heidelberg, Germany; pp. 171–180.
27. Piech, C.; Bassen, J.; Huang, J.; Ganguli, S.; Sahami, M.; Guibas, L.J.; Sohl-Dickstein, J. Deep knowledge tracing. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2015; pp. 505–513.
28. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
29. Zhang, J.; Shi, X.; King, I.; Yeung, D.Y. Dynamic key-value memory networks for knowledge tracing. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 765–774.
30. Graves, A.; Wayne, G.; Reynolds, M.; Harley, T.; Danihelka, I.; Grabska-Barwińska, A.; Hassabis, D. Hybrid computing using a neural network with dynamic external memory. Nature 2016, 538, 471–476.
31. Liu, Q.; Huang, Z.; Yin, Y.; Chen, E.; Xiong, H.; Su, Y.; Hu, G. EKT: Exercise-aware knowledge tracing for student performance prediction. In IEEE Transactions on Knowledge and Data Engineering; IEEE: Piscataway, NJ, USA, 2019; Volume 33, pp. 100–115.
32. Choi, Y.; Lee, Y.; Cho, J.; Baek, J.; Kim, B.; Cha, Y.; Heo, J. Towards an appropriate query, key, and value computation for knowledge tracing. In Proceedings of the Seventh ACM Conference on Learning@ Scale, Online, 12–14 August 2020; pp. 341–344.
33. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Neural Information Processing Systems Foundation: San Diego, CA, USA, 2017.
34. Wang, T.; Ma, F.; Gao, J. Deep hierarchical knowledge tracing. In Proceedings of the 12th International Conference on Educational Data Mining, Montreal, QC, Canada, 2–5 July 2019.
35. Nakagawa, H.; Iwasawa, Y.; Matsuo, Y. Graph-based knowledge tracing: Modeling student proficiency using graph neural network. In Proceedings of the 2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI), Thessaloniki, Greece, 14–17 October 2019; IEEE: Piscataway, NJ, USA; pp. 156–163.
36. Yang, Y.; Shen, J.; Qu, Y.; Liu, Y.; Wang, K.; Zhu, Y.; Yu, Y. GIKT: A graph-based interaction model for knowledge tracing. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Bilbao, Spain, 13–17 September 2020; Springer: Cham, Switzerland; pp. 299–315.
37. Shen, S.; Liu, Q.; Chen, E.; Wu, H.; Huang, Z.; Zhao, W.; Wang, S. Convolutional knowledge tracing: Modeling individualization in student learning process. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Online, 25–30 July 2020; pp. 1857–1860.
38. Wan, R.; Xiong, N. An energy-efficient sleep scheduling mechanism with similarity measure for wireless sensor networks. Hum. Cent. Comput. Inf. Sci. 2018, 8, 18.
39. Guo, X.; Huang, Z.; Gao, J.; Shang, M.; Shu, M.; Sun, J. Enhancing Knowledge Tracing via Adversarial Training. In Proceedings of the 29th ACM International Conference on Multimedia, Online, 20–24 October 2021; pp. 367–375.
40. Zhou, Y.; Zhang, Y.; Liu, H.; Xiong, N.; Vasilakos, A.V. A bare-metal and asymmetric partitioning approach to client virtualization. In IEEE Transactions on Knowledge and Data Engineering; IEEE: Piscataway, NJ, USA, 2012; Volume 7, pp. 40–53.
41. MacQueen, J. Classification and analysis of multivariate observations. In 5th Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Berkeley, CA, USA, 1967; pp. 281–297.
42. Gao, K.; Han, F.; Dong, P.; Xiong, N.; Du, R. Connected vehicle as a mobile sensor for real time queue length at signalized intersections. Sensors 2019, 19, 2059.
43. Minn, S.; Zhu, F.; Desmarais, M.C. Improving knowledge tracing model by integrating problem difficulty. In Proceedings of the 2018 IEEE International Conference on Data Mining Workshops (ICDMW), Singapore, 17–20 November 2018; IEEE: Piscataway, NJ, USA; pp. 1505–1506.
44. Shen, S.; Huang, Z.; Liu, Q.; Su, Y.; Wang, S.; Chen, E. Assessing Student's Dynamic Knowledge State by Exploring the Question Difficulty Effect. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 427–437.
45. Yang, F.M. Item response theory for measurement validity. Shanghai Arch. Psychiatry 2014, 26, 171.
46. Jiang, Y.; Tong, G.; Yin, H.; Xiong, N. A pedestrian detection method based on genetic algorithm for optimize XGBoost training parameters. IEEE Access 2019, 7, 118310–118321.
47. Feng, M.; Heffernan, N.; Koedinger, K. Addressing the assessment challenge with an online system that tutors as it assesses. User Model. User Adapt. Interact. 2009, 19, 243–266.
48. Xiong, X.; Zhao, S.; Van Inwegen, E.G.; Beck, J.E. Going Deeper with Deep Knowledge Tracing. International Educational Data Mining Society; International Educational Data Mining Society: Worcester, MA, USA, 2016.
49. Koedinger, K.R.; Baker, R.S.; Cunningham, K.; Skogsholm, A.; Leber, B.; Stamper, J. A data repository for the EDM community: The PSLC DataShop. In Handbook of Educational Data Mining; CRC Press: Boca Raton, FL, USA, 2010; Volume 43, pp. 43–56.
50. Racine, J. Consistent cross-validatory model-selection for dependent data: Hv-block cross-validation. J. Econom. 2000, 99, 39–61.
51. Wu, C.; Luo, C.; Xiong, N.; Zhang, W.; Kim, T.H. A greedy deep learning method for medical disease analysis. IEEE Access 2018, 6, 20021–20030.
Figure 1. Time interval in the student interaction sequence.
Figure 2. Grouping of students’ learning capability at every time interval.
Figure 3. Framework diagram of the DKT-LCIRT model.
Figure 4. Training and validation AUC values for the three models on datasets (a) ASSIST2009, (b) ASSIST2015, (c) Synthetic, (d) Statics2011.
Table 1. Overview of the datasets.
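| Datasets | Number of Exercises | Number of Knowledge Points | Number of Students | Number of Interactions | Correct Rate |
|---|---|---|---|---|---|
| ASSIST2009 | 26,684 | 110 | 4151 | 325,637 | 65.84% |
| ASSIST2015 | NA | 100 | 19,840 | 683,801 | 73.18% |
| Synthetic | 50 | 5 | 2000 | 100,000 | 58.83% |
| Statics2011 | 1223 | 156 | 333 | 189,297 | 76.54% |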
Table 2. Average AUC and ACC values of the three models on all datasets (DKT-LCIRT Model).
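| Datasets | DKT [27] AUC | DKT [27] ACC | DKVMN [29] AUC | DKVMN [29] ACC | DKT-LCIRT AUC | DKT-LCIRT ACC |
|---|---|---|---|---|---|---|
| ASSIST2009 | 0.823 | 0.768 | 0.825 | 0.771 | 0.852 | 0.785 |
| ASSIST2015 | 0.725 | 0.735 | 0.730 | 0.736 | 0.764 | 0.749 |
| Synthetic | 0.804 | 0.752 | 0.799 | 0.754 | 0.825 | 0.775 |
| Statics2011 | 0.794 | 0.751 | 0.797 | 0.754 | 0.819 | 0.773 |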
Table 3. Average AUC and ACC values of the three models on all datasets (Learning Capability Features and Item Response Theory).
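| Datasets | DKT-LC AUC | DKT-LC ACC | DKT-IRT AUC | DKT-IRT ACC | DKT-LCIRT AUC | DKT-LCIRT ACC |
|---|---|---|---|---|---|---|
| ASSIST2009 | 0.850 | 0.784 | 0.826 | 0.773 | 0.852 | 0.785 |
| ASSIST2015 | 0.765 | 0.750 | 0.732 | 0.737 | 0.764 | 0.749 |
| Synthetic | 0.827 | 0.776 | 0.798 | 0.754 | 0.825 | 0.775 |
| Statics2011 | 0.816 | 0.771 | 0.795 | 0.753 | 0.819 | 0.773 |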
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Li, G.; Shuai, J.; Hu, Y.; Zhang, Y.; Wang, Y.; Yang, T.; Xiong, N. DKT-LCIRT: A Deep Knowledge Tracking Model Integrating Learning Capability and Item Response Theory. Electronics 2022, 11, 3364. https://doi.org/10.3390/electronics11203364

AMA Style

Li G, Shuai J, Hu Y, Zhang Y, Wang Y, Yang T, Xiong N. DKT-LCIRT: A Deep Knowledge Tracking Model Integrating Learning Capability and Item Response Theory. Electronics. 2022; 11(20):3364. https://doi.org/10.3390/electronics11203364

Chicago/Turabian Style

Li, Guangquan, Junkai Shuai, Yuqing Hu, Yonghong Zhang, Yinglong Wang, Tonghua Yang, and Naixue Xiong. 2022. "DKT-LCIRT: A Deep Knowledge Tracking Model Integrating Learning Capability and Item Response Theory" Electronics 11, no. 20: 3364. https://doi.org/10.3390/electronics11203364
