Online Learning Strategy Induction through Partially Observable Markov Decision Process-Based Cognitive Experience Model
Abstract
1. Introduction
- Model updates generally require a large amount of computation, which is time-consuming and at odds with the requirement for real-time updates.
- A single answer provides very limited information, which is easily diluted by the rest of the historical record.
- To achieve a personalized online update of the model and strategy, we propose the POMDP-based cognitive experience model (PCEM). Unlike models that aim to accurately reflect learners’ cognitive abilities, the cognitive experience model emphasizes learners’ subjective experiences during the cognitive process. Additionally, we introduce an adviser–advisee mechanism to derive learning strategies corresponding to the PCEM.
- This paper proposes new methods for parameter learning and strategy solving for the PCEM. The parameter learning draws on the ideas of HCPM, with appropriate modifications for the characteristics of the PCEM. The strategy-solving method exploits the similarity between cognitive experiences to improve performance.
- We demonstrate the performance of the PCEM in the real-time update of learning strategies across several real-world knowledge-concept learning domains.
2. Related Works
3. POMDP-Based Cognitive Experience Modeling and Online Personalized Learning Planning
3.1. Background Knowledge
3.2. PCEM Specification
3.3. PCEM Parameter Learning
- Determine the log-likelihood function of the complete data.
- E-step of the EM algorithm: compute the Q-function, i.e., the expectation of the complete-data log-likelihood with respect to the posterior distribution of the hidden states under the current parameter estimate (reference forms are given after this list).
- M-step of the EM algorithm: maximize the Q-function to estimate the model parameters.
- (a) The first terms of Equation (1) can be expressed as
- (b) The following terms of Equation (1) can be expressed as
- (c) The remaining term of Equation (1) can be expressed as
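As a standard reference for the quantities that items (a)–(c) decompose, in a POMDP-style model with hidden states $s_t$, actions $a_t$, and observations $o_t$, the complete-data log-likelihood and the EM Q-function take the following forms (our notation, offered as a standard reference rather than the authors' exact Equation (1)):

```latex
% Complete-data log-likelihood of one length-N interaction sequence:
%   (a) initial-state term, (b) transition terms, (c) observation terms.
\log P(o_{1:N}, s_{0:N} \mid a_{1:N}; \lambda)
  = \log \pi(s_0)
  + \sum_{t=1}^{N} \log T(s_t \mid s_{t-1}, a_t)
  + \sum_{t=1}^{N} \log O(o_t \mid s_t, a_t)

% EM Q-function: expectation of the above under the hidden-state
% posterior induced by the current parameter estimate \bar{\lambda}.
Q(\lambda, \bar{\lambda})
  = \sum_{s_{0:N}} P(s_{0:N} \mid o_{1:N}, a_{1:N}; \bar{\lambda})
    \log P(o_{1:N}, s_{0:N} \mid a_{1:N}; \lambda)
```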
Algorithm 1 Parameter Learning of PCEM
Input: the observation sequences, the action sequences, and the termination condition.
Output: the initial state distribution, the transition function, and the observation function O.
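A minimal sketch of the EM loop that Algorithm 1 specifies, assuming a discrete model with tabular parameters and forward-backward smoothing over each learner's answer sequence; all function and variable names here are ours, not the paper's:

```python
import numpy as np

def forward_backward(obs, acts, pi, T, O):
    """Smoothed posteriors for one action-conditioned observation sequence.

    obs:  length-N array of observation indices o_1..o_N
    acts: length-N array of action indices a_1..a_N
    pi:   (S,) initial state distribution
    T:    (A, S, S) transition probabilities T[a, s, s'] = P(s' | s, a)
    O:    (A, S, Z) observation probabilities O[a, s', o] = P(o | s', a)
    Returns gamma (N+1, S) state posteriors and xi (N, S, S) pair posteriors.
    """
    N, S = len(obs), len(pi)
    alpha = np.zeros((N + 1, S))
    beta = np.ones((N + 1, S))
    alpha[0] = pi
    for t in range(N):
        a, o = acts[t], obs[t]
        alpha[t + 1] = (alpha[t] @ T[a]) * O[a][:, o]
        alpha[t + 1] /= alpha[t + 1].sum()      # rescale for numerical stability
    for t in range(N - 1, -1, -1):
        a, o = acts[t], obs[t]
        beta[t] = T[a] @ (O[a][:, o] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)   # P(s_t | all data)
    xi = np.zeros((N, S, S))
    for t in range(N):
        a, o = acts[t], obs[t]
        xi[t] = alpha[t][:, None] * T[a] * (O[a][:, o] * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()                    # P(s_t, s_{t+1} | all data)
    return gamma, xi

def em_step(sequences, pi, T, O):
    """One E+M step over all (obs, acts) sequences; returns updated parameters."""
    A, S, Z = O.shape
    pi_new = np.zeros(S)
    T_num, O_num = np.zeros((A, S, S)), np.zeros((A, S, Z))
    for obs, acts in sequences:
        gamma, xi = forward_backward(obs, acts, pi, T, O)   # E-step
        pi_new += gamma[0]
        for t, (a, o) in enumerate(zip(acts, obs)):         # accumulate counts
            T_num[a] += xi[t]
            O_num[a, :, o] += gamma[t + 1]
    # M-step: normalize expected counts into probability tables
    pi_new /= pi_new.sum()
    T_new = T_num / T_num.sum(axis=2, keepdims=True).clip(1e-12)
    O_new = O_num / O_num.sum(axis=2, keepdims=True).clip(1e-12)
    return pi_new, T_new, O_new
```

Iterating em_step until the log-likelihood improvement falls below the termination condition yields the initial state distribution, transition function, and observation function named in the algorithm's output.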
Figure: Comparison of algorithm efficiency.
3.4. PCEM Online Planning
4. Experimental Results
- PCPM [20]: This model uses a POMDP to represent learners’ cognitive processes, enabling real-time updates of a learner’s knowledge state from feedback and, in turn, the adaptation of the learning strategy.
- HCPM [21]: This model uses an H-POMDP to model learners’ cognitive processes, so that learners’ knowledge states and cognitive abilities can be updated dynamically from their feedback, facilitating the induction of suitable learning strategies.
- PCEM: our method, with parameter learning given in Algorithm 1, which aims to improve both the performance and the efficiency of strategy induction. (A sketch of the belief update shared by all three models follows this list.)
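What all three POMDP-based models share is the mechanism behind the "real-time update of the knowledge state": a Bayesian belief update after each observed answer. A minimal sketch, reusing the tabular T and O conventions from the parameter-learning sketch above (an illustration, not the authors' exact update):

```python
import numpy as np

def belief_update(belief, action, obs, T, O):
    """One Bayesian filter step: prior belief -> posterior after (action, obs).

    belief: (S,) current distribution over hidden knowledge states
    T, O:   tabular parameters as in em_step above
    """
    predicted = belief @ T[action]              # predict: P(s' | b, a)
    posterior = predicted * O[action][:, obs]   # correct: weight by P(o | s', a)
    return posterior / posterior.sum()
```

Because this update touches only the current belief vector, it costs O(|S|^2) per answer, which is what makes per-interaction (online) strategy adaptation feasible.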
4.1. Datasets
4.2. Metrics for Evaluating Model Performance
- The first metric represents the average mastery level of each knowledge concept across the learner group.
- The second metric is the average number of knowledge concepts mastered by the learner group in the learning domain.
- The third metric represents the stability of the learning strategy.
- For significance testing, the independent-samples t-test statistic is used; in its standard (Welch) form, $t = (\bar{x}_1 - \bar{x}_2) / \sqrt{s_1^2/n_1 + s_2^2/n_2}$. (A computational sketch of these quantities follows this list.)
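A hedged computational reading of these metrics, assuming per-learner mastery estimates are available as a matrix and that "mastered" means the estimate exceeds a threshold (the threshold, array layout, and the exact stability definition are our assumptions, not the paper's):

```python
import numpy as np

def group_metrics(mastery, threshold=0.5):
    """mastery: (learners, concepts) array of estimated mastery probabilities."""
    avg_mastery_per_concept = mastery.mean(axis=0)                     # metric 1
    avg_concepts_mastered = (mastery >= threshold).sum(axis=1).mean()  # metric 2
    return avg_mastery_per_concept, avg_concepts_mastered

def strategy_stability(per_learner_outcomes):
    """Metric 3, illustrated as the dispersion of per-learner outcomes; the
    paper's exact stability definition may differ."""
    return np.std(per_learner_outcomes)

def welch_t(x1, x2):
    """Independent-samples (Welch) t statistic, as in the formula above."""
    n1, n2 = len(x1), len(x2)
    return (np.mean(x1) - np.mean(x2)) / np.sqrt(
        np.var(x1, ddof=1) / n1 + np.var(x2, ddof=1) / n2)
```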
4.3. Experiment: Evaluations on Online Learning Strategy Induction
4.4. Experimental Summary and Discussion
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Wang, H.; Tlili, A.; Huang, R.; Cai, Z.; Li, M.; Cheng, Z.; Yang, D.; Li, M.; Zhu, X.; Fei, C. Examining the applications of intelligent tutoring systems in real educational contexts: A systematic literature review from the social experiment perspective. Educ. Inf. Technol. 2023, 28, 9113–9148.
2. Vasandani, V.; Govindaraj, T. Knowledge organization in intelligent tutoring systems for diagnostic problem solving in complex dynamic domains. IEEE Trans. Syst. Man Cybern. 1995, 25, 1076–1096.
3. Goh, G.M.; Quek, C. EpiList: An intelligent tutoring system shell for implicit development of generic cognitive skills that support bottom-up knowledge construction. IEEE Trans. Syst. Man Cybern. Part A Syst. Humans 2006, 37, 58–71.
4. Kaelbling, L.P.; Littman, M.L.; Moore, A.W. Reinforcement learning: A survey. J. Artif. Intell. Res. 1996, 4, 237–285.
5. Tang, X.; Chen, Y.; Li, X.; Liu, J.; Ying, Z. A reinforcement learning approach to personalized learning recommendation systems. Br. J. Math. Stat. Psychol. 2019, 72, 108–135.
6. Zhou, G.; Yang, X.; Azizsoltani, H.; Barnes, T.; Chi, M. Improving student-system interaction through data-driven explanations of hierarchical reinforcement learning induced pedagogical policies. In Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization, Genoa, Italy, 12–18 July 2020; pp. 284–292.
7. Kubotani, Y.; Fukuhara, Y.; Morishima, S. RLTutor: Reinforcement learning based adaptive tutoring system by modeling virtual student with fewer interactions. arXiv 2021, arXiv:2108.00268.
8. Pateria, S.; Subagdja, B.; Tan, A.-H.; Quek, C. Hierarchical reinforcement learning: A comprehensive survey. ACM Comput. Surv. 2021, 54, 1–35.
9. Zhou, G.; Azizsoltani, H.; Ausin, M.S.; Barnes, T.; Chi, M. Hierarchical reinforcement learning for pedagogical policy induction. In Proceedings of the 20th International Conference on Artificial Intelligence in Education (AIED 2019), Chicago, IL, USA, 25–29 June 2019; Springer: Berlin/Heidelberg, Germany, 2019; Part I, pp. 544–556.
10. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38.
11. Ju, S. Identify critical pedagogical decisions through adversarial deep reinforcement learning. In Proceedings of the 12th International Conference on Educational Data Mining (EDM 2019), Montreal, QC, Canada, 2–5 July 2019.
12. Huang, Z.; Liu, Q.; Zhai, C.; Yin, Y.; Chen, E.; Gao, W.; Hu, G. Exploring multi-objective exercise recommendations in online education systems. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1261–1270.
13. Sanz Ausin, M.; Maniktala, M.; Barnes, T.; Chi, M. Exploring the impact of simple explanations and agency on batch deep reinforcement learning induced pedagogical policies. In Proceedings of the 21st International Conference on Artificial Intelligence in Education (AIED 2020), Ifrane, Morocco, 6–10 July 2020; Springer: Berlin/Heidelberg, Germany, 2020; Part I, pp. 472–485.
14. Ausin, M.S.; Maniktala, M.; Barnes, T.; Chi, M. Tackling the credit assignment problem in reinforcement learning-induced pedagogical policies with neural networks. In Proceedings of the International Conference on Artificial Intelligence in Education, Utrecht, The Netherlands, 14–18 June 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 356–368.
15. Judd, C.H. Educational Psychology; Routledge: London, UK, 2012.
16. Spaan, M.T.J. Partially observable Markov decision processes. In Reinforcement Learning; Springer: Berlin/Heidelberg, Germany, 2012; pp. 387–414.
17. Rafferty, A.N.; Brunskill, E.; Griffiths, T.L.; Shafto, P. Faster teaching via POMDP planning. Cogn. Sci. 2016, 40, 1290–1332.
18. Ramachandran, A.; Sebo, S.S.; Scassellati, B. Personalized robot tutoring using the Assistive Tutor POMDP (AT-POMDP). In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 8050–8057.
19. Nioche, A.; Murena, P.A.; de la Torre-Ortiz, C.; Oulasvirta, A. Improving artificial teachers by considering how people learn and forget. In Proceedings of the 26th International Conference on Intelligent User Interfaces, College Station, TX, USA, 13–17 April 2021; pp. 445–453.
20. Gao, H.; Zeng, Y.; Ma, B.; Pan, Y. Improving knowledge learning through modelling students’ practice-based cognitive processes. Cogn. Comput. 2024, 16, 348–365.
21. Gao, H.; Zeng, Y.; Pan, Y. Inducing individual students’ learning strategies through homomorphic POMDPs. arXiv 2024, arXiv:2403.10930.
22. Bellman, R. Dynamic programming. Science 1966, 153, 34–37.
23. Feng, M.; Heffernan, N.; Koedinger, K. Addressing the assessment challenge with an online system that tutors as it assesses. User Model. User-Adapt. Interact. 2009, 19, 243–266.
24. Lu, Y.; Pian, Y.; Shen, Z.; Chen, P.; Li, X. SLP: A multi-dimensional and consecutive dataset from K-12 education. In Proceedings of the 29th International Conference on Computers in Education (ICCE 2021), Online, 22–26 November 2021; Volume 1, pp. 261–266.
Dataset/Sub-Dataset | Subject | Learners | Knowledge Concepts | Questions | Answer Logs
---|---|---|---|---|---
ASSIST | | 8096 | 543 | 6907 | 603,128
ASSIST1 | Math | 2865 | 3 | 202 | 25,963
ASSIST2 | Math | 3509 | 7 | 154 | 20,513
ASSIST3 | Math | 3283 | 6 | 208 | 25,642
Quanlang | | 11,765 | 399 | 4871 | 343,719
Quanlang1 | Math | 416 | 4 | 81 | 4243
Quanlang2 | Math | 296 | 3 | 62 | 3089
Quanlang3 | Math | 515 | 5 | 103 | 5304
SLP | | 3408 | 184 | 6851 | 344,576
SLP1 | Physics | 143 | 5 | 184 | 4278
SLP2 | Chemistry | 81 | 2 | 33 | 1400
Dataset | ID | Knowledge Concept Name | State Description
---|---|---|---
ASSIST1 | | Subtraction whole numbers | (,,)
 | | Subtraction whole numbers, Pattern finding |
 | | Pattern finding |
ASSIST2 | | Congruence | (,,,,,,)
 | | Perimeter of a Polygon |
 | | Substitution |
 | | Equation solving more than two steps |
 | | Congruence, Perimeter of a Polygon |
 | | Congruence, Perimeter of a Polygon, Substitution |
 | | Congruence, Perimeter of a Polygon, Substitution, Equation solving more than two steps |
ASSIST3 | | Multiplication of whole numbers | (,,,,,)
 | | Pattern finding |
 | | Unlabeled |
 | | Multiplication of whole numbers, Pattern finding |
 | | Pattern finding, Unlabeled |
 | | Multiplication of whole numbers, Pattern finding, Unlabeled |
Quanlang1 | | Positive and negative numbers | (,,,)
 | | Absolute value |
 | | Opposite number |
 | | Addition of rational numbers |
Quanlang2 | | Definition of linear equations in one variable | (,,)
 | | Formulating linear equations in one variable |
 | | Solutions to linear equations in one variable |
Quanlang3 | | Definition of triangles | (,,,,)
 | | Classification of triangles |
 | | Triangular side relations |
 | | Interior angle properties of triangles |
 | | Exterior angle properties of triangles |
SLP1 | | Rectilinear propagation of light | (,,,,)
 | | Reflection of light |
 | | Refraction of light |
 | | Image formation by converging lens |
 | | Lens and its applications |
SLP2 | | Physical and chemical changes | (,)
 | | Research into applications of substance property |
Model | ASSIST1 | ASSIST2 | ASSIST3 | Quanlang1 | Quanlang2 | Quanlang3 | SLP1 | SLP2
---|---|---|---|---|---|---|---|---
PCEM | 44 | 7 | 8 | 364 | 682 | 39 | 92 | 1916
HCPM | 14 | 3 | 3 | 91 | 178 | 12 | 38 | 792
Sub-Dataset | Metric | PCPM | HCPM | PCEM
---|---|---|---|---
ASSIST1 | Avg. mastery of concept 1 | 0.9977 | 0.9988 | 0.9993
 | Avg. mastery of concept 2 | 0.9926 | 0.9947 | 0.9959
 | Avg. mastery of concept 3 | 0.9988 | 0.9986 | 0.9983
 | Avg. concepts mastered | 2.9890 | 2.9921 | 2.9935
 | Strategy stability | 0.0180 | 0.0133 | 0.0112
ASSIST2 | Avg. mastery of concept 1 | 0.9944 | 0.9983 | 0.9995
 | Avg. mastery of concept 2 | 0.9804 | 0.9913 | 0.9956
 | Avg. mastery of concept 3 | 0.9950 | 0.9985 | 0.9994
 | Avg. mastery of concept 4 | 0.9586 | 0.9793 | 0.9889
 | Avg. mastery of concept 5 | 0.9901 | 0.9953 | 0.9977
 | Avg. mastery of concept 6 | 0.9337 | 0.9613 | 0.9761
 | Avg. mastery of concept 7 | 0.9819 | 0.9870 | 0.9903
 | Avg. concepts mastered | 6.8340 | 6.9110 | 6.9475
 | Strategy stability | 0.4848 | 0.2427 | 0.1369
ASSIST3 | Avg. mastery of concept 1 | 0.9966 | 0.9978 | 0.9985
 | Avg. mastery of concept 2 | 0.9863 | 0.9923 | 0.9955
 | Avg. mastery of concept 3 | 0.9984 | 0.9994 | 0.9997
 | Avg. mastery of concept 4 | 0.9869 | 0.9918 | 0.9945
 | Avg. mastery of concept 5 | 0.9969 | 0.9977 | 0.9982
 | Avg. mastery of concept 6 | 0.9690 | 0.9795 | 0.9859
 | Avg. concepts mastered | 5.9340 | 5.9584 | 5.9722
 | Strategy stability | 0.1647 | 0.1008 | 0.0675
Quanlang1 | Avg. mastery of concept 1 | 0.9999 | 1.0000 | 1.0000
 | Avg. mastery of concept 2 | 0.9968 | 0.9987 | 0.9995
 | Avg. mastery of concept 3 | 0.9891 | 0.9932 | 0.9956
 | Avg. mastery of concept 4 | 0.9817 | 0.9874 | 0.9911
 | Avg. concepts mastered | 3.9674 | 3.9793 | 3.9861
 | Strategy stability | 0.0671 | 0.0391 | 0.0246
Quanlang2 | Avg. mastery of concept 1 | 0.9998 | 0.9999 | 0.9999
 | Avg. mastery of concept 2 | 0.9962 | 0.9977 | 0.9985
 | Avg. mastery of concept 3 | 0.9882 | 0.9917 | 0.9941
 | Avg. concepts mastered | 2.9843 | 2.9893 | 2.9926
 | Strategy stability | 0.0237 | 0.0157 | 0.0105
Quanlang3 | Avg. mastery of concept 1 | 1.0000 | 1.0000 | 1.0000
 | Avg. mastery of concept 2 | 0.9338 | 0.9337 | 0.9330
 | Avg. mastery of concept 3 | 0.9317 | 0.9296 | 0.9267
 | Avg. mastery of concept 4 | 0.9858 | 0.9875 | 0.9889
 | Avg. mastery of concept 5 | 0.9199 | 0.9211 | 0.9215
 | Avg. concepts mastered | 4.7711 | 4.7718 | 4.7700
 | Strategy stability | 0.2494 | 0.2464 | 0.2478
SLP1 | Avg. mastery of concept 1 | 1.0000 | 1.0000 | 1.0000
 | Avg. mastery of concept 2 | 0.9643 | 0.9636 | 0.9627
 | Avg. mastery of concept 3 | 0.9965 | 0.9976 | 0.9984
 | Avg. mastery of concept 4 | 0.9881 | 0.9899 | 0.9913
 | Avg. mastery of concept 5 | 0.9478 | 0.9492 | 0.9501
 | Avg. concepts mastered | 4.8967 | 4.9004 | 4.9025
 | Strategy stability | 0.1401 | 0.1298 | 0.1236
SLP2 | Avg. mastery of concept 1 | 0.9997 | 0.9997 | 0.9998
 | Avg. mastery of concept 2 | 0.9963 | 0.9978 | 0.9987
 | Avg. concepts mastered | 1.9960 | 1.9975 | 1.9984
 | Strategy stability | 0.0046 | 0.0030 | 0.0020
PCEM vs. | ASSIST1 | ASSIST2 | ASSIST3 | Quanlang1 | Quanlang2 | Quanlang3 | SLP1 | SLP2
---|---|---|---|---|---|---|---|---
PCPM | 2.6378 | 14.4004 | 7.9324 | 6.1880 | 4.4891 | 0.1518 | 1.1302 | 3.0267
HCPM | 0.9550 | 5.9215 | 3.3538 | 2.7149 | 2.0473 | 0.2519 | 0.4169 | 1.2903