DRL-SRS: A Deep Reinforcement Learning Approach for Optimizing Spaced Repetition Scheduling
Abstract
1. Introduction
2. Related Work
2.1. Traditional Spaced Repetition Algorithms
2.2. Reinforcement Learning for Spaced Repetition
3. Background
3.1. Models for Human Memory
3.1.1. Exponential Forgetting Curve
3.1.2. Half-Life Regression
3.1.3. Difficulty–Half-life–P(recall) HLR
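These memory models are defined in the subsections above. As a minimal sketch of the standard formulations they build on, the exponential forgetting curve [3] and half-life regression [8]; the exact parameterizations used in this paper may differ:

```python
import numpy as np

def efc_recall_probability(delta_t: float, strength: float) -> float:
    """Exponential forgetting curve: recall probability decays
    exponentially with elapsed time, scaled by the memory strength."""
    return float(np.exp(-delta_t / strength))

def hlr_recall_probability(delta_t: float, half_life: float) -> float:
    """Half-life regression: recall probability halves every
    `half_life` days since the last review."""
    return float(2.0 ** (-delta_t / half_life))

def hlr_half_life(theta: np.ndarray, features: np.ndarray) -> float:
    """HLR estimates the half-life as 2 raised to a linear combination
    of item and history features (Settles and Meeder [8])."""
    return float(2.0 ** np.dot(theta, features))
```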
3.2. Reinforcement Learning
4. Methodology
4.1. Problem Formulation
- State space. The state space depends on the memory model. For EFC, the state encodes the item difficulty, the time delay, and the memory strength. For DHP, the state includes the item difficulty, the time delay, and the recall result.
- Action space. The action space consists of review intervals ranging from 1 day to the maximum interval T. An action specifies how many days after its review at time t the current item is scheduled for its next review.
- Observation space. The agent cannot access the internal memory states; regardless of the memory model, it observes only the time delay, the recall result, and the recall probability (a minimal sketch of these spaces follows this list).
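As a hedged illustration of how such spaces could be declared with the Gymnasium library used in the experiments; the bound values, feature ordering, and maximum interval are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

MAX_INTERVAL = 64  # illustrative value for the maximum interval T

class SpacedRepetitionEnv(gym.Env):
    """Skeleton environment: the internal memory state stays hidden; the
    agent only sees (time delay, recall result, recall probability)."""

    def __init__(self):
        # Actions index review intervals of 1..T days (action 0 maps to 1 day).
        self.action_space = spaces.Discrete(MAX_INTERVAL)
        # Observation: [time delay, recall result, recall probability].
        self.observation_space = spaces.Box(
            low=np.array([0.0, 0.0, 0.0]),
            high=np.array([np.inf, 1.0, 1.0]),
            dtype=np.float32,
        )
```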
4.2. Transformer-Based Half-Life Regression
- Previous recall results;
- Previous recall probabilities;
- Previous review intervals.
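A minimal sketch of a Transformer encoder over these three input sequences, regressing the current half-life; the layer sizes, mean pooling, and softplus output head are illustrative assumptions rather than the exact THLR architecture:

```python
import torch
import torch.nn as nn

class TransformerHLR(nn.Module):
    """Encodes the review history (recall results, recall probabilities,
    review intervals) and regresses the current half-life."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        # Each time step carries 3 features: result, probability, interval.
        self.input_proj = nn.Linear(3, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, seq_len, 3)
        h = self.encoder(self.input_proj(history))
        # Pool over time and keep the half-life strictly positive.
        half_life = nn.functional.softplus(self.head(h.mean(dim=1)))
        return half_life.squeeze(-1)

# Recall probability then follows the HLR form: p = 2 ** (-delta_t / half_life).
```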
4.3. Schedule Optimization Environment
Algorithm 1: The simulation process of the environment
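The full pseudocode of Algorithm 1 is given in the paper. As a hedged sketch of one simulation step consistent with the formulation above; the `memory_model`, `history`, and `horizon_days` attributes and the use of the predicted recall probability as the reward are illustrative assumptions:

```python
import numpy as np

def simulate_step(env, interval: int, rng: np.random.Generator):
    """One illustrative simulation step: advance time by the scheduled
    interval, query the memory model, sample the recall outcome, and
    feed the result back into the review history."""
    env.elapsed_days += interval
    p_recall = env.memory_model.predict(env.history, delta_t=interval)
    recall = float(rng.random() < p_recall)           # simulated learner
    env.history.append((recall, p_recall, interval))  # update the history
    observation = np.array([interval, recall, p_recall], dtype=np.float32)
    reward = p_recall                                  # assumed reward signal
    terminated = env.elapsed_days >= env.horizon_days
    return observation, reward, terminated
```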
4.4. Reinforcement-Learning-Based Spaced Repetition Optimization
4.4.1. DQN Algorithm
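DQN is the standard deep Q-learning algorithm of Mnih et al.; a minimal sketch of its temporal-difference update on a replay batch, independent of the specific network and hyperparameters used here:

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma: float = 0.99):
    """One DQN gradient step on a replay batch
    (states, actions, rewards, next_states, dones)."""
    states, actions, rewards, next_states, dones = batch
    # Q(s, a) for the actions actually taken.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target from the frozen target network.
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    loss = F.smooth_l1_loss(q_values, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```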
4.4.2. Recurrent-Style Planning
5. Experiments
5.1. Environments
5.2. Memory Prediction
5.2.1. Baselines
- Pimsleur [4] is pioneering work that introduced an early scheduling system based on a geometric progression with a common ratio of five.
- Leitner [5] is a method using physical boxes that manages the frequency of flashcard reviews by moving cards between boxes of different sizes.
- HLR [8] introduces a half-life parameter that measures the storage strength of memory and predicts the recall probability from the half-life and the elapsed time interval.
- DHP-HLR [9] is a variant of HLR that considers the item difficulty.
- GRU-HLR [10] is an improved version of DHP-HLR that uses recurrent neural networks to update internal parameters.
5.2.2. Evaluation Metrics and Experimental Settings
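Memory prediction is evaluated with MAE and MAPE, reported in the results table further below. A minimal sketch of these standard metrics; the quantity being predicted (e.g., recall probability or half-life) follows the paper's setup and is not assumed here:

```python
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-8) -> float:
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((y_true - y_pred) / (y_true + eps))) * 100.0)
```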
5.2.3. Results and Analysis
5.3. Schedule Optimization
5.3.1. Baselines
- RANDOM is a baseline that chooses a random interval from the action space to schedule the next review.
- ANKI [7] is a variant of the SM-2 algorithm, which is used in a popular learning application.
- MEMORIZE [20] is a spaced repetition algorithm based on optimal control; it derives a reviewing intensity that minimizes the expected review cost (a hedged sketch of this intensity appears after this list).
- HLR [8] is a baseline that schedules the next review at an interval equal to the estimated half-life.
- EFC [3] is one of the most widely recognized memory models, describing how recall probability decays with the time elapsed since the last review.
- DHP [9] is a memory model with the Markov property and handcrafted state-transition equations, chosen for explainability and simplicity. It takes the half-life, recall probability, recall result, and item difficulty as state variables.
- THLR is the Transformer-based half-life regression simulation environment proposed in this work.
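As referenced in the MEMORIZE entry above, a hedged sketch of the closed-form reviewing intensity from Tabibian et al. [20], u(t) = q^(-1/2) (1 - m(t)), sampled here with standard thinning; the exponential-forgetting recall model and the parameter values are illustrative assumptions:

```python
import numpy as np

def memorize_next_review(q: float, forgetting_rate: float,
                         rng: np.random.Generator,
                         horizon: float = 365.0) -> float:
    """Sample the next review time under the MEMORIZE intensity
    u(t) = q**-0.5 * (1 - m(t)), with m(t) = exp(-forgetting_rate * t)
    as an illustrative recall model since the last review."""
    t = 0.0
    max_intensity = q ** -0.5  # upper bound on u(t), since 1 - m(t) <= 1
    while t < horizon:
        t += rng.exponential(1.0 / max_intensity)  # candidate event time
        m_t = np.exp(-forgetting_rate * t)
        if rng.random() < (1.0 - m_t):             # thinning acceptance test
            return t
    return horizon
```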
5.3.2. Evaluation Metrics and Experimental Settings
5.3.3. Results and Analysis
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Cepeda, N.J.; Vul, E.; Rohrer, D.; Wixted, J.T.; Pashler, H. Spacing effects in learning: A temporal ridgeline of optimal retention. Psychol. Sci. 2008, 19, 1095–1102.
2. Finke, L. Markov Models for Spaced Repetition Learning; Mathematisches Institut der Georg-August-Universität Göttingen: Göttingen, Germany, 2023.
3. Ebbinghaus, H. Memory: A contribution to experimental psychology. Ann. Neurosci. 2013, 20, 155.
4. Pimsleur, P. A memory schedule. Mod. Lang. J. 1967, 51, 73–75.
5. Leitner, S.; Totter, R. So Lernt Man Lernen; Angewandte Lernpsychologie ein Weg zum Erfolg; Herder: Freiburg, Germany, 1972.
6. Woźniak, P.; Gorzelańczyk, E. Optimization of repetition spacing in the practice of learning. Acta Neurobiol. Exp. 1994, 54, 59–62.
7. Lu, M.; Farhat, J.H.; Beck Dallaghan, G.L. Enhanced learning and retention of medical knowledge using the mobile flash card application Anki. Med. Sci. Educ. 2021, 31, 1975–1981.
8. Settles, B.; Meeder, B. A trainable spaced repetition model for language learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; pp. 1848–1858.
9. Ye, J.; Su, J.; Cao, Y. A stochastic shortest path algorithm for optimizing spaced repetition scheduling. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 4381–4390.
10. Su, J.; Ye, J.; Nie, L.; Cao, Y.; Chen, Y. Optimizing spaced repetition schedule by capturing the dynamics of memory. IEEE Trans. Knowl. Data Eng. 2023, 35, 10085–10097.
11. Pinto, J.D.; Paquette, L. Deep Learning for Educational Data Science. arXiv 2024, arXiv:2404.19675.
12. Taherisadr, M.; Stavroulakis, S.A.; Elmalaki, S. adaPARL: Adaptive privacy-aware reinforcement learning for sequential decision making human-in-the-loop systems. In Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation, San Antonio, TX, USA, 9–12 May 2023; pp. 262–274.
13. Gharbi, H.; Elaachak, L.; Fennan, A. Reinforcement Learning Algorithms and Their Applications in Education Field: A Systematic Review. In Proceedings of the International Conference on Smart City Applications; Springer: Berlin/Heidelberg, Germany, 2023; pp. 410–418.
14. Reddy, S.; Levine, S.; Dragan, A. Accelerating human learning with deep reinforcement learning. In Proceedings of the NIPS Workshop: Teaching Machines, Robots, and Humans, Long Beach, CA, USA, 9 December 2017.
15. Sinha, S. Using Deep Reinforcement Learning for Personalizing Review Sessions on E-Learning Platforms with Spaced Repetition. Master's Thesis, KTH, School of Electrical Engineering and Computer Science (EECS), Stockholm, Sweden, 2019.
16. Upadhyay, U.; De, A.; Gomez Rodriguez, M. Deep reinforcement learning of marked temporal point processes. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; p. 31.
17. Yang, Z.; Shen, J.; Liu, Y.; Yang, Y.; Zhang, W.; Yu, Y. TADS: Learning Time-Aware Scheduling Policy with Dyna-Style Planning for Spaced Repetition. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20), New York, NY, USA, 25–30 July 2020; pp. 1917–1920.
18. Sutton, R.S.; Szepesvari, C.; Geramifard, A.; Bowling, M.P. Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping. arXiv 2012, arXiv:1206.3285.
19. Zhao, G.; Huang, Z.; Zhuang, Y.; Liu, J.; Liu, Q.; Liu, Z.; Wu, J.; Chen, E. Simulating Student Interactions with Two-stage Imitation Learning for Intelligent Educational Systems. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 3423–3432.
20. Tabibian, B.; Upadhyay, U.; De, A.; Zarezade, A.; Schölkopf, B.; Gomez-Rodriguez, M. Enhancing human learning via spaced repetition optimization. Proc. Natl. Acad. Sci. USA 2019, 116, 3988–3993.
21. Hunziker, A.; Chen, Y.; Mac Aodha, O.; Gomez Rodriguez, M.; Krause, A.; Perona, P.; Yue, Y.; Singla, A. Teaching multiple concepts to a forgetful learner. Adv. Neural Inf. Process. Syst. 2019, 32, 4048–4058.
22. Reddy, S.; Labutov, I.; Banerjee, S.; Joachims, T. Unbounded human learning: Optimal scheduling for spaced repetition. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1815–1824.
23. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
24. Pokrywka, J.; Biedalak, M.; Gralinski, F.; Biedalak, K. Modeling Spaced Repetition with LSTMs. In Proceedings of the 15th International Conference on Computer Supported Education CSEDU (2), Prague, Czech Republic, 21–23 April 2023; pp. 88–95.
25. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Neural Information Processing Systems Foundation, Inc.: La Jolla, CA, USA, 2017; Volume 30.
26. Wen, Q.; Zhou, T.; Zhang, C.; Chen, W.; Ma, Z.; Yan, J.; Sun, L. Transformers in time series: A survey. arXiv 2022, arXiv:2202.07125.
27. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
28. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
29. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
30. Schäfer, A.M. Reinforcement Learning with Recurrent Neural Networks. Ph.D. Thesis, Osnabrück University, Osnabrück, Germany, 2008.
31. Towers, M.; Terry, J.K.; Kwiatkowski, A.; Balis, J.U.; Cola, G.d.; Deleu, T.; Goulão, M.; Kallinteris, A.; KG, A.; Krimmel, M.; et al. Gymnasium. 2023. Available online: https://github.com/Farama-Foundation/Gymnasium (accessed on 16 May 2024).
32. Weng, J.; Chen, H.; Yan, D.; You, K.; Duburcq, A.; Zhang, M.; Su, Y.; Su, H.; Zhu, J. Tianshou: A Highly Modularized Deep Reinforcement Learning Library. J. Mach. Learn. Res. 2022, 23, 1–6.
33. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026.
34. Wilcoxon, F. Individual comparisons by ranking methods. In Breakthroughs in Statistics: Methodology and Distribution; Springer: New York, NY, USA, 1992; pp. 196–202.
35. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
Model | Learning-Based | Temporal Dynamics | Attention Mechanism
---|---|---|---
Pimsleur [4] | - | - | -
Leitner [5] | - | - | -
HLR [8] | ✔ | - | -
DHP [9] | ✔ | - | -
GRU-HLR [10] | ✔ | ✔ | -
THLR (ours) | ✔ | ✔ | ✔
Method | Learning-Based | Temporal Dynamics | Reinforcement Learning
---|---|---|---
RANDOM | - | - | -
ANKI [7] | - | - | -
MEMORIZE [20] | - | - | -
HLR [8] | ✔ | - | -
Ours | ✔ | ✔ | ✔
Model | MAE | MAPE
---|---|---
Pimsleur | 0.3169 | 165.69%
Leitner | 0.4535 | 133.92%
HLR | 0.1070 | 76.65%
DHP | 0.0779 | 46.35%
GRU-HLR | 0.0307 | 18.88%
THLR (ours) | 0.0274 | 16.70%
Scheduler | EFC (SUM) | EFC (MEAN) | DHP (SUM) | DHP (MEAN) | THLR (SUM) | THLR (MEAN)
---|---|---|---|---|---|---
RANDOM | 52,439.40 | 0.867 | 28,803.55 | 0.475 | 13,482.70 | 0.178
ANKI | 13,264.31 | 0.637 | 13,702.83 | 0.584 | 12,425.18 | 0.185
MEMORIZE | - | - | 35,350.10 | 0.817 | 12,694.96 | 0.169
HLR | - | - | 10,516.85 | 0.468 | 25,169.26 | 0.355
Ours | 88,548.67 | 0.920 | 162,482.57 | 0.942 | 18,572.39 | 0.372