Peer-Review Record

Evaluating Recent Advances in Affective Intelligent Tutoring Systems: A Scoping Review of Educational Impacts and Future Prospects

Educ. Sci. 2024, 14(8), 839; https://doi.org/10.3390/educsci14080839
by Jorge Fernández-Herrero
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 22 May 2024 / Revised: 9 July 2024 / Accepted: 29 July 2024 / Published: 1 August 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Is the work a significant contribution to the field?

A comprehensive review of Affective Intelligent Tutoring Systems is timely, interesting, and important for those interested in the question of how learning technologies could best adapt to learners (an important current topic).

 

Is the work well organized and comprehensively described?

Although the paper is comprehensive in scope, tackles a series of nine interesting research questions, and carefully describes and analyzes the reviewed papers, I found myself nonetheless wanting a bit more, as a researcher deeply invested in the field of adaptive learning technologies.

 

First, I would like to see a more thoughtful weighing and summarizing of the empirical evidence regarding the value of affect-sensitive tutoring systems for students’ learning outcomes, processes, and experiences (outcomes first and foremost). Developers of systems need to know what kind of affect-sensitivity (if any) is worth the development effort. I feel that the discussion of RQ6 mentions many different variables (performance, strategies, self-regulated learning, understanding, learning effectiveness), making it hard for the reader to track the influence of affect-adaptiveness on any of them specifically. It may be helpful to pick just two or three key variables and then discuss each one separately.

 

Second, in a review of affect-adaptive learning technologies, I would like (at minimum) to see two main questions answered: (a) How do the reviewed systems adjust to student affect? (b) What is the effect on students of affect-aware systems, or, more specifically, of making systems react to student affect as opposed to not doing so? These two questions fall within the scope of the review and appear to be captured by research questions 3, 5, and 6. The descriptions of the reviewed systems in the sections “Multimodal Affect Recognition Systems” and “Unimodal Affect Recognition Systems” also seem intended to provide this information.

 

Yet I felt that the answers I was looking for were not in the paper. To understand exactly how the reviewed systems adjust to student affect, I would like to know, for each system, (i) what aspects of the system’s behavior are varied, (ii) based on what aspects of student affect, and (iii) exactly how. To answer the question of what the empirical studies say about the effects of the reviewed systems’ affect-responsive features, I would like to know what kind of study was done and with what kind of result. Yet for most, if not all, of the reviewed systems, the paper does not provide this information.

 

As a representative example, the paper states (p. 14):

“EasyLogic [80] … identifies emotions like engagement, frustration, and boredom to tailor feedback and interventions … it … has proven to improve academic performance for its users.”

 

This description leaves it unsaid how the system’s feedback and interventions are adjusted based on a student’s affective or emotional state (e.g., how is the feedback to a student in a bored state different from that to an engaged student? Is it kinder or funnier? Does it give more examples?); it is also not clear what kind of “interventions” are meant. Furthermore, this description does not tell me what the nature of the evidence for the improved academic performance is, or compared to what control condition the performance was improved (e.g., was it against the same system without the affect-sensitive features, or against some other kind of control condition?). Table 4 suggests that a quasi-experimental study was done comparing the system against “traditional methods” on post-test results. That is helpful, but it would have been even more helpful to know what the traditional methods are.

 

The descriptions of other systems (in combination with Tables 3 and 4) similarly lack the information I am looking for. As a side note, the section “ATSs Pedagogical Features” lists the kinds of systems that were reviewed, but without highlighting which system features or behaviors were varied based on student affect.

 

To address these issues, perhaps the authors could include new tables (or expand Tables 3 and 4) with the requested information. One table could list, for example, for each system, which aspects of affect it reacts to, which system features or behaviors are adjusted, and how. Providing a bit more insight into the nature of the evaluation studies would be helpful as well, with (for experimental or quasi-experimental studies at least) a bit more information about the control condition and the dependent measures. Perhaps that, too, could be in a separate table.

 

Perhaps that table could also list the number of participants in the studies. In the limitations section, the paper mentions that the reported studies tend to have small numbers of participants, but these numbers are not reported.

 

Is the work scientifically sound and not misleading?

The review process on which the paper is based is sound and well described, following the PRISMA process.

 

I was somewhat surprised that Tables 3 and 4 list quasi-experimental studies but no true experimental studies. Did the set of reviewed studies not include any true experimental studies?

 

Regarding the same tables, what is meant by “Quantitative”? Please clarify this term in the paper.

 

Similarly, what is the difference between “Mixed” and “Mixed-methods”? Please clarify this difference in the paper.

 

Are there appropriate and adequate references to related and previous work?      

Generally, yes, although it would have been good to cite the following paper, which is relevant to the current topic (a past review in the same area):

Harley, J. M., Lajoie, S. P., Frasson, C., & Hall, N. C. (2017). Developing emotion-aware, advanced learning technologies: A taxonomy of approaches and features. International Journal of Artificial Intelligence in Education, 27, 268-297.

 

It would in fact be nice if the current paper could have applied (or built on) the taxonomy developed in that past paper, to bring greater coherence to the field, or at minimum drawn some connections.

 

Other past reviews that may be worth mentioning include:

 

Aleven, V., McLaughlin, E. A., Glenn, R. A., & Koedinger, K. R. (2017). Instruction based on adaptive learning technologies. In R. E. Mayer & P. Alexander (Eds.), Handbook of research on learning and instruction (2nd ed., pp. 522-560). New York: Routledge.

 

which includes a brief section on adapting to affect and motivation,

 

and

 

D’Mello, S. K., & Graesser, A. C. (2014). Feeling, thinking, and computing with affect-aware learning technologies. In Calvo, R. A., D’Mello, S. K., Gratch, J., & Kappas, A. (Eds.), Handbook of Affective Computing. Oxford University Press. doi: 10.1093/oxfordhb/9780199942237.013.032

 

which introduces the useful distinction between affect-reactive and affect-proactive systems, a distinction that could perhaps be usefully applied in the current paper.

 

Is the English used correct and readable?

The English is fine.

Author Response

Comment 1: Is the work a significant contribution to the field?

A comprehensive review of Affective Intelligent Tutoring Systems is timely, interesting, and important for those interested in the question of how learning technologies could best adapt to learners (an important current topic).

Response 1: I appreciate the reviewer's considerations in this regard.

 

Comment 2: Is the work well organized and comprehensively described?

Although the paper is comprehensive in scope, tackles a series of nine interesting research questions, and carefully describes and analyzes the reviewed papers, I found myself nonetheless wanting a bit more, as a researcher deeply invested in the field of adaptive learning technologies.

First, I would like to see a more thoughtful weighing and summarizing of the empirical evidence regarding the value of affect-sensitive tutoring systems for students’ learning outcomes, processes, and experiences (outcomes first and foremost). Developers of systems need to know what kind of affect-sensitivity (if any) is worth the development effort. I feel that the discussion of RQ6 mentions many different variables (performance, strategies, self-regulated learning, understanding, learning effectiveness), making it hard for the reader to track the influence of affect-adaptiveness on any of them specifically. It may be helpful to pick just two or three key variables and then discuss each one separately.

Second, in a review of affect-adaptive learning technologies, I would like (at minimum) to see two main questions answered: (a) How do the reviewed systems adjust to student affect? (b) What is the effect on students of affect-aware systems, or, more specifically, of making systems react to student affect as opposed to not doing so? These two questions fall within the scope of the review and appear to be captured by research questions 3, 5, and 6. The descriptions of the reviewed systems in the sections “Multimodal Affect Recognition Systems” and “Unimodal Affect Recognition Systems” also seem intended to provide this information.

Yet I felt that the answers I was looking for were not in the paper. To understand exactly how the reviewed systems adjust to student affect, I would like to know, for each system, (i) what aspects of the system’s behavior are varied, (ii) based on what aspects of student affect, and (iii) exactly how. To answer the question of what the empirical studies say about the effects of the reviewed systems’ affect-responsive features, I would like to know what kind of study was done and with what kind of result. Yet for most, if not all, of the reviewed systems, the paper does not provide this information.

As a representative example, the paper states (p. 14):

“EasyLogic [80] … identifies emotions like engagement, frustration, and boredom to tailor feedback and interventions … it … has proven to improve academic performance for its users.”

This description leaves it unsaid how the system’s feedback and interventions are adjusted based on a student’s affective or emotional state (e.g., how is the feedback to a student in a bored state different from that to an engaged student? Is it kinder or funnier? Does it give more examples?); it is also not clear what kind of “interventions” are meant. Furthermore, this description does not tell me what the nature of the evidence for the improved academic performance is, or compared to what control condition the performance was improved (e.g., was it against the same system without the affect-sensitive features, or against some other kind of control condition?). Table 4 suggests that a quasi-experimental study was done comparing the system against “traditional methods” on post-test results. That is helpful, but it would have been even more helpful to know what the traditional methods are.

The descriptions of other systems (in combination with Tables 3 and 4) similarly lack the information I am looking for. As a side note, the section “ATSs Pedagogical Features” lists the kinds of systems that were reviewed, but without highlighting which system features or behaviors were varied based on student affect.

To address these issues, perhaps the authors could include new tables (or expand Tables 3 and 4) with the requested information. One table could list, for example, for each system, which aspects of affect it reacts to, which system features or behaviors are adjusted, and how. Providing a bit more insight into the nature of the evaluation studies would be helpful as well, with (for experimental or quasi-experimental studies at least) a bit more information about the control condition and the dependent measures. Perhaps that, too, could be in a separate table.

Perhaps that table could also list the number of participants in the studies. In the limitations section, the paper mentions that the reported studies tend to have small numbers of participants, but these numbers are not reported.

Response 2: The reviewer's considerations are very accurate and have been incorporated into the new version of the submitted article. The tables originally numbered 3 and 4 (now 3 and 5) have been simplified to summarize the selected ATSs and their main functionalities. Two new tables, now numbered 4 and 6, have been created to address the aspects that the reviewer rightly identified as insufficiently covered in the initial version. Additionally, the results text that complements these tables has been updated to delve deeper into these aspects. The new version is submitted with tracked changes to facilitate comparison between document versions.

 

Comment 3: Is the work scientifically sound and not misleading?

The review process on which the paper is based is sound and well described, following the PRISMA process.

I was somewhat surprised that Tables 3 and 4 list quasi-experimental studies but no true experimental studies. Did the set of reviewed studies not include any true experimental studies?

Response 3: All the included studies are empirical, as determined by the inclusion and exclusion criteria. To define the research designs, an effort was made to simplify by differentiating between those that use a control group (quasi-experimental if the sample is not random) and those that use a single group of subjects with repeated measures or pre-tests and post-tests, considering these as within-subject investigations. Many of these works follow experimental research designs, but where the selected works are not explicit about this, there is either no evidence that the participant samples are random or it is stated that they are convenience samples; they are therefore interpreted as quasi-experimental designs. In any case, this does not diminish the validity of the results obtained.

 

Comment 4: Regarding the same tables, what is meant by “Quantitative”? Please clarify this term in the paper.

Response 4: To simplify and ensure clarity, the research designs have been unified into either quasi-experimental or within-subject designs (in the latter case, when no control group is used), further differentiating whether they are quantitative, qualitative (none were found), or mixed-methods.

 

Comment 5: Similarly, what is the difference between “Mixed” and “Mixed-methods”? Please clarify this difference in the paper.

Response 5: There is no difference; these are studies that combine quantitative and qualitative strategies, integrating results. I apologize for the use of terminology that may have caused confusion, which the new version has sought to avoid.

 

Comment 6: Are there appropriate and adequate references to related and previous work?      

Generally, yes, although it would have been good to cite the following paper, which is relevant to the current topic (a past review in the same area):

Harley, J. M., Lajoie, S. P., Frasson, C., & Hall, N. C. (2017). Developing emotion-aware, advanced learning technologies: A taxonomy of approaches and features. International Journal of Artificial Intelligence in Education, 27, 268-297.

It would in fact be nice if the current paper could have applied (or built on) the taxonomy developed in that past paper, to bring greater coherence to the field, or at minimum drawn some connections.

Response 6: Following this recommendation, this citation has been included in the body of the text in both the introduction and discussion sections, aiming to establish connections with the analysis of the selected articles.

 

Comment 7: Other past reviews that may be worth mentioning include:

Aleven, V., McLaughlin, E. A., Glenn, R. A., & Koedinger, K. R. (2017). Instruction based on adaptive learning technologies. In R. E. Mayer & P. Alexander (Eds.), Handbook of research on learning and instruction (2nd ed., pp. 522-560). New York: Routledge.

which includes a brief section on adapting to affect and motivation,

Response 7: Following this recommendation, this citation has been included in the body of the text in both the introduction and discussion sections, aiming to establish connections with the analysis of the selected articles.

 

Comment 8: and

D’Mello, S. K., & Graesser, A. C. (2014). Feeling, thinking, and computing with affect-aware learning technologies. In Calvo, R. A., D’Mello, S. K., Gratch, J., & Kappas, A. (Eds.), Handbook of Affective Computing. Oxford University Press. doi: 10.1093/oxfordhb/9780199942237.013.032

which introduces the useful distinction between affect-reactive and affect-proactive systems, a distinction that could perhaps be usefully applied in the current paper.

Response 8: Unfortunately, access to this work was not possible due to a lack of editorial agreements with the institution to which we belong. Therefore, this work could not be cited in the present paper.

 

Comment 9: Is the English used correct and readable?

The English is fine.

Response 9: Thank you.

 

Reviewer 2 Report

Comments and Suggestions for Authors

Overall, this is a very interesting review of ATSs that is necessary within the field and that, due to the conclusions drawn, has the potential to further work in this area. However, there are a few concerns, noted below. Most pressing is the inclusion of an article that is not empirical, even though the authors state that their criteria excluded papers that were not empirical. As such, I recommend a major revise and resubmit to ensure that the studies included and the conclusions drawn are accurate.

 

On page 3, the authors describe some ITSs, including Guru Tutor and MetaTutor. However, this literature review of systems should reflect ATSs. How can systems such as MetaTutor be highlighted as ATSs if they are not affect-aware, given that affect is only collected within specific studies of MetaTutor (for example)?

 

Please specify the years that are included in the “last five years” inclusion/exclusion criteria for clarity.

 

Clarify whether the exclusion criterion (“research outside the educational context”) includes intelligent training systems in this category.

 

It is slightly difficult to distinguish between the grayed-out Xs and the solid Xs in Table 5. Please consider removing the grayed-out Xs altogether or using a different symbol.

 

For the included articles, the authors stated that they excluded studies that were not empirical. However, studies that were conceptual/theoretical were included, such as [57]. This leads me to believe that the authors did not account for all exclusion criteria and that other studies may have been falsely included within the final papers selected, skewing the results and conclusions.

 

Author Response

Comment 1: Overall, this is a very interesting review of ATSs that is necessary within the field and that, due to the conclusions drawn, has the potential to further work in this area. However, there are a few concerns, noted below. Most pressing is the inclusion of an article that is not empirical, even though the authors state that their criteria excluded papers that were not empirical. As such, I recommend a major revise and resubmit to ensure that the studies included and the conclusions drawn are accurate.

Response 1: I appreciate the reviewer's considerations in this regard.

 

Comment 2: On page 3, the authors describe some ITSs, including Guru Tutor and MetaTutor. However, this literature review of systems should reflect ATSs. How can systems such as MetaTutor be highlighted as ATSs if they are not affect-aware, given that affect is only collected within specific studies of MetaTutor (for example)?

Response 2: It is understood that the reviewer is referring to AutoTutor (since it is the ITS mentioned in the section in question), which indeed was originally an ITS without an affective module. However, later versions did include one and even empirically compared the influence of its inclusion (D’Mello & Graesser, 2013; D’Mello et al., 2011). The text refers to these versions when it states that Petrovica et al. (2017) gather the most prominent ATSs of that time.

D’Mello, S. K., Lehman, B., & Graesser, A. (2011). A motivationally supportive affect-sensitive AutoTutor. In R. A. Calvo & S. K. D’Mello (Eds.), New Perspectives on Affect and Learning Technologies (pp. 113-126). Springer. https://doi.org/10.1007/978-1-4419-9625-1_9

D’Mello, S., & Graesser, A. (2013). AutoTutor and Affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23:1-23:39. https://doi.org/10.1145/2395123.2395128

Petrovica, S., Anohina-Naumeca, A., & Ekenel, H. K. (2017). Emotion Recognition in Affective Tutoring Systems: Collection of Ground-truth Data. Procedia Computer Science, 104, 437-444. https://doi.org/10.1016/j.procs.2017.01.157

 

Comment 3: Please specify the years that are included in the “last five years” inclusion/exclusion criteria for clarity.

Response 3: Following the reviewer's recommendations, the years considered have been specified in the table that defines the inclusion and exclusion criteria for this review.

 

Comment 4: Clarify whether the exclusion criterion (“research outside the educational context”) includes intelligent training systems in this category.

Response 4: The reviewer is correct in considering that the concept of educational context can be ambiguous. The filtering focused on ATSs applied to contexts at any educational level and did not discriminate by the type of competency or learning acquired. Articles whose ATSs can be defined as simulators or training systems for specific procedures, such as a flight or driving simulator, were not specifically excluded. In fact, one study describing an ATS for helicopter pilot training was found, but it was excluded because it was a system description without an empirical component. A brief sentence has been added in the procedure section to clarify this point.

 

Comment 5: It is slightly difficult to distinguish between the grayed-out Xs and the solid Xs in Table 5. Please consider removing the grayed-out Xs altogether or using a different symbol.

Response 5: Following the reviewer's recommendations, the grayed-out Xs have been removed to avoid confusion.

 

Comment 6: For the included articles, the authors stated that they excluded studies that were not empirical. However, studies that were conceptual/theoretical were included, such as [57]. This leads me to believe that the authors did not account for all exclusion criteria and that other studies may have been falsely included within the final papers selected, skewing the results and conclusions.

Response 6: Indeed, the reviewer is correct in indicating that the work by Azevedo et al. (2022) should not have been included in the final selection based on the defined inclusion and exclusion criteria. The entire selection of articles has been reviewed again, and it was found that this is the only work that does not meet the defined requirements, so it has been excluded from the selection. However, the selection includes several studies on the particular ATS analyzed in this work, so the derived findings included in the present review remain. The excluded study is now used to nuance and complement findings in the discussion of the results. The new version submitted includes additional tables and simplifies the pre-existing ones to delve deeper into the research design of each selected work, confirming compliance with the established inclusion and exclusion criteria for this review.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you to the authors for responding to the critiques brought up in my review. The comments have been adequately addressed and I suggest accepting the paper.
