The Promises and Pitfalls of Large Language Models as Feedback Providers: A Study of Prompt Engineering and the Quality of AI-Driven Feedback
Abstract
1. Introduction
> "Feedback is a topic of hot debate in universities. Everyone agrees that it is important. However, students report a lot of dissatisfaction: they don’t get what they want from the comments they receive on their work and they don’t find it timely. Teaching staff find it burdensome, are concerned that students do not engage with it and wonder whether the effort they put in is worthwhile." (p. 3)
2. Theoretical Background
2.1. Artificial Intelligence in Higher Education
2.2. Prompt Engineering for Large Language Models in Higher Education
2.3. Feedback
2.3.1. Feedback Quality
2.3.2. Novice and Expert Feedback
2.3.3. Large Language Models as Feedback Providers
3. The Aim of the Study and the Research Questions
- (1) What differences emerge in LLM feedback when prompts of varying quality are used?
- (2) How does LLM feedback, influenced by prompt quality, compare to novice and expert feedback in terms of feedback quality (specific, empathetic, and engaging)?
4. Method
4.1. The Development of a Theory-Driven Prompt Manual
4.2. Generating LLM Feedback
4.3. Assessment of Feedback Quality
4.4. Coding of the Feedback
4.5. Analysis Method
5. Results
5.1. Differences Between Prompts and Their Output
- LLM feedback generated using prompt 1:
- LLM feedback generated using prompt 3:
5.2. Differences Between Novice, LLM, and Expert Feedback
6. Discussion
6.1. Limitations and Implications
6.2. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Harry, A.; Sayudin, S. Role of AI in education. Interdiscip. J. Humanity (INJURITY) 2023, 2, 260–268. [Google Scholar] [CrossRef]
- Crompton, H.; Burke, D. Artificial intelligence in higher education: The state of the field. Int. J. Educ. Technol. High. Educ. 2023, 20, 22. [Google Scholar] [CrossRef]
- Prilop, C.N.; Mah, D.; Jacobsen, L.J.; Hansen, R.R.; Weber, K.E.; Hoya, F. Generative AI in Teacher Education: Using AI-Enhanced Methods to Explore Teacher Educators’ Perceptions; Center for Open Science: Charlottesville, VA, USA, 2024. [Google Scholar] [CrossRef]
- von Garrel, J.; Mayer, J. Artificial Intelligence in studies—Use of ChatGPT and AI-based tools among students in Germany. Humanit. Soc. Sci. Commun. 2023, 10, 1–9. [Google Scholar] [CrossRef]
- Fleckenstein, J.; Liebenow, L.W.; Meyer, J. Automated feedback and writing: A multi-level meta-analysis of effects on students’ performance. Front. Artif. Intell. 2023, 6, 1162454. [Google Scholar] [CrossRef] [PubMed]
- Henderson, M.; Ajjawi, R.; Boud, D.; Molloy, E. (Eds.) The Impact of Feedback in Higher Education: Improving Assessment Outcomes for Learners; Springer International Publishing: Berlin/Heidelberg, Germany, 2019. [Google Scholar] [CrossRef]
- Kluger, A.N.; DeNisi, A. The effects of feedback interventions on performance: A historical review, a meta-analysis and a preliminary feedback intervention theory. Psychol. Bull. 1996, 119, 254–284. [Google Scholar] [CrossRef]
- Prilop, C.N.; Weber, K.E.; Kleinknecht, M. Entwicklung eines video- und textbasierten Instruments zur Messung kollegialer Feedbackkompetenz von Lehrkräften. In Lehrer. Bildung. Gestalten.: Beiträge zur Empirischen Forschung in der Lehrerbildung; Beltz Juventa Verlag: Weinheim, Germany, 2019. [Google Scholar]
- Demszky, D.; Liu, J.; Hill, H.C.; Jurafsky, D.; Piech, C. Can automated feedback improve teachers’ uptake of student ideas? Evidence from a randomized controlled trial in a large-scale online course. Educ. Eval. Policy Anal. 2023, 46, 483–505. [Google Scholar] [CrossRef]
- United Nations Educational, Scientific and Cultural Organization (UNESCO). AI Competency Framework for Teachers; UNESCO: Paris, France, 2024. [Google Scholar] [CrossRef]
- Mah, D.K.; Groß, N. Artificial intelligence in higher education: Exploring faculty use, self-efficacy, distinct profiles, and professional development needs. Int. J. Educ. Technol. High. Educ. 2024, 21, 58. [Google Scholar] [CrossRef]
- Cotton, D.R.E.; Cotton, P.A.; Shipway, J.R. Chatting and cheating: Ensuring academic integrity in the era of ChatGPT. Innov. Educ. Teach. Int. 2023, 61, 228–239. [Google Scholar] [CrossRef]
- Ifenthaler, D.; Majumdar, R.; Gorissen, P.; Judge, M.; Mishra, S.; Raffaghelli, J.; Shimada, A. Artificial Intelligence in Education: Implications for Policymakers, Researchers, and Practitioners. Technol. Knowl. Learn. 2024, 29, 1693–1710. [Google Scholar] [CrossRef]
- Jensen, L.X.; Buhl, A.; Sharma, A.; Bearman, M. Generative AI and higher education: A review of claims from the first months of ChatGPT. High. Educ. 2024. [Google Scholar] [CrossRef]
- Bsharat, S.M.; Myrzakhan, A.; Shen, Z. Principled instructions are all you need for questioning LLaMA-1/2, GPT-3.5/4. arXiv 2024, arXiv:2312.16171. [Google Scholar] [CrossRef]
- Lo, L.S. The CLEAR path: A framework for enhancing information literacy through prompt engineering. J. Acad. Librariansh. 2023, 49, 102720. [Google Scholar] [CrossRef]
- Zamfirescu-Pereira, J.D.; Wong, R.; Hartmann, B.; Yang, Q. Why Johnny can’t prompt: How non-AI experts try (and fail) to design LLM prompts. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23), Hamburg, Germany, 23–28 April 2023; ACM: New York, NY, USA, 2023; pp. 1–21. [Google Scholar] [CrossRef]
- Ekin, S. Prompt engineering for ChatGPT: A quick guide to techniques, tips, and best practices. Authorea Prepr. 2023. [Google Scholar] [CrossRef]
- Kipp, M. Wie Sag Ich’s Meiner KI? Hintergründe und Prinzipien zum #Prompting bei #ChatGPT [Video]. 2023. Available online: https://www.youtube.com/watch?v=cfl7q1llkso&t=2382s (accessed on 20 June 2023).
- Weber, K.E.; Prilop, C.N. Videobasiertes Training kollegialen Feedbacks in der Lehrkräftebildung. In Spezifische Aspekte von Trainings pädagogischer Kompetenzen in Abgrenzung zu anderen Lehr-Lern-Situationen in der Lehrkräftebildung: Tagungsband. Online-Tagung an der Uni Rostock am 4. und 5. März 2022 zum Thema “Ist das jetzt schon ein Training? Wie unterscheiden sich Trainings von anderen Lehr-Lern-Situationen in der Lehrkräftebildung?”; Carnein, O., Damnik, G., Krause, G., Vanier, D., Eds.; Publikationsserver RosDok: Rostock, Germany, 2023. [Google Scholar] [CrossRef]
- Narciss, S. Designing and evaluating tutoring feedback strategies for digital learning environments on the basis of the interactive feedback model. Digit. Educ. Rev. 2013, 23, 7–26. [Google Scholar]
- Weber, K.E.; Gold, B.; Prilop, C.N.; Kleinknecht, M. Promoting pre-service teachers’ professional vision of classroom management during practical school training: Effects of a structured online- and video-based self-reflection and feedback intervention. Teach. Teach. Educ. 2018, 76, 39–49. [Google Scholar] [CrossRef]
- Lu, H.-L. Research on peer-coaching in preservice teacher education—A review of literature. Teach. Teach. Educ. 2010, 26, 748–753. [Google Scholar] [CrossRef]
- Kraft, M.A.; Blazar, D.; Hogan, D. The effect of teacher coaching on instruction and achievement: A meta-analysis of the causal evidence. Rev. Educ. Res. 2018, 88, 547–588. [Google Scholar] [CrossRef]
- Ericsson, K.A.; Krampe, R.T.; Tesch-Römer, C. The role of deliberate practice in the acquisition of expert performance. Psychol. Rev. 1993, 100, 363–406. [Google Scholar] [CrossRef]
- Prilop, C.N.; Weber, K.E.; Kleinknecht, M. The role of expert feedback in the development of pre-service teachers’ professional vision of classroom management in an online blended learning environment. Teach. Teach. Educ. 2021, 99, 103276. [Google Scholar] [CrossRef]
- Gielen, M.; De Wever, B. Structuring peer assessment: Comparing the impact of the degree of structure on peer feedback content. Comput. Hum. Behav. 2015, 52, 315–325. [Google Scholar] [CrossRef]
- Prins, F.; Sluijsmans, D.; Kirschner, P.A. Feedback for general practitioners in training: Quality, styles and preferences. Adv. Health Sci. Educ. 2006, 11, 289–303. [Google Scholar] [CrossRef] [PubMed]
- Prilop, C.N.; Weber, K.E. Digital video-based peer feedback training: The effect of expert feedback on pre-service teachers’ peer feedback beliefs and peer feedback quality. Teach. Teach. Educ. 2023, 127, 104099. [Google Scholar] [CrossRef]
- Strijbos, J.W.; Narciss, S.; Dünnebier, K. Peer feedback content and sender’s competence level in academic writing revision tasks: Are they critical for feedback perceptions and efficiency? Learn. Instr. 2010, 20, 291–303. [Google Scholar] [CrossRef]
- Nicol, D.J.; Macfarlane-Dick, D. Formative assessment and self-regulated learning: A model and seven principles of good feedback practice. Stud. High. Educ. 2006, 31, 199–218. [Google Scholar] [CrossRef]
- Nicol, D. Assessment for learner self-regulation: Enhancing achievement in the first year using learning technologies. Assess. Eval. High. Educ. 2009, 34, 335–352. [Google Scholar] [CrossRef]
- Weber, K.E.; Prilop, C.N.; Kleinknecht, M. Effects of blended and video-based coaching approaches on preservice teachers’ self-efficacy and perceived competence support. Learn. Cult. Soc. Interact. 2019, 22, 103–118. [Google Scholar] [CrossRef]
- Hattie, J.; Timperley, H. The power of feedback. Rev. Educ. Res. 2007, 77, 81–112. [Google Scholar] [CrossRef]
- Sailer, M.; Bauer, E.; Hofmann, R.; Kiesewetter, J.; Glas, J.; Gurevych, I.; Fischer, F. Adaptive feedback from artificial neural networks facilitates pre-service teachers’ diagnostic reasoning in simulation-based learning. Learn. Instr. 2023, 83, 101620. [Google Scholar] [CrossRef]
- Wang, S.; Wang, F.; Zhu, Z.; Wang, J.; Tran, T.; Du, Z. Artificial intelligence in education: A systematic literature review. Expert Syst. Appl. 2024, 252 Pt A, 124167. [Google Scholar] [CrossRef]
- Zhu, M.; Liu, O.L.; Lee, H.-S. The effect of automated feedback on revision behavior and learning gains in formative assessment of scientific argument writing. Comput. Educ. 2020, 143, 103668. [Google Scholar] [CrossRef]
- Bernius, J.P.; Krusche, S.; Bruegge, B. Machine learning based feedback on textual student answers in large courses. Comput. Educ. Artif. Intell. 2022, 3, 100081. [Google Scholar] [CrossRef]
- Dai, W.; Tsai, Y.S.; Lin, J.; Aldino, A.; Jin, H.; Li, T.; Gasevic, D.; Chen, G. Assessing the proficiency of large language models in automatic feedback generation: An evaluation study. Comput. Educ. Artif. Intell. 2024, 7, 100299. [Google Scholar] [CrossRef]
- Narciss, S. Feedback strategies for interactive learning tasks. In Handbook of Research on Educational Communications and Technology, 3rd ed.; Spector, J.M., Merrill, M.D., van Merrienboer, J.J.G., Driscoll, M.P., Eds.; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 2008; pp. 125–144. [Google Scholar]
- Pekrun, R.; Marsh, H.W.; Elliot, A.J.; Stockinger, K.; Perry, R.P.; Vogl, E.; Goetz, T.; van Tilburg, W.A.P.; Lüdtke, O.; Vispoel, W.P. A three-dimensional taxonomy of achievement emotions. J. Personal. Soc. Psychol. 2023, 124, 145–178. [Google Scholar] [CrossRef]
- Wittwer, J.; Kratschmayr, L.; Voss, T. Wie gut erkennen Lehrkräfte typische Fehler in der Formulierung von Lernzielen? Unterrichtswissenschaft 2020, 48, 113–128. [Google Scholar] [CrossRef]
- Jacobsen, L.J.; Rohlmann, J.; Weber, K.E. AI Feedback in Education: The Impact of Prompt Design and Human Expertise on LLM Performance. OSF Prepr. 2025. [Google Scholar] [CrossRef]
- Prilop, C.N.; Weber, K.E.; Kleinknecht, M. Effects of digital video-based feedback environments on pre-service teachers’ feedback competence. Comput. Hum. Behav. 2020, 102, 120–131. [Google Scholar] [CrossRef]
- Alkaissi, H.; McFarlane, S.I. Artificial hallucinations in ChatGPT: Implications in scientific writing. Cureus 2023, 15, e35179. [Google Scholar] [CrossRef]
- Ji, Z.; Lee, N.; Frieske, R.; Yu, T.; Su, D.; Xu, Y.; Ishii, E.; Bang, J.; Dai, W.; Madotto, A.; et al. Survey of hallucination in natural language generation. ACM Comput. Surv. 2022, 55, 1–38. [Google Scholar] [CrossRef]
- Wu, Y.; Schunn, C.D. From plans to actions: A process model for why feedback features influence feedback implementation. Instr. Sci. 2021, 49, 365–394. [Google Scholar] [CrossRef]
- Zottmann, J.M.; Stegmann, K.; Strijbos, J.-W.; Vogel, F.; Wecker, C.; Fischer, F. Computer-supported collaborative learning with digital video cases in teacher education: The impact of teaching experience on knowledge convergence. Comput. Hum. Behav. 2013, 29, 2100–2108. [Google Scholar] [CrossRef]
- Fleiss, J.L.; Cohen, J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ. Psychol. Meas. 1973, 33, 613–619. [Google Scholar] [CrossRef]
- Weber, F.; Krause, T.; Müller, L. Enhancing legal writing skills: The impact of formative feedback in a hybrid intelligence system. Br. J. Educ. Technol. 2024. [Google Scholar] [CrossRef]
- Miranda, L.J.V.; Wang, Y.; Elazar, Y.; Kumar, S.; Pyatkin, V.; Brahman, F.; Smith, N.A.; Hajishirzi, H.; Dasigi, P. Hybrid preferences: Learning to route instances for human vs. AI feedback. arXiv 2025, arXiv:2410.19133. [Google Scholar] [CrossRef]
- German Ethics Council (Deutscher Ethikrat). Mensch und Maschine—Herausforderungen durch künstliche Intelligenz. 2023. Available online: https://www.ethikrat.org (accessed on 10 January 2025).
- Zawacki-Richter, O.; Marín, V.I.; Bond, M.; Gouveneur, F. Systematic review of research on artificial intelligence applications in higher education—Where are the educators? Int. J. Educ. Technol. High. Educ. 2019, 16, 39. [Google Scholar] [CrossRef]
- Azaria, A.; Azoulay, R.; Reches, S. ChatGPT is a remarkable tool–For experts. arXiv 2023, arXiv:2306.03102. [Google Scholar] [CrossRef]
| Category | Subcategory | Good | Code | Average | Code | Suboptimal | Code | Example/Clarifying Comment |
|---|---|---|---|---|---|---|---|---|
| Context | Role | The role of the LLM and of the person asking the question is explained | 2 | Only one of the roles is explained | 1 | Neither the role of the LLM nor the role of the person asking the question is explained | 0 | Example: “You are a mathematics tutor assisting a high school student with geometry problems. I am the teacher creating a learning goal for this task”. |
| | Target audience | There is a clearly defined and described target audience | 2 | The target audience is roughly described | 1 | The target audience is not specified | 0 | Example: “The audience is high school students learning geometry in grade 10, with a focus on foundational concepts like the Pythagorean theorem”. |
| | Medium/channel | The medium or channel in which the information is presented is clearly described | 2 | The medium or channel in which the information is presented is roughly described | 1 | The medium or channel in which the information is presented is not mentioned | 0 | Clarifying comment: The “medium or channel” specifies the style or platform in which the output is intended to be presented, such as a Twitter post, an academic essay, an email, or a PowerPoint slide. Defining the format or medium ensures that the LLM tailors its tone, structure, and level of detail to meet the specific requirements of the chosen communication method, making the response more relevant and effective. |
| Mission | Mission/question | The mission of the LLM is clearly described | 2 | The mission of the LLM is roughly described | 1 | The mission of the LLM is not clear | 0 | Example: “The mission is to create a clear and specific learning goal for a high school geometry class that will help students understand the Pythagorean theorem and its applications”. |
| Clarity and specificity | Format and constraints | Stylistic properties as well as length specifications are described | 2 | Either stylistic properties are described or a length specification is given | 1 | Neither stylistic properties nor length specifications are given | 0 | Example: “You should provide feedback in concise bullet points, each not exceeding 20 words, and the response should fit within 200 words”. |
| | Conciseness | The prompt contains only information that is directly related and relevant to the output, and it is clear and concise | 2 | The prompt is concise with little superfluous information | 1 | The prompt contains a lot of information that is irrelevant to the mission/question | 0 | Clarifying comment: “Conciseness” evaluates whether the prompt contains only information that is essential and directly relevant to the task or mission. Unnecessary details can dilute the focus of the response and reduce efficiency. A concise prompt ensures that the LLM concentrates on the key elements, avoiding extraneous or unrelated content. |
| | Domain specificity | Technical terms are used correctly and give the LLM the opportunity to refer to them in the answer | 2 | Technical terms are used sporadically or without explanation | 1 | No specific vocabulary that is relevant to the subject area of the question is used | 0 | Example: “Use terminology like ’right triangle’, ’hypotenuse’, and ’Pythagorean theorem’ to ensure alignment with geometry concepts”. |
| | Logic | The prompt has a very good reading flow, internal logical coherence, a very coherent sequence of information, and a clearly understandable connection between the content and mission | 2 | The prompt fulfills only some of the conditions of the coding “2” | 1 | The prompt is illogically constructed | 0 | Clarifying comment: “Logic” assesses the internal coherence and sequence of information in the prompt. A logical prompt presents ideas in a clear, structured, and step-by-step manner, ensuring that the LLM can understand the relationships between different elements of the task. Prompts with good logic provide a smooth flow of information, making it easier for the LLM to generate accurate and meaningful responses. |
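To make the coding manual concrete, the sketch below aggregates the 0–2 subcategory codes into a single prompt-quality total. Note that this aggregation rule (a simple unweighted sum, maximum 16 across the eight subcategories) is an illustrative assumption, not a scoring procedure defined by the manual itself; the `MANUAL` dictionary and `prompt_quality_score` function are hypothetical names.

```python
# Illustrative sketch only: the coding manual assigns each subcategory a
# code from 0 (suboptimal) to 2 (good). Summing the codes is one plausible,
# assumed way to compare overall prompt quality; the manual does not
# prescribe this aggregation.

# Subcategories from the coding manual, grouped by category.
MANUAL = {
    "Context": ["Role", "Target audience", "Medium/channel"],
    "Mission": ["Mission/question"],
    "Clarity and specificity": [
        "Format and constraints", "Conciseness", "Domain specificity", "Logic",
    ],
}

def prompt_quality_score(codes):
    """Sum the 0-2 codes across all eight subcategories (maximum = 16)."""
    total = 0
    for category, subcategories in MANUAL.items():
        for sub in subcategories:
            code = codes[sub]
            if code not in (0, 1, 2):
                raise ValueError(f"Invalid code {code!r} for {sub!r}")
            total += code
    return total

# Example: a prompt rated "good" (code 2) on every subcategory.
ratings = {sub: 2 for subs in MANUAL.values() for sub in subs}
print(prompt_quality_score(ratings))  # 16
```

A summed score like this would let two coded prompts be compared on one scale, while the per-subcategory codes remain available for the finer-grained contrasts reported in the results tables.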
| Category | Good Feedback Definition | Code | Average Feedback Definition | Code | Suboptimal Feedback Definition | Code | κ | Good Feedback Example |
|---|---|---|---|---|---|---|---|---|
| Assessment criteria | Aspects of a good learning goal are addressed using technical terms/theoretical models | 2 | Aspects of a good learning goal are addressed without technical terms/theoretical models | 1 | Aspects of a good learning goal are not addressed | 0 | 0.81 | “However, the learning goal, as currently stated, has room for improvement. The verb ‘recognize’ is on the lower end of Bloom’s taxonomy; it’s more about recall than application or analysis.” (LLM feedback 3.30) |
| Specificity | All three error types are named and explicitly explained | 2 | Two types of errors are named and explicitly explained | 1 | One type of error is named and explicitly explained | 0 | 0.81 | “Your goal contains two separate objectives: […] Next, the verbs you’ve chosen, ‘recognize’ and ‘understand’, are a bit vague in the context of Bloom’s taxonomy […] And how do you envision this learning goal relating back to the learner? […]” (LLM feedback 3.28) |
| Explanation | A detailed explanation is given regarding why the aspects of a good learning goal are relevant | 2 | A brief explanation is given of why the aspects of a good learning goal are relevant | 1 | No explanation is given regarding why the aspects of a good learning goal are relevant | 0 | 0.86 | “According to best practices, it’s beneficial to focus on just one learning goal at a time. This makes it clearer for both you and the students, streamlining the assessment process.” (LLM feedback 3.14) |
| Presence of suggestions for improvement | Alternatives are suggested in a cognitively stimulating way | 2 | Alternatives are presented in concrete terms | 1 | No alternatives are named | 0 | 0.86 | “A more targeted learning goal will focus on just one of these. Which one is your priority?” (LLM feedback 3.28) |
| Explanation of suggestions | Alternatives are explained in detail | 2 | Alternatives are briefly explained | 1 | Alternatives are not explained | 0 | 0.82 | “This would align the goal more closely with achieving deeper understanding and skill utilization. […] This goal is learner-centered, contains only one focus, and involves higher-level thinking skills. It also makes the intended learning outcome clear.” (LLM feedback 3.30) |
| Errors | The feedback includes several content errors regarding learning goals | −2 | The feedback includes one error regarding learning goals | −1 | The feedback does not include errors regarding learning goals | 0 | 0.90 | |
| Questions | The activating question is posed | 2 | The clarifying question is posed | 1 | No questions are posed | 0 | 1.00 | “So, what specific skill or understanding are you hoping your students will gain by the end of this lesson?” (LLM feedback 3.28) |
| First person | The feedback is written in first person throughout | 2 | The feedback is occasionally written in first person | 1 | The feedback is not written in first person | 0 | 0.90 | “I appreciate the effort you’ve put into formulating this learning goal for your future teachers. […] Let me share my thoughts with you. Firstly, I noticed […]” (LLM feedback 3.23) |
| Valence | There is a balance between positive and negative feedback | 2 | The feedback is mainly positive | 1 | The feedback is mainly negative | 0 | 0.76 | “I don’t think this learning goal is well worded. […] However, I like that your learning goal is formulated in a very clear and structured way.” (Novice feedback 13) |
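The κ column above reports inter-coder reliability for each feedback category; the study cites Fleiss and Cohen's work relating weighted kappa to the intraclass correlation. As a hedged illustration of what such a coefficient measures, the sketch below computes plain (unweighted) Cohen's kappa for two hypothetical raters applying the 0–2 codes; the rating vectors are invented for the example and the function name is an assumption, and a weighted variant would additionally credit near-agreement.

```python
# Minimal sketch of unweighted Cohen's kappa for two raters' nominal codes.
# The raters and codes below are hypothetical, for illustration only.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: proportion of items both raters coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal code frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if p_e == 1:
        return 1.0  # Both raters used a single identical code throughout.
    return (p_o - p_e) / (1 - p_e)

# Two raters coding ten feedback excerpts with the 0-2 scheme.
a = [2, 1, 0, 2, 1, 1, 0, 2, 2, 1]
b = [2, 1, 0, 2, 1, 0, 0, 2, 1, 1]
print(round(cohens_kappa(a, b), 2))  # 0.7
```

Values of about 0.76 to 1.00, as in the table, indicate substantial to perfect agreement after correcting for the agreement two raters would reach by chance alone.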
Concreteness

| | Assessment criteria: M | SD | Min. | Max. | Explanation: M | SD | Min. | Max. |
|---|---|---|---|---|---|---|---|---|
| Prompt 1 | 0.45 | 0.76 | 0 | 2 | 0.25 | 0.44 | 0 | 1 |
| Prompt 2 | 1.05 | 0.39 | 0 | 2 | 0.60 | 0.50 | 0 | 1 |
| Prompt 3 | 1.95 | 0.22 | 1 | 2 | 1.00 | 0.32 | 0 | 2 |

Empathy

| | First person: M | SD | Min. | Max. | Valence: M | SD | Min. | Max. |
|---|---|---|---|---|---|---|---|---|
| Prompt 1 | 0.00 | 0.00 | 0 | 0 | 0.85 | 0.56 | 0 | 2 |
| Prompt 2 | 0.10 | 0.45 | 0 | 2 | 1.00 | 0.00 | 1 | 1 |
| Prompt 3 | 1.05 | 0.83 | 0 | 2 | 1.00 | 0.00 | 1 | 1 |

Activation

| | Questions: M | SD | Min. | Max. | Presence of suggestions for improvement: M | SD | Min. | Max. | Explanation of suggestions: M | SD | Min. | Max. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Prompt 1 | 1.20 | 0.52 | 0 | 2 | 1.15 | 0.75 | 0 | 2 | 0.50 | 0.51 | 0 | 1 |
| Prompt 2 | 0.90 | 0.72 | 0 | 2 | 1.15 | 0.37 | 1 | 2 | 1.25 | 0.55 | 0 | 2 |
| Prompt 3 | 1.90 | 0.31 | 1 | 2 | 1.50 | 0.51 | 1 | 2 | 1.10 | 0.31 | 1 | 2 |

Correctness

| | Specificity: M | SD | Min. | Max. | Errors: M | SD | Min. | Max. |
|---|---|---|---|---|---|---|---|---|
| Prompt 1 | 0.10 | 0.30 | 0 | 1 | −0.40 | 0.50 | −1 | 0 |
| Prompt 2 | 1.05 | 0.39 | 0 | 2 | −1.25 | 0.79 | −2 | 0 |
| Prompt 3 | 1.35 | 0.59 | 0 | 2 | −0.30 | 0.57 | −2 | 0 |
Concreteness

| | Assessment criteria: M | SD | Min. | Max. | Explanation: M | SD | Min. | Max. |
|---|---|---|---|---|---|---|---|---|
| Peers | 0.63 | 0.81 | 0 | 2 | 0.10 | 0.31 | 0 | 1 |
| Experts | 1.64 | 0.51 | 1 | 2 | 0.55 | 0.52 | 0 | 1 |
| ChatGPT-4 | 1.97 | 0.18 | 1 | 2 | 1.00 | 0.26 | 0 | 2 |

Empathy

| | First person: M | SD | Min. | Max. | Valence: M | SD | Min. | Max. |
|---|---|---|---|---|---|---|---|---|
| Peers | 1.10 | 0.71 | 0 | 2 | 1.10 | 0.30 | 1 | 2 |
| Experts | 1.18 | 0.60 | 0 | 2 | 1.25 | 0.50 | 1 | 2 |
| ChatGPT-4 | 1.10 | 0.76 | 0 | 2 | 1.00 | 0.39 | 0 | 2 |

Activation

| | Questions: M | SD | Min. | Max. | Presence of suggestions for improvement: M | SD | Min. | Max. | Explanation of suggestions: M | SD | Min. | Max. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Peers | 0.17 | 0.38 | 0 | 1 | 0.87 | 0.82 | 0 | 2 | 0.30 | 0.54 | 0 | 2 |
| Experts | 1.36 | 0.81 | 0 | 2 | 1.73 | 0.47 | 1 | 2 | 0.82 | 0.60 | 0 | 2 |
| ChatGPT-4 | 1.86 | 0.44 | 0 | 2 | 1.57 | 0.50 | 1 | 2 | 1.13 | 0.35 | 1 | 2 |

Correctness

| | Specificity: M | SD | Min. | Max. | Errors: M | SD | Min. | Max. |
|---|---|---|---|---|---|---|---|---|
| Peers | 0.17 | 0.38 | 0 | 1 | −0.73 | 0.87 | −2 | 0 |
| Experts | 0.64 | 0.67 | 0 | 2 | −0.18 | 0.60 | −2 | 0 |
| ChatGPT-4 | 1.60 | 0.56 | 0 | 2 | −0.17 | 0.46 | −2 | 0 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Jacobsen, L.J.; Weber, K.E. The Promises and Pitfalls of Large Language Models as Feedback Providers: A Study of Prompt Engineering and the Quality of AI-Driven Feedback. AI 2025, 6, 35. https://doi.org/10.3390/ai6020035