Article

Can AI Technologies Support Clinical Supervision? Assessing the Potential of ChatGPT

1 SiPGI–Postgraduate School of Integrated Gestalt Psychotherapy, 80058 Torre Annunziata, Italy
2 IPGE Istituto di Psicoterapia della Gestalt Espressiva, Via Costantino Morin, 24-00195 Roma, Italy
3 ASPIC Scuola di Psicoterapia, Via Vittore Carpaccio, 32-00147 Roma, Italy
4 iGAT Istituto di Psicoterapia della Gestalt e Analisi Transazionale, Via Pirro Ligorio, 20-80129 Napoli, Italy
5 IGA Istituto Gestalt Analitica, Via Padre Semeria, 33-00154 Roma, Italy
6 SiPGI–Postgraduate School of Integrated Gestalt Psychotherapy, 91100 Trapani, Italy
7 IGP Istituto Gestalt di Puglia, 73010 Arnesano, Italy
8 SGT Scuola Gestalt Torino, Via Po, 14-10123 Torino, Italy
9 IGR Istituto Gestalt Romagna, 48121 Ravenna, Italy
10 IGF Istituto Gestalt Firenze, Scuola di Specializzazione in Psicoterapia della Gestalt a Orientamento Fenomenologico-Esistenziale, 50100 Firenze, Italy
11 Scuola di Specializzazione in Psicoterapia della Gestalt CGV Centro Gestalt viva Claudio Naranjo, 57125 Livorno, Italy
12 Gestalt Therapy Institute, HCC—Human Communication Center, 97100 Ragusa, Italy
* Author to whom correspondence should be addressed.
Informatics 2025, 12(1), 29; https://doi.org/10.3390/informatics12010029
Submission received: 16 January 2025 / Revised: 28 February 2025 / Accepted: 11 March 2025 / Published: 17 March 2025

Abstract

Clinical supervision is essential for trainees, preventing burnout and ensuring the effectiveness of their interventions. AI technologies offer increasing possibilities for developing clinical practices, with supervision being particularly suited for automation. The aim of this study is to evaluate the feasibility of using ChatGPT-4 as a supervisory tool in psychotherapy training. To achieve this, a clinical case was presented to three distinct sources (untrained AI, pre-trained AI, and a qualified human supervisor), and their feedback was evaluated by Gestalt psychotherapy trainees using a Likert scale rating of satisfaction. Statistical analysis, using the statistical package SPSS version 25 and applying principal component analysis (PCA) and one-way analysis of variance (ANOVA), demonstrated significant differences in favor of pre-trained AI feedback. PCA highlighted four components of the questionnaire: relational and emotional (C1), didactic and technical quality (C2), treatment support and development (C3), and professional orientation and adaptability (C4). The ratings of satisfaction obtained from the three kinds of supervisory feedback were compared using ANOVA. The feedback generated by the pre-trained AI (fb2) was rated significantly higher than the other two (untrained AI feedback (fb1) and human feedback (fb3)) in C4; in C1, the superiority of fb2 over fb1, but not over fb3, appears significant. These results suggest that AI, when appropriately calibrated, may be a valuable tool for complementing clinical supervision, offering an innovative blended supervision methodology, in particular in the area of career guidance.

1. Introduction

Clinical supervision is a crucial element in psychotherapeutic treatments. It consists of a meeting between a psychotherapist and a more experienced colleague trained for this role, whose purpose is to support clinical case management by offering a point of view external to the dynamics of the therapist–patient dyad. It is an indispensable element in the professional development of therapists, both in the just-described form of clinical supervision, in which the technical work of the psychotherapist with a specific patient is supervised, and in the form of professional supervision, which serves as a more direct tool for the therapist's professional growth. Supervision is essential for the professional development of therapists, both in preventing burnout and in enhancing their clinical effectiveness [1].
According to Watkins Jr, C.E. (2020), “Psychotherapy supervisors serve first and foremost as agents of transformation, their main goal being to implement and actualize a transformative process of therapist development” [2].
As highlighted by Yontef, G. (1996), colleagues in training, such as trainees and interns, can use supervision to enhance awareness of their current therapeutic skills, thus laying an indispensable foundation for developing them further [3].
One of the most well-known problems inherent in the practice of the health professions, and of those related to mental health in particular, is the emotional depletion of the therapist. The resulting risk of burnout can be managed or prevented through supervision, which proves to be a key element for this purpose. Supervision also serves as a catalyst in the personal and professional development of psychotherapists, especially in the early stages of their careers [4]. Indeed, the well-being of the therapist is an indispensable factor for the quality of clinical intervention in terms of efficiency and effectiveness [5]. The new potential for clinical practice offered by technological development is made evident by several studies [6,7,8,9]. ChatGPT-4 is a Large Language Model (LLM) of the Generative Pre-trained Transformer (GPT) type. GPT models are based on the transformer architecture: a deep learning model that employs self-attention mechanisms to assign different weights to different parts of the input data. This architecture enables ChatGPT-4 to analyze and generate text with a high degree of consistency and adaptability [10].
This type of architecture is primarily used in natural language processing tasks. An LLM based on such mechanisms is developed through a sequence of key macro-steps. First, the model undergoes pretraining on vast amounts of unlabeled textual data. During this stage, the LLM learns to predict the next word in a sentence, allowing it to acquire extensive knowledge of the language. Second, the model's performance is fine-tuned using two approaches: supervised fine-tuning and reinforcement learning from human feedback (RLHF). Supervised fine-tuning consists of training the model on a smaller, more specific dataset of labeled examples (input–output pairs); this allows the model to be adapted to specific tasks. Finally, RLHF uses feedback from real people to reinforce model responses that are considered correct, further improving performance and aligning the model with human preferences [11].
A fundamental mechanism behind transformers is the aforementioned self-attention: it allows the model to contextualize information by calculating weights that indicate the relevance of each word relative to all other words in the sentence. In this way, the model can identify relationships between words and maintain narrative coherence even in complex responses. This capability makes ChatGPT-4 particularly suitable for contexts where the accuracy of language comprehension is crucial, such as in the clinical supervision simulation that is the subject of this study. In this context, we relate clinical supervision to artificial intelligence technologies of the Generative Pre-trained Transformer type, in this case ChatGPT-4, as a new angle of investigation into how they can be used and how useful they may be for improving and implementing clinical supervision. The aim is to assess, with increasing accuracy, the feasibility of making them a complementary tool for pursuing the purposes of classical supervision [12]. In particular, the use of AI could support psychotherapists in training by helping to address the current need for efficient clinical supervision, simulating feedback generation on anonymized clinical cases and, at the same time, allowing valid automated tools to be added as a complement to traditional supervision.
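To make the self-attention mechanism described above concrete, the following minimal sketch (Python with NumPy, not part of the original study) computes scaled dot-product attention weights for a toy sequence; the sequence length, embedding size, and variable names are illustrative assumptions only.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Return the attended output and the attention weights.
        Q, K, V: arrays of shape (sequence_length, d_model)."""
        d_k = K.shape[-1]
        # Relevance of each token with respect to every other token
        scores = Q @ K.T / np.sqrt(d_k)
        # Softmax turns scores into weights that sum to 1 per row
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)
        # Each output token is a context-weighted mixture of the values
        return weights @ V, weights

    # Toy example: 4 tokens with 8-dimensional embeddings (illustrative only)
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    output, attention = scaled_dot_product_attention(x, x, x)
    print(attention.round(2))  # each row sums to 1: relevance of the other tokens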

2. Aims

This paper is part of a broader program aimed at exploring the potential of using ChatGPT as a support tool for psychotherapists by testing the AI's ability to provide useful feedback for conducting psychotherapy sessions. At the time of our previous work, ChatGPT-4 lacked the long-term memory (LTM) function; for this reason, the training psychotherapists dialogued with the chat to create an expert prompt, appropriately oriented toward Gestalt supervision, capable of providing the therapist with sufficiently reliable supervisory output in response to a presented clinical case. In this study, we evaluated trainees' degree of satisfaction with two supervisory feedback texts generated by a ChatGPT-4 model, the first untrained and the second trained with a prompt developed and described in a previous study [12]. Figure 1 describes the steps taken to create this prompt in the earlier study [12].
The possibility of integrating ChatGPT into psychotherapy and psychotherapeutic supervision is a topic of recent interest. Some preliminary research has explored the potential of ChatGPT as a therapeutic assistant [13,14].
The ultimate purpose of the research program in which this paper is embedded is to identify ways in which ChatGPT can be further used to support the work of clinicians in the therapeutic process and in the professional development of therapists, with a view to complementing traditional supervision settings.
Some researchers have recently (5 September 2024) integrated LLMs, such as ChatGPT, with long-term memory (LTM), an ability to remember information beyond the current conversation session, to support ongoing interactions. This new feature allows the chatbot to store information on certain topics from each session and provide it to the model along with other inputs in each conversational turn, providing insights that the chatbot can refer to [15].
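As a purely illustrative sketch of the idea just described (not the authors' implementation; the model name, file name, and stored facts are assumptions), long-term memory can be emulated by persisting key facts between sessions and prepending them to each request, here using the OpenAI Python client.

    import json
    from pathlib import Path
    from openai import OpenAI

    MEMORY_FILE = Path("supervision_memory.json")  # hypothetical local store
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def load_memory() -> list[str]:
        return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

    def save_memory(facts: list[str]) -> None:
        MEMORY_FILE.write_text(json.dumps(facts, indent=2))

    def ask_with_memory(user_message: str) -> str:
        facts = load_memory()
        # Facts stored in earlier sessions are injected into every conversational turn
        system = "You are a supervision assistant.\nKnown context:\n" + "\n".join(facts)
        response = client.chat.completions.create(
            model="gpt-4o",  # assumed model name; any chat-capable model would do
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user_message}],
        )
        return response.choices[0].message.content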

3. Methodology

We submitted an anonymized clinical case to two different ChatGPT-4 interfaces; the text provided to the AI chatbot included medical history data, question analysis, a previously conducted psychodiagnostic assessment, a detailed description of the patient's personality functioning, and a description, from the therapist's perspective, of the first three sessions of psychotherapy, with a special focus on the therapist's subjective experience in session.
We proposed the clinical case described above to two different chat interfaces: in one (test 1), we presented the case without pretraining the chat session regarding the scope of expertise of Gestalt therapy and related supervision; in a further chat interface (test 2), we introduced the scope of expertise using a prompt as a pretraining tool before proceeding with the presentation of the case. This prompt was generated using ChatGPT, and its reliability as a pretraining tool for the chat session was verified, in a previous study we conducted [12], as a good alternative to a longer and more complex training process. The term "pretraining" here refers to the process of using the memory function of ChatGPT to refine responses to the prompt. It should be noted that, for the generation of this feedback, no systematic training or further model adjustments based on iterative feedback were performed. This supervisory feedback builds on the previous study [12], in which the first phase of chatbot testing served to verify the performance of ChatGPT-4 in understanding input regarding the Gestalt psychotherapy context. The model was tested on generating consistent and detailed responses in the context of psychotherapy, including the use of specific Gestalt Therapy techniques. This included creating imaginary transcripts of psychotherapy sessions to test the model's understanding. Next, an initial prompt was developed based on previous interactions with the model. This prompt was generated to create a theoretical and contextual framework to direct the model toward the specific domain of Gestalt supervision. Experiments were then conducted both in a conversation window containing the previously provided in-depth context and in a new "clean" conversation window. It was found that the context provided by the prompt significantly improved the quality and specificity of responses compared with the "clean" window, confirming the importance of adequate pre-orientation. The model provided consistent and detailed feedback in response to clinical cases, even when the details presented or the case wording were varied. The validity of the responses was confirmed by the supervisors who co-authored the study.
This suggests that the potential of ChatGPT can be optimized for clinical supervision purposes through appropriate framing of the dialogue without the need for additional training interventions.
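A minimal sketch of how the two experimental conditions could be reproduced programmatically is given below; the expert-prompt text, the clinical-case file, and the model name are placeholders, not the study's actual materials.

    from openai import OpenAI

    client = OpenAI()
    clinical_case = open("clinical_case_anonymized.txt").read()    # placeholder file
    expert_prompt = open("gestalt_supervision_prompt.txt").read()  # placeholder file

    def get_feedback(messages):
        reply = client.chat.completions.create(model="gpt-4o", messages=messages)
        return reply.choices[0].message.content

    # Test 1: untrained chat session, the case is presented directly
    feedback_1 = get_feedback([
        {"role": "user", "content": clinical_case},
    ])

    # Test 2: the expert prompt frames the session before the case is presented
    feedback_2 = get_feedback([
        {"role": "system", "content": expert_prompt},
        {"role": "user", "content": clinical_case},
    ])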
In parallel (test 3), the same document presented to the chatbot was studied by an expert psychotherapist and supervisor who wrote his own feedback. The first feedback places a specific focus on both the strengths of the therapeutic process and the areas in need of improvement, providing an evaluation of the therapeutic strategies used. The second feedback, on the other hand, obtained through the use of the prompt, places more emphasis on the therapist’s use of “self”, that is, how they use their personal experiences in therapy, emphasizing the importance of supervision in managing transference and countertransference. Finally, the third contribution focuses more on the therapeutic relationship through two reflections that provide a better understanding of countertransference dynamics and the use of metaphors in therapy.
The two AI-generated feedback texts, along with the one written by an experienced supervisor, were presented to a group of trainees in integrated Gestalt psychotherapy (aged 25–40, randomly recruited) after a comprehensive presentation of the clinical case and a description of the sessions. The presentation of the three feedback texts to the evaluators was blind, with no information as to how and by whom they were written.
For each of the three feedback texts, the trainees were asked to provide an evaluation by responding to a satisfaction questionnaire specially built on a 4-grade Likert scale (1 = Not at all/Poor; 2 = A little/Sufficient; 3 = Quite/Good; 4 = Very good/Excellent). The development of this questionnaire (Table 1) was partially inspired by P. Clarkson's model of supervision in Transactional Analysis. According to the author, supervision constitutes a continuous and hierarchical process characterized by the figure of a directive and responsible supervisor and a supervisee who is required to report regularly on the status of their work [16]. The seven components characterizing this model of supervision are as follows: (1) clear and appropriate contract; (2) key problems identified; (3) effective emotional contact with the trainee; (4) protection of both trainee and patient; (5) increased developmental directions; (6) awareness and effective use of the parallel process; and (7) equal relationship. The author discusses the characteristics of these seven components in relation to the three different stages of trainee development (beginning, intermediate, advanced) [16].
In this study, a combination of general and specific evaluative metrics is used to assess the effectiveness of ChatGPT as an integrative tool for clinical supervision. These metrics include the Likert scale [17,18], which is used to measure trainees' satisfaction with the supervision feedback. The Likert scale is a commonly used instrument in clinical psychology, social work, and other fields to assess perceptions of service quality.
The questionnaire was calibrated using data from 71 subjects, resulting in a total of 198 responses. The overall Cronbach’s alpha for the questionnaire was 0.92, indicating excellent internal consistency. The Cronbach’s alpha values for each subscale were as follows: Subscale 1—0.82, Subscale 2—0.82, Subscale 3—0.75, and Subscale 4—0.78.
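For reference, Cronbach's alpha can be computed directly from an item-response matrix as in the following sketch (pandas/NumPy; the simulated responses and column names are illustrative assumptions, not the study's data):

    import numpy as np
    import pandas as pd

    def cronbach_alpha(items: pd.DataFrame) -> float:
        """items: one row per respondent, one column per questionnaire item."""
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1).sum()
        total_variance = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_variances / total_variance)

    # Example with simulated 4-point Likert responses (illustrative only)
    rng = np.random.default_rng(1)
    responses = pd.DataFrame(rng.integers(1, 5, size=(71, 16)),
                             columns=[f"item_{i+1}" for i in range(16)])
    print(round(cronbach_alpha(responses), 2))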
Analysis of variance (ANOVA) was also used to compare the performance of ChatGPT with that of an experienced supervisor. These metrics were adopted for their ability to provide objective and comparable measurements of performance. The methodological details are summarized in Figure 2.
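A one-way ANOVA of the satisfaction ratings can be run, for instance, with SciPy; the sketch below uses simulated scores with means and group sizes loosely inspired by the study, not the actual data.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    # Simulated component scores for the three forms of feedback (illustrative)
    fb1 = rng.normal(10.4, 2.8, size=25)   # untrained AI
    fb2 = rng.normal(13.0, 2.3, size=24)   # pre-trained AI
    fb3 = rng.normal(12.0, 3.1, size=22)   # human supervisor

    f_stat, p_value = stats.f_oneway(fb1, fb2, fb3)
    print(f"F = {f_stat:.3f}, p = {p_value:.3f}")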
This combination of standard and specific methods allowed us to draw well-founded conclusions about the effectiveness of ChatGPT in the context of clinical supervision.

4. Results

For the purpose of data analysis, the three forms of feedback evaluated were assigned the following designations: the feedback generated by ChatGPT without prior interactions was labeled "Feedback 1" or "Fb1"; the feedback generated by ChatGPT after pretraining by means of a special prompt [12] was labeled "Feedback 2" or "Fb2"; the feedback written by a qualified supervising psychotherapist was labeled "Feedback 3" or "Fb3".
While the first feedback places a specific focus both on the strengths of the therapeutic process and on the areas that need improvement, the second feedback highlights more the use of the therapist’s “self”, i.e., the way in which the therapist uses their personal experiences in therapy, underlining the importance of supervision for the management of transference and countertransference. Finally, the third contribution focuses on the therapeutic relationship through two reflections that allow a better understanding of the countertransference dynamics and the use of metaphors in therapy.
The proposed 16-item questionnaire was completed for each feedback by 25 subjects for Fb1, 24 for Fb2, and 22 for Fb3, for a total of 71 ratings, sufficient to conduct a principal component analysis (Table 2).
This analysis identified four components that grouped the areas investigated by the 16 items.
PCA, using the varimax rotation method, identified the following dimensions:
  • Relational and emotional dimension: empathic approach, usefulness of reflection, confidence support, emotional impact (on the evaluator).
  • Didactic and technical quality: clarity, relevance, thoroughness, analysis of techniques.
  • Treatment support and development: how helpful the feedback is to the treatment; the presence of practical suggestions to the therapist; how much it highlights areas for improvement; how equal communication appears.
  • Professional orientation and adaptability: how deontologically oriented the feedback is; how much the feedback helps the contract; how well the feedback fits the therapist's professional level; how helpful the feedback is to the therapist's professional development.
The individual component scores were obtained by summing the Likert scale values weighted by the component coefficients; a computational sketch of this extraction is given below. The PCA explains 68.179% of the variance.
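The varimax-rotated component extraction referenced above could be reproduced along the following lines; this is a sketch that assumes the third-party factor_analyzer package and an illustrative item-response DataFrame, not the study's data or code.

    import numpy as np
    import pandas as pd
    from factor_analyzer import FactorAnalyzer
    from factor_analyzer.factor_analyzer import calculate_kmo

    # Illustrative 4-point Likert responses: 71 respondents x 16 items
    rng = np.random.default_rng(3)
    responses = pd.DataFrame(rng.integers(1, 5, size=(71, 16)),
                             columns=[f"item_{i+1}" for i in range(16)])

    kmo_per_item, kmo_total = calculate_kmo(responses)
    print(f"KMO = {kmo_total:.3f}")  # sampling adequacy, as reported in Table 2

    # Principal components with varimax rotation, four components as in the study
    fa = FactorAnalyzer(n_factors=4, rotation="varimax", method="principal")
    fa.fit(responses)
    loadings = pd.DataFrame(fa.loadings_, index=responses.columns,
                            columns=[f"C{i+1}" for i in range(4)])
    print(loadings.round(3))

    # Component scores per respondent: items summed with loading weights
    component_scores = responses.values @ fa.loadings_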
A one-way analysis of variance was conducted to assess the differences between the three feedbacks for each identified component (Table 3). This analysis revealed a significant difference between the feedback ratings for component 1 (relational and emotional dimension) and component 4 (professional orientation and adaptability); no such significance was found for the other two components.
In the subsequent analysis of multiple comparisons (Table 4), the significance of the differences in the ratings received for the three forms of feedback for the individual components was assessed. Component one, which refers to the relational and emotional dimension as analyzed in Table 3, includes item clarity as one of its key aspects in Table 4.
Regarding component one, the differences in ratings between fb1 and fb2 and between fb1 and fb3 were significant; in short, fb1 appears to differ from both of the other feedbacks. No significant difference is observed between fb2 and fb3.
Regarding components two and three, there is no significant difference in the feedback comparisons.
Regarding component four, there is a significant difference between fb1 and fb2, but not between fb1 and fb3. A significant difference between fb2 and fb3 is also observed, unlike what is seen in component one.
To compare the AI-generated feedback (Fb1 and Fb2) with the human feedback (Fb3) more specifically, the ratings of the individual items were compared regardless of the components previously identified (Table 4). A statistically significant difference emerged between the three forms of feedback for the items related to empathic approach, help with the contract, appropriateness to the supervisee's professional level, highlighting areas for improvement, help with confidence, and emotional impact. Feedback 2, generated by the pre-trained AI, received a higher average score than the untrained AI for the items related to empathic approach, help with the contract, suitability for the professional level, highlighting areas for improvement, and help with confidence. For the item related to emotional impact, the pre-trained AI (fb2) received a higher average score than both the untrained AI (fb1) and the human feedback (fb3).
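The pairwise multiple comparisons could be reproduced, for example, with Tukey's HSD from statsmodels; the study does not specify its exact post hoc procedure, so this sketch on simulated item ratings is an assumption rather than a description of the authors' analysis.

    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    rng = np.random.default_rng(4)
    # Simulated ratings for one item across the three forms of feedback
    ratings = np.concatenate([rng.normal(2.3, 0.8, 25),   # fb1
                              rng.normal(3.3, 0.6, 24),   # fb2
                              rng.normal(2.6, 1.0, 22)])  # fb3
    groups = ["fb1"] * 25 + ["fb2"] * 24 + ["fb3"] * 22

    result = pairwise_tukeyhsd(endog=ratings, groups=groups, alpha=0.05)
    print(result)  # mean differences, confidence intervals, adjusted p-values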

5. Discussion

This study explored the effectiveness of ChatGPT-4 in the area of psychotherapeutic supervision: this was performed through a comparison of feedback generated by trained and untrained AI with that of an experienced human supervisor. The results highlight the potential effectiveness of AI, trained through the use of a specific prompt, in providing accurate supervisory feedback in specific areas.
As noted in the results, the principal component analysis identified four components that grouped the different areas investigated by the 16 items designed to assess the quality of the three different forms of supervisory feedback. Specifically, the analysis of variance showed differences in the ratings of the three feedbacks with respect to the first component, "Relational and Emotional Dimension", and the fourth component, "Professional Orientation and Adaptability". With the subsequent analysis of multiple comparisons, it was possible to investigate how the ratings of the three feedbacks differed with respect to these two components. The results showed the following:
(a) A higher satisfaction with fb2 and fb3 than with fb1, regarding the relational and emotional dimension; (b) a higher satisfaction with fb2 than with both fb1 and fb3, regarding the dimension related to professional orientation.
The first result would seem to indicate greater effectiveness of pretrained AI (fb2) than non-pretrained AI (fb1) in providing supervisory feedback with respect to the relational and emotional dimensions, which relate to the empathic approach, the usefulness of reflection, helping trust, and the emotional impact on the evaluator.
This result seems to be linked to the AI’s pretraining conducted through the prompt, which enabled it to focus on relational and emotional aspects, core elements of Gestalt psychotherapy [12].
The second result, on the other hand, would attest to greater effectiveness of the pretrained AI (fb 2) compared to both the non-pretrained AI (fb1) and the supervising experienced psychotherapist (fb3) in providing supervisory feedback with respect to the dimension of professional orientation, which concerns aspects related to how deontologically oriented the feedback is, how much it helps the contract, how well it fits the therapist’s professional level, and, finally, how useful it is for the therapist’s professional development.
Therefore, the second feedback, focusing more on the therapist's use of the "self" and on the importance of supervision for the management of transference and countertransference, is more effective in framing the dimension of professional orientation than the first and third contributions. In fact, the first places a specific focus both on the strengths of the therapeutic process and on the areas that need improvement, while the third, similarly to the feedback generated by the pre-trained AI, mostly focuses on the therapeutic relationship and the subjectivity of the therapist as an element of the field [18], highlighting a better understanding of the countertransference dynamics and the use of metaphors in therapy.
This result seems to highlight the possibility of using AI to enrich the training process of therapists and for accurate and thorough information management and data analysis [19], and it can be integrated into clinical supervision as a real-time support tool useful for improving trainees’ relational and reflective skills.
In contrast, the subsequent specific comparison of the AI-generated feedback (fb1 and fb2) and the human feedback (fb3), carried out by comparing the ratings of individual items without considering the previously identified components, revealed the following: (a) a higher mean score of fb2 with respect to the individual items of empathic approach, help with the contract, appropriateness to the professional level, help with confidence, and emotional impact, and (b) a higher mean score of fb3 with respect to the item on highlighting areas for improvement. Specifically, the first result would indicate the greater effectiveness of the pre-trained AI (fb2) compared to both the non-pre-trained AI (fb1) and the expert therapist supervisor (fb3) in providing supervisory feedback in terms of empathic approach, assistance with the therapeutic contract, appropriateness to the professional level, support in building confidence, and emotional impact. This result could reflect the AI's ability to analyze and process large amounts of data and to use this capacity to formulate responses that sound linguistically empathetic or knowledgeable with respect to the situations described [20].
The second result, on the other hand, would highlight the greater effectiveness of the supervising expert psychotherapist (fb3) compared to the AI (fb1 and fb2) in capturing and highlighting areas in need of improvement.
Moreover, fb2 obtained a higher average score than fb1 for the individual items of empathic approach, help with the contract, suitability for the professional level, highlighting areas for improvement, and help with confidence, and a higher average score than both fb1 and fb3 for the item relating to emotional impact. In particular, the first result would attest to the greater effectiveness of the pre-trained AI (fb2) compared to the non-pre-trained AI (fb1) in providing supervisory feedback with respect to the empathic approach, help with the contract, suitability for the professional level, highlighting areas for improvement, and help with confidence, obtaining results comparable to those of the human feedback. This result could reflect the ability of the AI to analyze and process large amounts of data and to use this function to formulate responses that sound linguistically empathetic or aware with respect to the situations described [17].
The second result, instead, would highlight the greater effectiveness of the pre-trained AI (fb2) compared to both the non-pre-trained AI (fb1) and the expert psychotherapist supervisor (fb3) in providing feedback focused on the emotional impact.
This result represents an opportunity to fill gaps between traditional supervision sessions, offering new insights to improve the relational outcome of supervision. It is also in line with a similar study, involving 600 participants, that evaluated the level of empathy in responses generated by humans and by ChatGPT; that research likewise highlighted that the average empathy rating of the responses generated by ChatGPT exceeded that of the responses provided by humans by about 10% across a variety of positive and negative emotional situations [19].
In the long term, the use of AI in clinical supervision could provide valuable assistance in supporting trainees and offering them feedback on therapeutic strategies, especially in understanding and managing transference and countertransference. This would help prevent burnout through continuous and easily accessible monitoring, offering immediate guidance to contain emotional exhaustion. All this would favor professional growth, allowing therapists to develop self-reflection skills and increase self-esteem and self-efficacy. Finally, a positive impact on the adaptability and professional orientation of psychotherapists is desirable.

6. Conclusions

The findings of this paper demonstrate the capability of ChatGPT-4 (particularly when pre-trained with targeted prompts) to provide effective supervisory feedback in psychotherapy training, showing how AI can be a useful complementary tool to support psychotherapists' clinical work and training, in particular at three key levels: professional development (improving self-reflection, empathy, and the achievement of professional standards); clinical supervision (aiding the analysis of therapeutic techniques and interpersonal dynamics); and real-time support (offering personalized guidance during therapy sessions). It became evident, in fact, how supervisory feedback provided by an adequately pre-trained interface of ChatGPT-4 was perceived by psychotherapists-in-training as fully valid and, in part, comparable to human feedback.
Pre-trained AI outperformed untrained AI and performed comparably to human supervisors in several aspects of professional guidance. This suggests a potential role for AI as a complementary tool, enhancing efficiency and access in psychotherapy supervision. Further studies are needed to refine the training prompts and explore long-term outcomes in professional settings, paving the way for a blended approach to supervision that combines human expertise with AI support.
In research, the potential uses of artificial intelligence models in psychotherapy are gaining interest, with promising results that highlight them as a valid complementary tool for clinical practice [21]. However, the literature does not yet seem to address more specific areas of interest regarding particular aspects of psychotherapists' work, such as supervision and specialist training. With this paper, in continuity with the previous one [12], we also intended to fill this gap, proposing this more targeted area of investigation to the scientific and professional community. Psychotherapy, as both a science and an art focused on the relationship with the human being, recognizing their complexity, authenticity, and individuality in an inseparable connection with the ever-changing world, remains alive over time by evolving, updating itself, and adopting new forms in continuous co-adaptation with the transformations in the human–society–culture field [22]. The integration of new languages, new forms, and new "places" (physical and non-physical) of interaction and relationship [23], together with the new possibilities offered by technology, must necessarily be part of this evolution. As already outlined in the introduction to this work, clinical and professional supervision in the psychotherapeutic field represents, among other things, an irreplaceable tool for preventing burnout [5]. However, a supervision meeting cannot always be arranged as soon as the need arises: integrating a valid AI-based support tool, which the therapist can use between supervision meetings, can prove extremely useful for managing their own well-being and their own effectiveness in clinical practice. As discussed below, in future developments of this work, such integration could occur cyclically by (1) having the therapist turn the reports of supervision sessions into ChatGPT training material and (2) making the supervisory feedback provided by the AI the object of real human supervision. The long-term memory function, recently implemented in ChatGPT-4, would make this process easily achievable since, differently from when the foundations of this study were laid, what the AI learns within the same account is now used transversally across all conversation interfaces.

7. Limitations and Future Developments

The present study has some methodological limitations: it is a single-case study, and the small size and homogeneity of the sample could limit the generalizability of the results. The sample selected for this study consisted only of psychotherapy trainees; future work would require the involvement of psychotherapists at different levels of professional development.
The lack of evaluation of the participants' diverse sociocultural backgrounds, which could have influenced their evaluations of the feedback, may also represent a limitation of this research. Many studies, in fact, have highlighted that the perception of empathy differs across cultures [24,25]. A further limitation of the present research concerns the significant ethical issues it raises: although AI can provide technically correct supervisory feedback, its ability to understand and respond to human emotional complexities remains limited. As such, its application should be carefully balanced with human input in contexts requiring emotional sensitivity and ethics, such as psychotherapy.
As already mentioned in the conclusions of this work, integrating the strengths of both types of feedback, the one generated by AI and the human one, could result in a blended supervision methodology in which a first supervisory feedback generated by AI is integrated with a second-level supervision carried out by an expert supervisor starting from the contribution of the AI itself. This integration of human creativity and AI systematicity, from the perspective of mutual support and learning, would allow for greater timeliness and accuracy of the supervision intervention, also preventing phenomena that are particularly frequent in the helping professions, such as burnout.
Our study represents a starting point for future developments to improve the experimentation of the use of ChatGPT in different healthcare contexts. Considering the vast and ever-evolving landscape of AI technologies, we believe that future studies should compare multiple AI platforms to evaluate their relative strengths and limitations in clinical supervision.
We recognize that a further potential limitation of our work is represented by the fact that only one expert provided feedback to compare with ChatGPT. Future research could involve more experts to make a more robust comparison.
This study, highlighting the effectiveness of AI in providing feedback assessed as empathically valid, could suggest the possibility in the future of implementing a tool based on AI itself, which, through the monitoring of the subjects’ involuntary physiological responses, such as heart rate, breathing, and muscle tone, uses these data to develop the ability to provide outputs that have a more empathic impact [26,27,28].

Author Contributions

Conceptualization: L.L.M., E.M., A.A., C.M., A.F., A.L. and G.S.; Methodology: S.C. and C.B.; Validation: V.B.; Resources: E.T. (Efisio Temporin); Writing—Original Draft: O.R. and E.T. (Enrica Tortora); Writing—Review and Editing: V.C. and R.S.; Supervision: E.G. and M.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was conducted in accordance with the ethical principles outlined in the Declaration of Helsinki and the Ethical Code for Research in Psychology of the Italian Association of Psychology (AIP), approved in 2015 and updated in July 2022 to comply with GDPR regulations (aipass.org). Since the study did not involve clinical interventions or the collection of sensitive data requiring formal approval from an ethics committee, obtaining a specific ethical approval code was not necessary. However, all procedures adhered to ethical standards to protect participants, ensuring anonymity, data confidentiality, and obtaining informed consent.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data supporting the findings of this study are available from the corresponding author upon reasonable request (dr.valeria.cioffi@gmail.com). Due to privacy and ethical considerations, the data are not publicly accessible.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Giusti, E.; Montanari, C.; Spalletta, E. La Supervisione Clinica Integrata. Manuale di Formazione Pluralistica in Counseling e Psicoterapia; Elsevier: Amsterdam, The Netherlands, 2000.
  2. Watkins, C.E., Jr. The psychotherapy supervisor as an agent of transformation: To anchor and educate, facilitate and emancipate. Am. J. Psychother. 2020, 73, 57–62.
  3. Yontef, G. Supervision from a Gestalt therapy perspective. Br. Gestalt J. 1996, 5, 92–102.
  4. Rønnestad, M.H.; Orlinsky, D.E.; Schröder, T.A.; Skovholt, T.M.; Willutzki, U. The professional development of counsellors and psychotherapists: Implications of empirical studies for supervision, training and practice. Couns. Psychother. Res. 2019, 19, 214–230.
  5. Watkins, C.E., Jr. Does psychotherapy supervision contribute to patient outcomes? Considering thirty years of research. Clin. Superv. 2011, 30, 235–256.
  6. Cioffi, V.; Mosca, L.L.; Moretto, E.; Ragozzino, O.; Stanzione, R.; Bottone, M.; Sperandeo, R. Computational Methods in Psychotherapy: A Scoping Review. Int. J. Environ. Res. Public Health 2022, 19, 12358.
  7. Tahan, M.; Zygoulis, P. Artificial Intelligence and Clinical Psychology, Current Trends. J. Clin. Dev. Psychol. 2020, 2, 31–48.
  8. Miner, A.S.; Shah, N.; Bullock, K.D.; Arnow, B.A.; Bailenson, J.; Hancock, J. Key considerations for incorporating conversational AI in psychotherapy. Front. Psychiatry 2019, 10, 746.
  9. Luxton, D.D. Artificial intelligence in psychological practice: Current and future applications and implications. Prof. Psychol. Res. Pract. 2014, 45, 332.
  10. Brown, J.E.; Halpern, J. AI chatbots cannot replace human interactions in the pursuit of more inclusive mental healthcare. SSM Ment. Health 2021, 1, 100017.
  11. OpenAI; Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; McGrew, B. GPT-4 technical report. arXiv 2023, arXiv:2303.08774.
  12. Cioffi, V.; Ragozzino, O.; Scognamiglio, C.; Mosca, L.L.; Moretto, E.; Stanzione, R.; Marino, F.; Acocella, A.; Ammendola, A.; D'Aquino, R.; et al. Towards integrated AI psychotherapy supervision: A proposal for a ChatGPT-4 study. 2025; in press.
  13. Eshghie, M.; Eshghie, M. ChatGPT as a therapist assistant: A suitability study. arXiv 2023, arXiv:2304.09873.
  14. Vahedifard, F.; Haghighi, A.S.; Dave, T.; Tolouei, M.; Zare, F.H. Practical Use of ChatGPT in Psychiatry for Treatment Plan and Psychoeducation. arXiv 2023, arXiv:2311.09131.
  15. Jo, E.; Jeong, Y.; Park, S.; Epstein, D.A.; Kim, Y.H. Understanding the impact of long-term memory on self-disclosure with large language model-driven chatbots for public health intervention. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 11–16 May 2024; pp. 1–21.
  16. Clarkson, P. Psicoterapia Analitico-Transazionale: Un Approccio Integrato; Routledge: London, UK, 1992.
  17. Cronbach, L.J. Essentials of Psychological Testing; Harper & Row: New York, NY, USA, 1970.
  18. Jebb, A.T.; Ng, V.; Tay, L. A review of key Likert scale development advances: 1995–2019. Front. Psychol. 2021, 12, 637547.
  19. Manzotti, R.; Rossi, S. IO & IA. Mente, Cervello & GPT; Rubbettino: Soveria Mannelli, Italy, 2023.
  20. Lewin, K. Teoria del Campo delle Scienze Sociali: Selected Theoretical Papers; Harper & Brothers: New York, NY, USA, 1951.
  21. Welivita, A.; Pu, P. Is ChatGPT More Empathetic than Humans? arXiv 2024, arXiv:2403.05572.
  22. Raile, P. The usefulness of ChatGPT for psychotherapists and patients. Humanit. Soc. Sci. Commun. 2024, 11, 1–8.
  23. Francesetti, G.; Gecele, M.; Roubal, J. (Eds.) La Psicoterapia della Gestalt nella Pratica Clinica: Dalla Psicopatologia all'Estetica del Contatto; FrancoAngeli: Milan, Italy, 2014.
  24. Lancini, M. Il Ritiro Sociale negli Adolescenti: La Solitudine di una Generazione Iperconnessa; Raffaello Cortina Editore: Milan, Italy, 2020.
  25. Birkett, M. Self-compassion and empathy across cultures: Comparison of young adults in China and the United States. Int. J. Res. Stud. Psychol. 2014, 3, 25–34.
  26. Chopik, W.J.; O'Brien, E.; Konrath, S.H. Differences in empathic concern and perspective taking across 63 countries. J. Cross-Cult. Psychol. 2017, 48, 23–38.
  27. Blackmore, K.L.; Smith, S.P.; Bailey, J.D.; Krynski, B. Integrating Biofeedback and Artificial Intelligence into eXtended Reality Training Scenarios: A Systematic Literature Review. Simul. Gaming 2024, 55, 445–478.
  28. Sperandeo, R.; Di Sarno, A.D.; Longobardi, T.; Iennaco, D.; Mosca, L.L.; Maldonato, N.M. Toward a technological oriented assessment in psychology: A proposal for the use of contactless devices for heart rate variability and facial emotion recognition in psychological diagnosis. In Proceedings of the First Symposium on Psychology-Based Technologies co-located with XXXII National Congress of Italian Association of Psychology—Development and Education Section (AIP 2019), Naples, Italy, 25–26 September 2019.
Figure 1. Flowchart step-by-step explanation of the process used to construct the customized prompt for ChatGPT.
Figure 2. A methodology diagram summarizing the research process.
Table 1. Satisfaction questionnaire. Each item is rated on a 4-point Likert scale: 1 = Not at all/Poor; 2 = A little/Sufficient; 3 = Quite/Good; 4 = Very good/Excellent.
1. Did you find the feedback clear?
2. Is this feedback relevant to the clinical case presented?
3. Is this feedback comprehensive?
4. Did this feedback help the supervisee define the therapeutic contract with the patient?
5. Does this feedback provide useful insights for the patient's treatment?
6. Does this feedback take into account the supervisee's professional level?
7. Does this feedback contribute to the professional development of the therapist?
8. Did this feedback adequately address the ethical and deontological aspects of the clinical case?
9. Does this feedback provide practical suggestions useful for the therapist?
10. Does this feedback offer an analysis of the techniques used in the sessions?
11. Does this feedback constructively highlight any areas for improvement?
12. Did reading this feedback have an emotional impact on you?
13. Do you consider this feedback to be characterized by an empathetic approach?
14. Does this feedback come across as a collegial communication between peers?
15. Did this feedback help strengthen the supervisee's capacity for self-reflection?
16. Has this feedback improved the supervisee's confidence in clinical case management?
Table 2. Principal component analysis (PCA) extracted from the 16-item liking test (loadings after rotation).
C1, Relational and emotional dimension: empathic approach (0.812); it helps self-reflection (0.633); it helps confidence (0.741); emotional impact (0.776).
C2, Didactic and technical quality: clarity (0.846); relevance (0.544); completeness (0.733); analysis of techniques (0.637).
C3, Treatment support and development: treatment aids (0.489); practical suggestions to the therapist (0.82); it highlights areas for improvement (0.806); peer-to-peer communication (0.512).
C4, Professional orientation and adaptability: deontologically oriented (0.578); it helps contract (0.667); appropriate to professional level (0.772); useful for professional development (0.683).
Extraction method: principal component analysis. Rotation method: varimax with Kaiser normalization; rotation converged in 7 iterations. KMO measure of sampling adequacy = 0.860.
Table 3. ANOVA comparing the ratings of the 4 components across the three forms of feedback (mean; standard deviation; 95% confidence interval, lower–upper limit; F; p).
C1, Relational and emotional dimension (F = 5.733; p = 0.005): fb1 10.395, SD 2.773, CI 9.250–11.540; fb2 13.011 **, SD 2.255, CI 12.058–13.962; fb3 11.982, SD 3.114, CI 10.601–13.363. Post hoc: A.D. 2.616, S.E. 0.779, CI 1.061–4.170, p = 0.001.
C2, Didactic and technical quality (F = 1.287; p = 0.283): fb1 12.502, SD 2.370, CI 11.524–13.481; fb2 13.625, SD 2.327, CI 12.642–14.607; fb3 12.849, SD 2.802, CI 11.606–14.091.
C3, Treatment support and development (F = 0.269; p = 0.765): fb1 11.281, SD 2.059, CI 10.431–12.131; fb2 10.966, SD 2.153, CI 10.057–11.875; fb3 11.450, SD 2.631, CI 10.283–12.616.
C4, Professional orientation and adaptability (F = 3.385; p = 0.04): fb1 10.423, SD 2.080, CI 9.564–11.282; fb2 11.911 *, SD 1.961, CI 11.082–12.739; fb3 10.438, SD 2.74, CI 9.223–11.652. Post hoc: A.D. 1.488, S.E. 0.648, CI 0.194–2.782, p = 0.025.
fb1: untrained AI feedback; fb2: trained AI feedback; fb3: human feedback. ** significantly different from both other scores; * significantly different score between fb2 and fb3. A.D.: average difference; S.E.: standard error; the post hoc values (A.D., S.E., and the associated CI and p) refer to the post hoc analysis.
Table 4. ANOVA comparing the ratings of the 16 items across the three forms of feedback (mean; standard deviation; 95% confidence interval, lower–upper limit; F; p).
Clarity (F = 2.42; p = 0.097; post hoc p = 0.032): fb1 3.08, SD 0.64, CI 2.82–3.34; fb2 3.33, SD 0.702, CI 3.04–3.63; fb3 2.86, SD 0.834, CI 2.49–3.23. Post hoc: A.D. 0.470, S.E. 0.214, CI 0.04–0.90.
Relevance (F = 0.887; p = 0.417): fb1 3.08, SD 0.702, CI 2.79–3.37; fb2 3.29, SD 0.693, CI …–3.58; fb3 3.32, SD 0.646, CI 3.03–3.6.
Completeness (F = 0.914; p = 0.406): fb1 2.88, SD 0.781, CI 2.56–3.2; fb2 3.13, SD 0.612, CI 2.87–3.38; fb3 2.86, SD 0.834, CI 2.49–3.23.
Empathic approach (F = 17.408; p < 0.001; post hoc p = 0.000 and 0.006): fb1 2.28, SD 0.843, CI 1.93–2.63; fb2 3.33 *, SD 0.637, CI 3.06–3.6; fb3 2.64, SD 1.002, CI 2.19–3.08. Post hoc: A.D. 1.053 and 0.697, S.E. 0.239 and 0.247, CI 0.20–1.19.
Deontologically oriented (F = 0.212; p = 0.647): fb1 2.68, SD 0.852, CI 2.33–3.03; fb2 2.63, SD 0.824, CI 2.28–2.97; fb3 2.36, SD 1.002, CI 1.92–2.81.
It helps the contract (F = 7.062; p = 0.01; post hoc p = 0.011): fb1 2.28, SD 0.678, CI 2–2.56; fb2 2.88 *, SD 0.797, CI 2.54–3.21; fb3 2.41, SD 0.908, CI 2.01–2.81. Post hoc: A.D. 0.595, S.E. 0.227, CI 0.14–1.05.
It helps the treatment (F = 0.067; p = 0.797): fb1 2.88, SD 0.6, CI 2.63–3.13; fb2 2.96, SD 0.751, CI 2.64–3.28; fb3 3.14, SD 0.941, CI 2.72–3.55.
Analysis of techniques (F = 0.164; p = 0.687): fb1 2.84, SD 0.8, CI 2.51–3.17; fb2 2.75, SD 0.737, CI 2.44–3.06; fb3 2.82, SD 0.795, CI 2.47–3.17.
Practical suggestions to the therapist (F = 1.465; p = 0.23): fb1 2.96, SD 0.841, CI 2.61–3.31; fb2 2.83, SD 0.761, CI 2.51–3.15; fb3 3.18, SD 0.733, CI 2.86–3.51.
Suitable for professional level (F = 13.325; p = 0.001; post hoc p = 0.001 and 0.005): fb1 2.32, SD 0.852, CI 1.97–2.67; fb2 3.17 *, SD 0.868, CI 2.8–3.53; fb3 2.41, SD 0.908, CI 2.01–2.81. Post hoc: A.D. 0.847 and 0.758, S.E. 0.250 and 0.258, CI 0.35–1.35 and 0.25–1.27.
Useful for professional development (F = 2.241; p = 0.139): fb1 3, SD 0.645, CI 2.73–3.27; fb2 3.29, SD 0.751, CI 2.97–3.61; fb3 3, SD 0.926, CI 2.59–3.41.
It highlights areas for improvement (F = 5.634; p = 0.02; post hoc p = 0.04): fb1 3.28, SD 0.678, CI 3–3.56; fb2 2.88, SD 0.741, CI 2.56–3.19; fb3 3.32 *, SD 0.716, CI 3–3.64. Post hoc: A.D. 0.443, S.E. 0.210, CI 0.02–0.86.
It helps self-reflection (F = 1.409; p = 0.239): fb1 2.72, SD 0.614, CI 2.47–2.97; fb2 3.17, SD 0.702, CI 2.87–3.46; fb3 3.18, SD 0.853, CI 2.8–3.56.
It helps confidence (F = 7.648; p = 0.007; post hoc p = 0.002): fb1 2.44, SD 0.821, CI 2.1–2.78; fb2 3.17 *, SD 0.637, CI 2.9–3.44; fb3 2.82, SD 0.853, CI 2.44–3.2. Post hoc: A.D. 0.727, S.E. 0.221, CI 0.29–1.17.
Emotional impact (F = 6.378; p = 0.014; post hoc p = 0.000): fb1 1.88, SD 0.833, CI 1.54–2.22; fb2 2.96 **, SD 0.908, CI 2.57–3.34; fb3 2.86, SD 1.037, CI 2.4–3.32. Post hoc: A.D. 1.078, S.E. 0.264, CI 0.55–1.61.
Peer communication (F = 0.523; p = 0.472): fb1 2.52, SD 0.77, CI 2.2–2.84; fb2 2.38, SD 0.875, CI 2.01–2.74; fb3 2.55, SD 0.963, CI 2.12–2.97.
fb1: untrained AI feedback; fb2: trained AI feedback; fb3: human feedback. ** significantly different from both other scores; * significantly different score between fb1 and fb2. A.D.: average difference; S.E.: standard error; the post hoc values (A.D., S.E., and the associated CI and p) refer to the post hoc analysis.