Evaluation of the Quality of ChatGPT’s Responses to Top 20 Questions about Robotic Hip and Knee Arthroplasty: Findings, Perspectives and Critical Remarks on Healthcare Education

Venosa, Michele; Calvisi, Vittorio; Iademarco, Giulio; Romanini, Emilio; Ciminello, Enrico; Cerciello, Simone; Logroscino, Giandomenico

doi:10.3390/prosthesis6040066

Open AccessArticle

Evaluation of the Quality of ChatGPT’s Responses to Top 20 Questions about Robotic Hip and Knee Arthroplasty: Findings, Perspectives and Critical Remarks on Healthcare Education

by

Michele Venosa

^1,2

,

Vittorio Calvisi

^2,3,

Giulio Iademarco

²,

Emilio Romanini

^1,4

,

Enrico Ciminello

⁵,

Simone Cerciello

^6,7 and

Giandomenico Logroscino

^2,3,*

¹

RomaPro, Polo Sanitario San Feliciano, 00195 Rome, Italy

²

Department of Life, Health and Environmental Sciences, University of L’Aquila, 67100 L’Aquila, Italy

³

Mini-Invasive and Computer-Assisted Orthopaedic Surgery Unit, San Salvatore Hospital, 67100 L’Aquila, Italy

⁴

GLOBE, Italian Working Group on Evidence-Based Orthopaedics, 00195 Rome, Italy

⁵

Italian Implantable Prostheses Registry (RIPI), Italian National Institute of Health, 00161 Rome, Italy

⁶

Unit of Orthopedics, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, 00168 Rome, Italy

⁷

Orthopaedic Department, Casa di Cura Villa Betania, 00165 Rome, Italy

^*

Author to whom correspondence should be addressed.

Prosthesis 2024, 6(4), 913-922; https://doi.org/10.3390/prosthesis6040066

Submission received: 20 June 2024 / Revised: 22 July 2024 / Accepted: 9 August 2024 / Published: 13 August 2024

(This article belongs to the Special Issue State of Art in Hip, Knee and Shoulder Replacement (Volume 2))

Download

Browse Figures

Versions Notes

Abstract

:

Robotic-assisted hip and knee arthroplasty represents significant advancements in orthopedic surgery. Artificial intelligence (AI)-driven chatbots, such as ChatGPT, could play a significant role in healthcare education. This study aims to evaluate the quality of responses provided by ChatGPT to the top 20 questions concerning robotic-assisted hip and knee arthroplasty. We have asked ChatGPT to select the top 20 questions on Google concerning robotic hip and knee arthroplasty and to provide a detailed answer to each of them. The accuracy and completeness of the information provided were examined by three orthopedic surgeons with scientific and clinical experience in hip- and knee-replacement surgery. The accuracy was assessed through a 5-point Likert scale (from 1—completely incorrect to 5—correct); the completeness through a 4-point Likert scale (from 0—comprehensiveness not assessable for completely incorrect answers to 3—exhaustive information) on two different occasions to ensure the consistency of the assessment. Our analysis reveals that ChatGPT provides a relatively high degree of accuracy; moreover, the explanations can be considered satisfying, especially for factual questions. The findings suggest that ChatGPT can serve as a valuable initial resource for general information on robotic hip and knee arthroplasty but the integration with human expertise remains essential.

Keywords:

knee arthroplasty; hip arthroplasty; eHealth; internet; artificial intelligence; patient education

1. Introduction

Robotic-assisted knee and hip arthroplasty have emerged as innovative surgical techniques, revolutionizing the field of joint replacement. The integration of robotics into these procedures has promised a paradigm shift, aiming to enhance surgical accuracy, improve patient outcomes, and expedite postoperative recovery [1,2,3]. Robotic-assisted arthroplasty offers preoperative planning tools, intraoperative guidance, and real-time feedback mechanisms, enabling surgeons to execute procedures with meticulous accuracy [4,5]. Indications for robotic-assisted knee and hip arthroplasty extend to complex anatomies, younger and more-active patient populations, and revision surgeries, where their precision can be particularly advantageous. Nevertheless, the adoption of robotic systems in orthopedic surgery comes with a set of challenges, including cost considerations, a learning curve for surgeons, and limitations in specific clinical scenarios. Their cost-effectiveness and broader applicability remain subjects of ongoing research and debate [6,7].

This technological revolution has gained considerable interest and curiosity among various stakeholders, including patients seeking information about their potential treatment options, healthcare professionals striving to stay abreast of cutting-edge advancements, and researchers exploring the efficacy and nuances of robotic-assisted procedures.

Given the surge in interest and the ever-growing quest for credible information in the digital age, AI language models like ChatGPT have emerged as potential gateways to disseminate knowledge and address inquiries related to complex medical procedures [8,9]. The information provided by AI models has the potential to serve as a valuable resource for patients, caregivers, and healthcare professionals seeking knowledge and guidance regarding these surgical interventions. However, the reliability and accuracy of the information dispensed by such models are critical, especially in the context of medical procedures [10]. ChatGPT, powered by deep learning algorithms, has been trained on a diverse array of data sources, providing it with a breadth of information across multiple domains, including healthcare and orthopedic surgery (Figure 1). The reliability and accuracy of AI-generated information in the medical domain, especially concerning intricate procedures such as robotic hip and knee arthroplasty, have been subjects of scrutiny.

Patients contemplating joint replacement surgeries often seek detailed insights into the procedure, potential benefits, associated risks, postoperative care, and long-term outcomes. Similarly, healthcare professionals rely on accurate information to counsel and guide patients through the decision-making process, while researchers and practitioners delve into the nuances of these technologies to advance the field further. The intersection of advanced medical technology and AI-driven information dissemination presents a unique opportunity to address the growing need for accessible and reliable medical information. Evaluating the proficiency of AI language models, like ChatGPT, in providing accurate and comprehensive responses to inquiries regarding hip and knee robotic arthroplasty is crucial. By analyzing the quality, accuracy, and depth of information presented by these models, we can better understand their capabilities and limitations in meeting the informational needs of diverse stakeholders. Current AI technologies in medical education face several limitations that hinder their full potential. One major limitation is the lack of personalized learning experiences; AI systems often struggle to tailor content to the specific needs and learning styles of individual students. Additionally, while AI can provide vast amounts of information, it may lack the depth and context that experienced educators offer, potentially leading to misunderstandings or oversimplifications of complex medical concepts. Another significant issue is the potential for outdated or inaccurate information, as AI models depend on the quality and recency of their training data.

This study endeavors to critically assess ChatGPT’s efficacy in addressing the top 20 questions frequently posed about hip and knee robotic arthroplasty. Through rigorous analysis and expert evaluation, this research aims to provide insights into the reliability of AI-generated information and its potential role in complementing traditional sources of medical knowledge. In doing so, we seek to illuminate the strengths and limitations of AI language models in the context of complex medical-information dissemination, paving the way for informed decision-making and improved accessibility to reliable healthcare information in the era of advancing technology.

2. Materials and Methods

The methodology was designed to ensure a systematic and objective assessment of the AI language model’s proficiency in addressing queries related to hip and knee robotic arthroplasty.

Preliminary research was conducted to determine the total market share of the most popular search engines on the World Wide Web. Google^® (Google Inc., Mountain View, CA, USA) emerged as the leading search engine, holding 91.43% of the global market share (https://gs.statcounter.com/search-engine-market-share (accessed on 6 November 2023)). Given its significant dominance over other search engines, Google was exclusively chosen for this study. We selected a set of top 20 questions concerning robotic hip and knee arthroplasty on Google (extracted by OpenAI ChatGPT-4 version itself) to analyze the quality of the AI chatbot answers.

ChatGPT has been encouraged to elaborate informative and detailed responses, ensuring that the model has sufficient context to generate meaningful answers (Table 1).

These responses were logged and stored for subsequent analysis (Figure 2).

A panel of three expert orthopedic surgeons evaluated the quality of ChatGPT responses. These reviewers possessed a background in orthopedic surgery (specifically in the area of hip and knee arthroplasty) and a laudable academic career (having achieved the national qualification for Full Professor). They independently assessed the responses generated by ChatGPT in terms of accuracy and completeness with a predefined set of evaluation criteria. A five-point Likert scale (score 1–5) was used to assess the accuracy: 1—completely incorrect, 2—more incorrect than correct, 3—approximately equal correct and incorrect, 4—more correct than incorrect, and 5—correct. A four-point Likert scale (score 0–3) was used to examine the completeness of the response: 0—comprehensiveness not assessable for completely incorrect answers/accuracy scale 0; 1—incomplete; substantial parts of the answer rare, incomplete, or missing, 2—satisfying; it provides the minimum amount required to give complete information, addressing all aspects of the question and 3—exhaustive; all aspects of the question are investigated, with additional information beyond expectations.

Statistical tests were applied to assess the significance of differences in response quality across various questions and to evaluate the impact of fine-tuning on response quality. The evaluation was performed twice, at a distance of 15 days (t1 and t2) by each of the three expert orthopedic surgeons, to obtain adequate internal consistency. Wilcoxon and Mann–Whitney tests were performed to check for possible differences both within and between surgeon evaluations of the answers.

Institutional Review Board approval was not necessary, since neither animal nor human subjects were involved in this study; all data utilized were available for public use.

3. Results

The average accuracy of answers for questions regarding robotic hip and knee replacement was 4.37 (0.89) and 4.6 (0.58) points out of 5, respectively, while the average completeness was 2.02 (0.42) and 2.07 (0.38) points out of 3. No significant differences were found among answers at t1 and t2 (all p-values are over the significance threshold) for each surgeon; detailed results of the comparison of evaluation within and between surgeons at t1 and t2 are reported in Table 2.

In aggregate, ChatGPT demonstrated a reasonable level of proficiency in addressing inquiries related to hip and knee robotic arthroplasty. The model’s ability to provide accurate and relevant information, particularly in outlining the fundamental aspects of the procedures and their potential benefits, was notable.

4. Discussion

The dynamic nature of robotic arthroplasty is reflected in the integration of emerging technologies. Artificial intelligence (AI) and machine learning algorithms are being employed to enhance preoperative planning and predict patient-specific outcomes [11,12]. Augmented reality (AR) systems are being explored to provide surgeons with real-time, 3D visualization of the surgical field, aiding in precise implant placement [13,14,15]. The integration of robotic technology offers the promise of enhanced precision, improved implant survivorship, and tailored patient care. While substantial progress has been made, further research is required to address challenges related to training, cost-effectiveness, and the integration of cutting-edge technologies.

As these challenges are overcome, robotic arthroplasty is poised to play an increasingly prominent role in the future of orthopedic surgery, ultimately benefiting patients and healthcare systems alike [16]. The general opinion of common consumers regarding robotic surgery is a mix of fascination, curiosity, and some common myths and potential disappointments. Many consumers believe that robotic surgery offers greater precision and accuracy compared to traditional methods, which can lead to better surgical outcomes. Robotic surgery is often seen as minimally invasive, with smaller incisions and potentially faster recovery times. Most consumers may have misconceptions that robots perform surgeries entirely autonomously (in reality, surgeons control the robotic systems). Furthermore, the idea of a quicker recovery and shorter hospital stays is attractive to patients who want to return to their daily lives sooner [17]. The significance of accurate and comprehensive patient education in robotic knee and hip arthroplasty is paramount within the realm of orthopedic healthcare. Correct patient education ensures that individuals undergoing robotic knee and hip arthroplasty are equipped with a thorough understanding of the surgical procedure, including potential risks and benefits. Patients who receive accurate education are more likely to adhere to prescribed treatments, exercise regimens, and follow-up appointments, ultimately resulting in improved surgical outcomes [18]. In addition, such education helps manage patient expectations, minimizing postoperative anxiety and enhancing overall patient satisfaction. Moreover, it contributes to the efficient allocation of healthcare resources by reducing the likelihood of complications and readmissions, thereby curbing healthcare costs. On these premises, the utilization of AI-driven models, such as ChatGPT, to provide information and answer questions about these procedures has gained prominence [19]. Common users traditionally look for information online (with Google being the most popular search engine) but an increasingly larger share interact with chatbots, even for healthcare understanding. ChatGPT and Google serve different purposes in patient education. ChatGPT offers a conversational and personalized approach with the advantage of accessibility and customization, but lacks emotional support and may have limitations in medical expertise. Google provides a vast repository of healthcare information but requires patients to sift through search results, potentially leading to information overload and a higher risk of encountering misinformation [20]. The choice between these modalities depends on the specific needs of patients and the healthcare context, and in some cases, a combination of both may provide a comprehensive patient education strategy. It is important to note that while ChatGPT provides information based on a vast dataset, the quality of responses can vary depending on the complexity and specificity of the queries [21].

Our analysis reveals that ChatGPT provides a relatively high degree of accuracy and relevance, especially for factual questions. The completeness and depth of explanations can be considered satisfying, since the AI-driven chatbot ensures the minimum amount required to give complete information. The evaluation of ChatGPT’s responses revealed several noteworthy findings. In assessing the accuracy of ChatGPT’s responses, it was observed that the model generally provided information that was consistent with the established medical literature. For instance, the response elaborated by ChatGPT to the question “What is the typical recovery time and rehabilitation process after robotic knee replacement?” provided accurate information about the general recovery timeline and the role of physical therapy. The AI chatbot answered in this way: “The recovery time after robotic knee replacement can vary, but many patients can expect to be walking with the assistance of crutches or a walker within a day or two after surgery. Full recovery, including returning to normal activities, can take several weeks to a few months. Physical therapy plays a crucial role in rehabilitation, helping patients regain strength, range of motion, and mobility in the knee joint”. This aligns with well-documented post-operative protocols in knee arthroplasty [22]. Similarly, responses related to patient eligibility criteria, risks, and benefits of robotic joint replacement were in concordance with the medical literature [23]. ChatGPT demonstrated the capability to provide tailored information, especially in response to questions about customizing implant placement in robotic arthroplasty. The model appropriately emphasized the importance of customization based on the patient’s unique anatomy and the surgeon’s intraoperative adjustments. This aligns with the evolving trend in personalized medicine within orthopedics [24]. The results of our research are in line with the findings of the study performed by Kienzle et al. [25]. In this paper, the authors subjected ChatGPT to a set of questions, mirroring those commonly posed by patients in the preoperative assessment phase before undergoing total knee arthroplasty. All queries were framed in simple non-technical language and were presented to ChatGPT through a single continuous chat session. A panel of three orthopedic surgeons independently rated the responses elaborated by the AI chatbot, employing the DISCERN instrument to validate the accuracy of the answers. The authors found consistently high scores, suggestive of the good quality and accuracy of the information provided. However, on closer examination, the authors highlighted the generation of non-existing references for certain claims, emphasizing the necessity of cross-referencing information from established sources. Similarly, Mika et al., in their study examining ChatGPT responses to common patient questions regarding total hip arthroplasty [26], appreciated the quality of the chatbot responses, having provided evidence-based answers in a way that most patients could understand. The authors, based on these results, considered ChatGPT an adequate clinical tool for patient education before orthopedic consultation. On the other hand, these results differ from the observations made by Yang et al. [27], who compared the responses of two AI chatbots (ChatGPT and Bard) regarding treatments for hip and knee osteoarthritis with the American Academy of Orthopaedic Surgeons (AAOS) evidence-based clinical practice guidelines’ (CPGs) recommendations. According to their results, ChatGPT and Bard do not consistently provide responses that align with the AOOS CPGs, encouraging the use of non-recommended treatments in 30 and 60% of queries, respectively. On this basis, the authors ask patients and healthcare professionals for prudence to not blindly trust the information provided by the AI chatbots. The AI chatbot fallibility has emerged also in our study. Despite the overall credibility and completeness of ChatGPT, we noticed that in some cases some useful details were missing, potentially leaving patients with incomplete knowledge about their medical conditions or surgical options. For example, ChatGPT answered the question “What are the different types of hip implants available for robotic hip arthroplasty, and how are they selected?” in this way: “There are various types of hip implants, including metal-on-plastic, ceramic-on-ceramic, and metal-on-metal. The choice of implant depends on factors like patient age, activity level, and the surgeon’s recommendation.” It is clear that the answer is not quite correct, and that the question would merit a longer and more complete/accurate discussion to better detail all the aspects regarding this focus.

Assessing the credibility of information provided by a chatbot, like ChatGPT, is essential to ensure that users receive accurate and trustworthy information. Source verification plays a crucial role in healthcare patient education, as it directly impacts the accuracy, reliability, and trustworthiness of the information patients receive [28]. By steering patients toward credible sources, healthcare providers can help mitigate the negative effects of misinformation [29]. The ethical and practical implications of AI in patient education, particularly within the context of robotic hip and knee arthroplasty, are profound and multifaceted. Ethically, the deployment of AI models like ChatGPT in healthcare must navigate the delicate balance between providing accessible information and ensuring that this information is accurate, up-to-date, and trustworthy. Misleading or incorrect information can have serious consequences for patient health and decision-making, underscoring the necessity for stringent validation and oversight mechanisms. Furthermore, the reliance on AI could exacerbate inequalities in healthcare access, as individuals without digital literacy or access to technology might be excluded from the benefits of AI-driven patient education. AI can generate non-existing references or unsupported claims, highlighting a significant risk when using AI for medical information dissemination. This is particularly concerning in a medical context, where accuracy is paramount. Current AI technologies in medical education face several limitations that hinder their full potential. One significant issue is the potential for outdated or inaccurate information, as AI models depend on the quality and recency of their training data. Furthermore, ethical concerns regarding data privacy and the potential bias in AI algorithms raise important questions about the equitable use of these technologies in education. Another major limitation is the lack of personalized learning experiences; AI systems often struggle to tailor content to the specific needs and learning styles of individual students. Additionally, while AI can provide vast amounts of information, it may lack the depth and context that experienced educators offer, potentially leading to misunderstandings or oversimplifications of complex medical concepts [30,31]. Continuing the reasoning, human interaction remains superior to ChatGPT and other AI-driven tools in several aspects of healthcare patient education. While AI can provide valuable support, there are certain qualities and capabilities that human interactions offer. Human healthcare providers can empathize with patients, offering emotional support, reassurance, and comfort during difficult moments. They can address patients’ fears, concerns, and emotional needs in a way that AI cannot replicate. They can adapt their communication style and content to meet the individual needs of each patient. They can recognize and respond to unique cultural, emotional, and cognitive factors that influence a patient’s understanding and engagement. Much of human communication relies on non-verbal cues such as body language, facial expressions, and tone of voice. These cues convey empathy, sincerity, and reassurance, enhancing the patient–provider relationship and the understanding of information. In complex medical situations, healthcare providers can engage in nuanced discussions, explaining intricate medical concepts and addressing patients’ questions in real-time [32]. They can navigate uncertainty and adapt their responses to the patient’s level of understanding. Human healthcare providers can demonstrate cultural competency, taking into account a patient’s cultural background, beliefs, and values, when delivering information. This cultural sensitivity fosters better communication and understanding. In a conversation with a human healthcare provider, patients can seek immediate clarification or ask follow-up questions, fostering a dynamic and iterative learning process. Healthcare providers can offer real-time feedback and gauge patient comprehension. Building trust and rapport with patients is a critical aspect of healthcare [33]. While AI, including ChatGPT, has made significant advancements in healthcare patient education, it should complement rather than replace human interactions. The unique qualities of human providers, including empathy, adaptability, cultural competency, and the ability to provide emotional support, make them indispensable in delivering patient-centered care and education. These limitations highlight the need for continued development and careful integration of AI tools, ensuring they complement rather than replace the expertise and judgment of human educators. A balanced approach that leverages both human expertise and AI capabilities can provide the best outcomes for patients [34].

This study has some limitations that should be considered. The study relies heavily on ChatGPT to generate both the questions and the answers. Though the questions have been extracted by ChatGPT from among the most common ones routinely asked on Google search engine, this could introduce a bias, as the AI may generate questions that are more likely to be answered accurately by the same AI. This might not fully reflect the range of real-world questions patients might have. Second, while the responses are evaluated by orthopedic surgeons, there is no direct comparison between the AI-generated responses and responses from human experts. This should be addressed in further research. Third, ChatGPT responses are based on the data it was trained on and do not account for real-time updates or the latest advancements in the field. The static nature of the AI’s knowledge base might lead to outdated or less-accurate information over time. Fourth, the study uses Likert scales for evaluation, which, while useful, are subjective. The choice of a panel of three orthopedic experts with consistent academic and scientific experience should reduce the impact of this bias and grant a solid, appropriate, and realistic judgment of the quality of the information. This methodological setting has been used in other similar studies investigating the quality of ChatGPT responses in other healthcare fields [35,36,37,38]. In the future, a more diverse panel of evaluators, encompassing patients as well as various healthcare professionals, could be included to provide a broader and more comprehensive perspective on the quality of the responses.

5. Conclusions

The evaluation of ChatGPT’s responses to the top 20 questions regarding robotic hip and knee arthroplasty indicates that the model offers accurate, well-informed, and tailored information that aligns with established medical literature and guidelines. However, it is essential to recognize that AI models, including ChatGPT, are not a substitute for professional medical advice, and should serve as complementary sources of information. As the field of robotic-joint arthroplasty continues to evolve, ongoing assessments of AI-driven models like ChatGPT can contribute to their refinement and utility in providing valuable insights to patients and healthcare professionals.

Author Contributions

Conceptualization: M.V.; Writing—original draft preparation: M.V., E.C. and G.I.; Writing—review and editing: M.V., E.R. and G.L.; Supervision: G.L., V.C., E.R. and S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data generated or analyzed during this study are included in this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Marchand, K.B.; Moody, R.; Scholl, L.Y.; Bhowmik-Stoker, M.; Taylor, K.B.; Mont, M.A.; Marchand, R.C. Results of Robotic-Assisted Versus Manual Total Knee Arthroplasty at 2-Year Follow-up. J. Knee Surg. 2023, 36, 159–166. [Google Scholar] [CrossRef] [PubMed]
Kim, Y.H.; Yoon, S.H.; Park, J.W. Does Robotic-assisted TKA Result in Better Outcome Scores or Long-Term Survivorship Than Conventional TKA? A Randomized, Controlled Trial. Clin. Orthop. Relat. Res. 2020, 478, 266–275. [Google Scholar] [CrossRef] [PubMed]
Chen, X.; Xiong, J.; Wang, P.; Zhu, S.; Qi, W.; Peng, H.; Yu, L.; Qian, W. Robotic-assisted compared with conventional total hip arthroplasty: Systematic review and meta-analysis. Postgrad. Med. J. 2018, 94, 335–341. [Google Scholar] [CrossRef] [PubMed]
Marmotti, A.; Rossi, R.; Castoldi, F.; Roveda, E.; Michielon, G.; Peretti, G.M. PRP and articular cartilage: A clinical update. Biomed Res. Int. 2015, 2015, 542502. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
Deckey, D.G.; Rosenow, C.S.; Verhey, J.T.; Brinkman, J.C.; Mayfield, C.K.; Clarke, H.D.; Bingham, J.S. Robotic-assisted total knee arthroplasty improves accuracy and precision compared to conventional techniques. Bone Joint J. 2021, 103, 74–80. [Google Scholar] [CrossRef]
Rajan, P.V.; Khlopas, A.; Klika, A.; Molloy, R.; Krebs, V.; Piuzzi, N.S. The Cost-Effectiveness of Robotic-Assisted Versus Manual Total knee Arthroplasty: A Markov Model-Based Evaluation. J. Am. Acad. Orthop. Surg. 2022, 30, 168–176. [Google Scholar] [CrossRef] [PubMed]
Pierce, J.; Needham, K.; Adams, C.; Coppolecchia, A.; Lavernia, C. Robotic-assisted total hip arthroplasty: An economic analysis. J. Comp. Eff. Res. 2021, 10, 1225–1234. [Google Scholar] [CrossRef]
Hossain, E.; Rana, R.; Higgins, N.; Soar, J.; Barua, P.D.; Pisani, A.R.; Turner, K. Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review. Comput. Biol. Med. 2023, 155, 106649. [Google Scholar] [CrossRef] [PubMed]
Goodman, R.S.; Patrinely, J.R., Jr.; Osterman, T.; Wheless, L.; Johnson, D.B. On the cusp: Considering the impact of artificial intelligence language models in healthcare. Med 2023, 4, 139–140. [Google Scholar] [CrossRef] [PubMed]
Shahsavar, Y.; Choudhury, A. User Intentions to Use ChatGPT for Self-Diagnosis and Health-Related Purposes: Cross-sectional Survey Study. JMIR Hum. Factors 2023, 10, e47564. [Google Scholar] [CrossRef]
Shaikh, H.J.F.; Hasan, S.S.; Woo, J.J.; Lavoie-Gagne, O.; Long, W.J.; Ramkumar, P.N. Exposure to Extended Reality and Artificial Intelligence-Based Manifestations: A Primer on the Future of Hip and Knee Arthroplasty. J. Arthroplast. 2023, 38, 2096–2104. [Google Scholar] [CrossRef]
Andriollo, L.; Picchi, A.; Sangaletti, R.; Perticarini, L.; Rossi, S.M.P.; Logroscino, G.; Benazzo, F. The Role of Artificial Intelligence in Anterior Cruciate Ligament Injuries: Current Concepts and Future Perspectives. Healthcare 2024, 12, 300. [Google Scholar] [CrossRef]
Fucentese, S.F.; Koch, P.P. A novel augmented reality-based surgical guidance system for total knee arthroplasty. Arch. Orthop. Trauma Surg. 2021, 141, 2227–2233. [Google Scholar] [CrossRef]
Fotouhi, J.; Alexander, C.P.; Unberath, M.; Taylor, G.; Lee, S.C.; Fuerst, B.; Johnson, A.; Osgood, G.; Taylor, R.H.; Khanuja, H.; et al. Plan in 2-D, execute in 3-D: An augmented reality solution for cup placement in total hip arthroplasty. J. Med. Imaging 2018, 5, 021205. [Google Scholar] [CrossRef]
Pokhrel, S.; Alsadoon, A.; Prasad, P.W.C.; Paul, M. A novel augmented reality (AR) scheme for knee replacement surgery by considering cutting error accuracy. Int. J. Med. Robot. 2019, 15, e1958. [Google Scholar] [CrossRef]
Suarez-Ahedo, C.; Lopez-Reyes, A.; Martinez-Armenta, C.; Martinez-Gomez, L.E.; Martinez-Nava, G.A.; Pineda, C.; Vanegas-Contla, D.R.; Domb, B. Revolutionizing orthopedics: A comprehensive review of robot-assisted surgery, clinical outcomes, and the future of patient care. J. Robot. Surg. 2023, 17, 2575–2581. [Google Scholar] [CrossRef]
Brinkman, J.C.; Christopher, Z.K.; Moore, M.L.; Pollock, J.R.; Haglin, J.M.; Bingham, J.S. Patient Interest in Robotic Total Joint Arthroplasty Is Exponential: A 10-Year Google Trends Analysis. Arthroplast. Today 2022, 15, 13–18. [Google Scholar] [CrossRef]
Griffiths, S.Z.; Albana, M.F.; Bianco, L.D.; Pontes, M.C.; Wu, E.S. Robotic-Assisted Total Knee Arthroplasty: An Assessment of Content, Quality, and Readability of Available Internet Resources. J. Arthroplast. 2021, 36, 946–952. [Google Scholar] [CrossRef]
Meo, S.A.; Al-Masri, A.A.; Alotaibi, M.; Meo, M.Z.S.; Meo, M.O.S. ChatGPT Knowledge Evaluation in Basic and Clinical Medical Sciences: Multiple Choice Question Examination-Based Performance. Healthcare 2023, 11, 2046. [Google Scholar] [CrossRef]
Van Bulck, L.; Moons, P. What if your patient switches from Dr. Google to Dr. ChatGPT? A vignette-based survey of the trustworthiness, value and danger of ChatGPT-generated responses to health questions. Eur. J. Cardiovasc. Nurs. 2024, 23, 95–98. [Google Scholar] [CrossRef]
Hou, Y.; Yeung, J.; Xu, H.; Su, C.; Wang, F.; Zhang, R. From Answers to Insights: Unveiling the Strengths and Limitations of ChatGPT and Biomedical Knowledge Graphs. Res. Sq. 2023, 3, rs-3185632. [Google Scholar] [CrossRef]
American Academy of Orthopaedic Surgeons. Total Knee Replacement Exercise Guide, 2014. Available online: https://orthoinfo.aaos.org/en/recovery/total-knee-replacement-exercise-guide/ (accessed on 6 November 2023).
Nogalo, C.; Meena, A.; Abermann, E.; Fink, C. Complications and downsides of the robotic total knee arthroplasty: A systematic review. Knee Surg. Sports Traumatol. Arthrosc. 2023, 31, 736–750. [Google Scholar] [CrossRef]
Hasan, S.; Ahmed, A.; Waheed, M.A.; Saleh, E.S.; Omari, A. Transforming Orthopedic Joint Surgeries: The Role of Artificial Intelligence (AI) and Robotics. Cureus 2023, 15, e43289. [Google Scholar] [CrossRef]
Kienzle, A.; Niemann, M.; Meller, S.; Gwinner, C. ChatGPT May Offer an Adequate Substitute for Informed Consent to Patients Prior to Total Knee Arthroplasty-Yet Caution Is Needed. J. Pers. Med. 2024, 14, 69. [Google Scholar] [CrossRef] [PubMed]
Mika, A.P.; Martin, J.R.; Engstrom, S.M.; Polkowski, G.G.; Wilson, J.M. Assessing ChatGPT Responses to Common Patient Questions Regarding Total Hip Arthroplasty. J. Bone Joint Surg. Am. 2023, 105, 1519–1526. [Google Scholar] [CrossRef] [PubMed]
Yang, J.; Ardavanis, K.S.; Slack, K.E.; Fernando, N.D.; Della Valle, C.J.; Hernandez, N.M. Chat Generative Pre-Trained Transformer (ChatGPT) and Bard: Artificial Intelligence Does Not Yet Provide Clinically Supported Answers for Hip and Knee Osteoarthritis. J. Arthroplast. 2024, 39, 1184–1190. [Google Scholar] [CrossRef] [PubMed]
Walker, H.L.; Ghani, S.; Kuemmerli, C.; Nebiker, C.A.; Müller, B.P.; Raptis, D.A.; Staubli, S.M. Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument. J. Med. Internet Res. 2023, 25, e47479. [Google Scholar] [CrossRef] [PubMed]
Eastin, M.S. Credibility assessments of online health information: The effects of source expertise and knowledge of content. J. Comp. Mediated Commun. 2001, 6, JCMC643. [Google Scholar] [CrossRef]
Venosa, M.; Romanini, E.; Cerciello, S.; Angelozzi, M.; Graziani, M.; Calvisi, V. ChatGPT and healthcare: Is the future already here? Opportunities, challenges, and ethical concerns. A narrative mini-review. Acta Biomed. 2024, 95, e2024005. [Google Scholar]
Dave, T.; Athaluri, S.A.; Singh, S. ChatGPT in medicine: An overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front. Artif. Intell. 2023, 6, 1169595. [Google Scholar] [CrossRef] [PubMed]
Miller, A. The intrinsically linked future for human and Artificial Intelligence interaction. J. Big Data 2019, 6, 38. [Google Scholar] [CrossRef]
Pirhonen, A.; Silvennoinen, M.; Sillence, E. Patient Education as an Information System, Healthcare Tool and Interaction. J. Inf. Syst. Educ. 2014, 25, 327–332. [Google Scholar]
Feng, S.; Shen, Y. ChatGPT and the Future of Medical Education. Acad. Med. 2023, 98, 867–868. [Google Scholar] [CrossRef] [PubMed]
Yeo, Y.H.; Samaan, J.S.; Ng, W.H.; Ting, P.S.; Trivedi, H.; Vipani, A.; Ayoub, W.; Yang, J.D.; Liran, O.; Spiegel, B.; et al. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin. Mol. Hepatol. 2023, 29, 721–732. [Google Scholar] [CrossRef] [PubMed]
Cakir, H.; Caglar, U.; Yildiz, O.; Meric, A.; Ayranci, A.; Ozgor, F. Evaluating the performance of ChatGPT in answering questions related to urolithiasis. Int. Urol. Nephrol. 2024, 56, 17–21. [Google Scholar] [CrossRef]
Johnson, D.; Goodman, R.; Patrinely, J.; Stone, C.; Zimmerman, E.; Donald, R.; Chang, S.; Berkowitz, S.; Finn, A.; Jahangir, E.; et al. Assessing the Accuracy and Reliability of AI-Generated Medical Responses: An Evaluation of the Chat-GPT Model. Res Sq. 2023, 3, rs-2566942. [Google Scholar] [CrossRef]
Ozgor, B.Y.; Simavi, M.A. Accuracy and reproducibility of ChatGPT’s free version answers about endometriosis. Int. J. Gynaecol. Obstet. 2024, 165, 691–695. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Number of publications for the top 10 research directions concerning ChatGPT and healthcare between January 2022 and January 2024.

Figure 2. The flowchart shows the framework used to ensure a systematic and objective assessment of the competence of the AI language model in addressing questions related to robotic hip and knee arthroplasty.

Table 1. Scheme of prompts used in ChatGPT requests.

First Prompt Utilized

A. Exact Wording of the Prompt:
“Write the top 20 questions performed on Google concerning robotic-assisted hip and knee arthroplasty”

B. Objective of the Prompt:
1. To gather and analyze the most common inquiries from the public
2. To gain insights into public interest and concerns regarding the procedure

Second Prompt Utilized

A. Exact Wording of the Prompt:
“Now, please, provide detailed, complete and exhaustive answers to each of these questions concerning robotic hip and knee arthroplasty”

B. Objective of the Prompt:
1. To ensure thorough responses to the identified top 20 questions
2. To enhance public understanding and address common concerns comprehensively

Table 2. Average results of completeness and accuracy of the answers to the Top 20 questions concerning robotic-assisted hip and knee arthroplasty elaborated by ChatGPT.

		Accuracy		Completeness
		Hip	Knee	Hip	Knee
Surgeon 1	t1	4.4 (0.94)	4.65 (0.59)	1.95 (0.39)	2 (0.32)
	t2	4.4 (0.94)	4.65 (0.59)	1.95 (0.39)	2 (0.32)
	Within surgeon p	1	1	1	1
Surgeon 2	t1	4.3 (0.92)	4.5 (0.61)	2.15 (0.42)	2.05 (0.39)
	t2	4.4 (0.94)	4.65 (0.59)	2.0 (0.32)	2.1 (0.45)
	within surgeon p	0.35	0.15	0.23	0.77
Surgeon 3	t1	4.3 (0.86)	4.65 (0.59)	2.05 (0.51)	2.1 (0.45)
	t2	4.4 (0.82)	4.5 (0.61)	2.05 (0.39)	2.15 (0.37)
	within surgeon p	0.35	0.15	1	0.77
Between surgeon p	t1	1	1	0.49	1
Between surgeon p	t2	1	1	1	0.56

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Venosa, M.; Calvisi, V.; Iademarco, G.; Romanini, E.; Ciminello, E.; Cerciello, S.; Logroscino, G. Evaluation of the Quality of ChatGPT’s Responses to Top 20 Questions about Robotic Hip and Knee Arthroplasty: Findings, Perspectives and Critical Remarks on Healthcare Education. Prosthesis 2024, 6, 913-922. https://doi.org/10.3390/prosthesis6040066

AMA Style

Venosa M, Calvisi V, Iademarco G, Romanini E, Ciminello E, Cerciello S, Logroscino G. Evaluation of the Quality of ChatGPT’s Responses to Top 20 Questions about Robotic Hip and Knee Arthroplasty: Findings, Perspectives and Critical Remarks on Healthcare Education. Prosthesis. 2024; 6(4):913-922. https://doi.org/10.3390/prosthesis6040066

Chicago/Turabian Style

Venosa, Michele, Vittorio Calvisi, Giulio Iademarco, Emilio Romanini, Enrico Ciminello, Simone Cerciello, and Giandomenico Logroscino. 2024. "Evaluation of the Quality of ChatGPT’s Responses to Top 20 Questions about Robotic Hip and Knee Arthroplasty: Findings, Perspectives and Critical Remarks on Healthcare Education" Prosthesis 6, no. 4: 913-922. https://doi.org/10.3390/prosthesis6040066

Article Menu

Evaluation of the Quality of ChatGPT’s Responses to Top 20 Questions about Robotic Hip and Knee Arthroplasty: Findings, Perspectives and Critical Remarks on Healthcare Education

Abstract

1. Introduction

2. Materials and Methods

3. Results

4. Discussion

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI