Comparative Analysis of Artificial Intelligence Virtual Assistant and Large Language Models in Post-Operative Care
Abstract
1. Introduction
2. Methods
3. Results
4. Discussion
4.1. AIVA vs. Large Language Models
4.2. Comparative Analysis of ChatGPT’s Performance
4.3. Written and Verbal Prompt Responses in ChatGPT-4 Compared to BARD
4.4. Our Study Limitations and Future Research Directions
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Pozza, E.D.; D’souza, G.F.; DeLeonibus, A.; Fabiani, B.; Gharb, B.B.; Zins, J.E. Patient satisfaction with an early smartphone-based cosmetic surgery postoperative follow-up. Aesthetic Surg. J. 2018, 38, 101–109.
- Avila, F.R.; Boczar, D.; Spaulding, A.C.; Quest, D.J.; Samanta, A.; Torres-Guzman, R.A.; Maita, K.C.; Garcia, J.P.; Eldaly, A.S.; Forte, A.J. High Satisfaction with a Virtual Assistant for Plastic Surgery Frequently Asked Questions. Aesthetic Surg. J. 2023, 43, 494–503.
- Bickmore, T.; Giorgino, T. Health dialog systems for patients and consumers. J. Biomed. Inform. 2006, 39, 556–571.
- Solnyshkina, M.; Zamaletdinov, R.; Gorodetskaya, L.; Gabitov, A. Evaluating text complexity and Flesch-Kincaid grade level. J. Soc. Stud. Educ. Res. 2017, 8, 238–248.
- MedlinePlus. Choosing Effective Patient Education Materials; National Library of Medicine: Bethesda, MD, USA, 2021; Volume 30.
- Levine, E.C.; McGee, S.A.; Kohan, J.; Fanning, J.; Willson, T.D. A Comprehensive Analysis on the Readability of Rhinoplasty-Based Web Content for Patients. Plast. Surg. 2023, 1–9.
- Sharma, A. Artificial intelligence in health care. Int. J. Humanit. Arts Med. Sci. 2021, 5, 106–109.
- Noorbakhsh-Sabet, N.; Zand, R.; Zhang, Y.; Abedi, V. Artificial intelligence transforms the future of health care. Am. J. Med. 2019, 132, 795–801.
- Sosa, B.R.; Cung, M.; Suhardi, V.J.; Morse, K.; Thomson, A.; Yang, H.S.; Iyer, S.; Greenblatt, M.B. Capacity for large language model chatbots to aid in orthopedic management, research, and patient queries. J. Orthop. Res. 2024, 42, 1276–1282.
- Anandan, P.; Kokila, S.; Elango, S.; Gopinath, P.; Sudarsan, P. Artificial Intelligence based Chat Bot for Patient Health Care. In Proceedings of the 2022 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 25–27 January 2022; pp. 1–4.
- Vryoni, V. Chatbots in Healthcare: Towards AI-Enabled General Diagnosis and Medical Support. Ph.D. Thesis, University of Piraeus, Piraeus, Greece, 2021.
- Page, L.C.; Gehlbach, H. How an artificially intelligent virtual assistant helps students navigate the road to college. AERA Open 2017, 3, 2332858417749220.
- Perez-Pino, A.; Yadav, S.; Upadhyay, M.; Cardarelli, L.; Tadinada, A. The accuracy of artificial intelligence-based virtual assistants in responding to routinely asked questions about orthodontics. Angle Orthod. 2023, 93, 427–432.
- van Bussel, M.J.P.; Odekerken-Schröder, G.J.; Ou, C.; Swart, R.R.; Jacobs, M.J. Analyzing the determinants to accept a virtual assistant and use cases among cancer patients: A mixed methods study. BMC Health Serv. Res. 2022, 22, 890.
- Boczar, D.; Sisti, A.; Oliver, J.D.; Helmi, H.; Restrepo, D.J.; Huayllani, M.T.; Spaulding, A.C.; Carter, R.; Rinker, B.D.; Forte, A.J. Artificial intelligent virtual assistant for plastic surgery patient’s frequently asked questions: A pilot study. Ann. Plast. Surg. 2020, 84, e16–e21.
- Roumeliotis, K.I.; Tselikas, N.D. ChatGPT and Open-AI models: A preliminary review. Future Internet 2023, 15, 192.
- Haupt, C.E.; Marks, M. AI-Generated Medical Advice—GPT and Beyond. JAMA 2023, 329, 1349–1350.
- OpenAI Blog. 2024. Available online: https://openai.com/ (accessed on 19 March 2024).
- Bickmore, T.W.; Trinh, H.; Olafsson, S.; O’Leary, T.K.; Asadi, R.; Rickles, N.M.; Cruz, R. Patient and consumer safety risks when using conversational assistants for medical information: An observational study of Siri, Alexa, and Google Assistant. J. Med. Internet Res. 2018, 20, e11510.
- Liévin, V.; Hother, C.E.; Winther, O. Can large language models reason about medical questions? Patterns 2024, 5, 100943.
- Liu, S.; McCoy, A.B.; Wright, A.P.; Carew, B.; Genkins, J.Z.; Huang, S.S.; Peterson, J.F.; Steitz, B.; Wright, A. Leveraging Large Language Models for Generating Responses to Patient Messages. J. Am. Med. Inform. Assoc. 2023.
- Xu, J.; Lu, L.; Yang, S.; Liang, B.; Peng, X.; Pang, J.; Ding, J.; Shi, X.; Yang, L.; Song, H.; et al. MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine. arXiv 2023, arXiv:2305.07340.
- Guo, Q.; Cao, S.; Yi, Z. A medical question answering system using large language models and knowledge graphs. Int. J. Intell. Syst. 2022, 37, 8548–8564.
- Li, Y.; Li, Z.; Zhang, K.; Dan, R.; Jiang, S.; Zhang, Y. ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge. Cureus 2023, 15, e40895.
- Huynh, J.; Jiao, C.; Gupta, P.; Mehri, S.; Bajaj, P.; Chaudhary, V.; Eskenazi, M. Understanding the Effectiveness of Very Large Language Models on Dialog Evaluation. arXiv 2023, arXiv:2301.12004.
- Google BARD-Gemini. 2024. Available online: https://gemini.google.com/u/1/app (accessed on 16 November 2023).
- Flesch Kincaid Calculator. 2024. Available online: https://goodcalculators.com/flesch-kincaid-calculator/ (accessed on 16 November 2023).
- Hemingway Editor. 2024. Available online: https://beta.hemingwayapp.com/ (accessed on 16 November 2023).
- Azzini, I.; Falavigna, D.; Giorgino, T.; Gretter, R.; Quaglini, S.; Rognoni, C.; Stefanelli, M. Automated spoken dialog system for home care and data acquisition from chronic patients. In The New Navigators: From Professionals to Patients; IOS Press: Amsterdam, The Netherlands, 2003; pp. 146–151.
- Giorgino, T.; Azzini, I.; Rognoni, C.; Quaglini, S.; Stefanelli, M.; Gretter, R.; Falavigna, D. Automated spoken dialogue system for hypertensive patient home management. Int. J. Med. Inform. 2005, 74, 159–167.
- Iannantuono, G.M.; Bracken-Clarke, D.; Floudas, C.S.; Roselli, M.; Gulley, J.L.; Karzai, F. Applications of large language models in cancer care: Current evidence and future perspectives. Front. Oncol. 2023, 13, 1268915.
- Wang, Z.; Yu, Z.; Zhang, X. Artificial intelligence-based clinical decision-support system improves cancer treatment and patient satisfaction. J. Clin. Oncol. 2019, 37 (Suppl. S15), e18303.
- Tisman, G.; Seetharam, R. OpenAI’s ChatGPT-4, BARD and YOU.com (AI) and the Cancer Patient, for Now, Caveat Emptor, but Stay Tuned. In Digital Medicine Healthcare and Technology; IntechOpen: London, UK, 2023; Volume 2.
- Jacob, J. Google Bard: Utility in drug interactions. Scr. Med. 2023, 54, 311–314.
- Hamidi, A.; Roberts, K. Evaluation of AI Chatbots for Patient-Specific EHR Questions. arXiv 2023, arXiv:2306.02549.
- Moons, P.; Van Bulck, L. Using ChatGPT and Google Bard to improve the readability of written patient information: A proof-of-concept. Eur. J. Cardiovasc. Nurs. 2023, 23, 122–126.
- Dahmen, J.; Kayaalp, M.E.; Ollivier, M.; Pareek, A.; Hirschmann, M.T.; Karlsson, J.; Winkler, P.W. Artificial intelligence bot ChatGPT in medical research: The potential game changer as a double-edged sword. Knee Surg. Sports Traumatol. Arthrosc. 2023, 31, 1187–1189.
- Sallam, M. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare 2023, 11, 887.
- Busch, F.; Hoffmann, L.; Rueger, C.; van Dijk, E.H.; Kader, R.; Ortiz-Prado, E.; Makowski, M.R.; Saba, L.; Hadamitzky, M.; Kather, J.K.; et al. Systematic Review of Large Language Models for Patient Care: Current Applications and Challenges. medRxiv 2024. medRxiv:2024.03.04.24303733.
- Temel, M.H.; Erden, Y.; Bağcıer, F. Information Quality and Readability: ChatGPT’s Responses to the Most Common Questions About Spinal Cord Injury. World Neurosurg. 2024, 181, e1138–e1144.
- Kılınç, D.D.; Mansız, D. Examination of the reliability and readability of Chatbot Generative Pretrained Transformer’s (ChatGPT) responses to questions about orthodontics and the evolution of these responses in an updated version. Am. J. Orthod. Dentofac. Orthop. 2024, 165, 546–555.
- Haver, H.L.; Gupta, A.K.; Ambinder, E.B.; Bahl, M.; Oluyemi, E.T.; Jeudy, J.; Yi, P.H. Evaluating the Use of ChatGPT to Accurately Simplify Patient-centered Information about Breast Cancer Prevention and Screening. Radiol. Imaging Cancer 2024, 6, e230086.
- Shen, S.A.; Perez-Heydrich, C.A.; Xie, D.X.; Nellis, J.C. ChatGPT vs. web search for patient questions: What does ChatGPT do better? Eur. Arch. Oto-Rhino-Laryngol. 2024, 281, 3219–3225.
- Fahy, S.; Oehme, S.; Milinkovic, D.; Jung, T.; Bartek, B. Assessment of Quality and Readability of Information Provided by ChatGPT in Relation to Anterior Cruciate Ligament Injury. J. Pers. Med. 2024, 14, 104.
- Chowdhury, M.; Lim, E.; Higham, A.; McKinnon, R.; Ventoura, N.; He, Y.; De Pennington, N. Can Large Language Models Safely Address Patient Questions Following Cataract Surgery? In Proceedings of the 5th Clinical Natural Language Processing Workshop, Toronto, ON, Canada, 14 July 2023.
- Wang, X.; Wei, J.; Schuurmans, D.; Le, Q.; Chi, E.; Narang, S.; Chowdhery, A.; Zhou, D. Self-consistency improves chain of thought reasoning in language models. arXiv 2022, arXiv:2203.11171.
- Lechner, F.; Lahnala, A.; Welch, C.; Flek, L. Challenges of GPT-3-Based Conversational Agents for Healthcare. arXiv 2023, arXiv:2308.14641.
- Sun, H.; Xu, G.; Deng, J.; Cheng, J.; Zheng, C.; Zhou, H.; Peng, N.; Zhu, X.; Huang, M. On the safety of conversational models: Taxonomy, dataset, and benchmark. arXiv 2021, arXiv:2110.08466.
- Henderson, P.; Sinha, K.; Angelard-Gontier, N.; Ke, N.R.; Fried, G.; Lowe, R.; Pineau, J. Ethical Challenges in Data-Driven Dialogue Systems. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, New Orleans, LA, USA, 2–3 February 2018.
- Moult, B.; Franck, L.S.; Brady, H. Ensuring quality information for patients: Development and preliminary validation of a new instrument to improve the quality of written health care information. Health Expect. 2004, 7, 165–175.
- Zhou, S.; Jeong, H.; Green, P.A. How consistent are the best-known readability equations in estimating the readability of design standards? IEEE Trans. Prof. Commun. 2017, 60, 97–111.
- Shoemaker, S.J.; Wolf, M.S.; Brach, C. Development of the Patient Education Materials Assessment Tool (PEMAT): A new measure of understandability and actionability for print and audiovisual patient information. Patient Educ. Couns. 2014, 96, 395–403.
- Pressman, S.M.; Borna, S.; Gomez-Cabello, C.A.; Haider, S.A.; Haider, C.; Forte, A.J. AI and Ethics: A Systematic Review of the Ethical Considerations of Large Language Model Use in Surgery Research. Healthcare 2024, 12, 825.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Borna, S.; Gomez-Cabello, C.A.; Pressman, S.M.; Haider, S.A.; Sehgal, A.; Leibovich, B.C.; Cole, D.; Forte, A.J. Comparative Analysis of Artificial Intelligence Virtual Assistant and Large Language Models in Post-Operative Care. Eur. J. Investig. Health Psychol. Educ. 2024, 14, 1413-1424. https://doi.org/10.3390/ejihpe14050093