Large Language Models for Intraoperative Decision Support in Plastic Surgery: A Comparison between ChatGPT-4 and Gemini
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Design
2.2. Evaluation Tools
2.3. Statistical Analysis
3. Results
3.1. Medical Accuracy
3.2. Relevance
3.3. Readability
3.4. Time of Response
4. Discussion
5. Strengths and Limitations
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gomez-Cabello, C.A.; Borna, S.; Pressman, S.M.; Haider, S.A.; Forte, A.J. Large Language Models for Intraoperative Decision Support in Plastic Surgery: A Comparison between ChatGPT-4 and Gemini. Medicina 2024, 60, 957. https://doi.org/10.3390/medicina60060957