Testing and Validation of a Custom Retrained Large Language Model for the Supportive Care of HN Patients with External Knowledge Base
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. High-Quality User-Defined Database
2.2. Workflow of Custom-Trained Chatbot
2.3. Testing and Validation of Custom-Trained Chatbot
2.4. Auto-Evaluation of Chatbot Answers
3. Results
3.1. Testing and Validation on Chatbot Performance
3.2. Comparison of Auto-Evaluation and RadOnc Scores
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhu, L.; Rong, Y.; McGee, L.A.; Rwigema, J.-C.M.; Patel, S.H. Testing and Validation of a Custom Retrained Large Language Model for the Supportive Care of HN Patients with External Knowledge Base. Cancers 2024, 16, 2311. https://doi.org/10.3390/cancers16132311