Using Large Language Models to Retrieve Critical Data from Clinical Processes and Business Rules
Abstract
1. Introduction
2. Methods
2.1. Selection of LLM
2.2. Dataset
2.3. Encoding the Clinical Procedures
2.4. Embedding the Encoded Processes
2.5. Information Retrieval Approach
2.6. Measuring Model Accuracy
3. Results
4. Discussion
5. Recommendations for Healthcare Organizations
- Integration: Embed LLMs in existing clinical workflows and decision support systems to maximize their impact and usability [23].
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
CDSSs | Clinical decision support systems |
CPGs | Clinical practice guidelines |
CPM | Care pathway model |
EHRs | Electronic health records |
JSON | JavaScript Object Notation |
LLM | Large language model |
RAG | Retrieval-augmented generation |
SMEs | Subject matter experts |
References
- Papadopoulos, P.; Soflano, M.; Chaudy, Y.; Adejo, W.; Connolly, T.M. A systematic review of technologies and standards used in the development of rule-based clinical decision support systems. Health Technol. 2022, 12, 713–727.
- Litvin, C.B.; Ornstein, S.M.; Wessell, A.M.; Nemeth, L.S.; Nietert, P.J. Adoption of a clinical decision support system to promote judicious use of antibiotics for acute respiratory infections in primary care. Int. J. Med. Inform. 2012, 81, 521–526.
- Cricelli, I.; Marconi, E.; Lapi, F. Clinical Decision Support System (CDSS) in primary care: From pragmatic use to the best approach to assess their benefit/risk profile in clinical practice. Curr. Med. Res. Opin. 2022, 38, 827–829.
- Jiang, F.; Deng, L.; Zhang, L.; Cai, Y.; Cheung, C.W.; Xia, Z. Review of the Clinical Characteristics of Coronavirus Disease 2019 (COVID-19). J. Gen. Intern. Med. 2020, 35, 1545–1549.
- WHO. WHO COVID-19 Dashboard. Available online: https://data.who.int/dashboards/covid19/cases?n=c (accessed on 15 May 2024).
- Ahmed, F.; Hossain, M.S.; Islam, R.U.; Andersson, K. An Evolutionary Belief Rule-Based Clinical Decision Support System to Predict COVID-19 Severity under Uncertainty. Appl. Sci. 2021, 11, 5810.
- Gomez-Cabello, C.A.; Borna, S.; Pressman, S.; Haider, S.A.; Haider, C.R.; Forte, A.J. Artificial-Intelligence-Based Clinical Decision Support Systems in Primary Care: A Scoping Review of Current Clinical Implementations. Eur. J. Investig. Health Psychol. Educ. 2024, 14, 685–698.
- Sutton, R.T.; Pincock, D.; Baumgart, D.C.; Sadowski, D.C.; Fedorak, R.N.; Kroeker, K.I. An overview of clinical decision support systems: Benefits, risks, and strategies for success. NPJ Digit. Med. 2020, 3, 17.
- Ramgopal, S.; Sanchez-Pinto, L.N.; Horvat, C.M.; Carroll, M.S.; Luo, Y.; Florin, T.A. Artificial intelligence-based clinical decision support in pediatrics. Pediatr. Res. 2023, 93, 334–341.
- Peiffer-Smadja, N.; Rawson, T.M.; Ahmad, R.; Buchard, A.; Georgiou, P.; Lescure, F.X.; Birgand, G.; Holmes, A.H. Machine learning for clinical decision support in infectious diseases: A narrative review of current applications. Clin. Microbiol. Infect. 2020, 26, 584–595.
- Pressman, S.M.; Borna, S.; Gomez-Cabello, C.A.; Haider, S.A.; Forte, A.J. AI in Hand Surgery: Assessing Large Language Models in the Classification and Management of Hand Injuries. J. Clin. Med. 2024, 13, 2832.
- Yu, P.; Xu, H.; Hu, X.; Deng, C. Leveraging Generative AI and Large Language Models: A Comprehensive Roadmap for Healthcare Integration. Healthcare 2023, 11, 2776.
- Al Nazi, Z.; Peng, W. Large language models in healthcare and medical domain: A review. arXiv 2023, arXiv:2401.06775.
- Ferdush, J.; Begum, M.; Hossain, S.T. ChatGPT and clinical decision support: Scope, application, and limitations. Ann. Biomed. Eng. 2023, 52, 1119–1124.
- Miao, J.; Thongprayoon, C.; Fulop, T.; Cheungpasitporn, W. Enhancing clinical decision-making: Optimizing ChatGPT’s performance in hypertension care. J. Clin. Hypertens. 2024, 26, 588–593.
- Borna, S.; Gomez-Cabello, C.A.; Pressman, S.M.; Haider, S.A.; Forte, A.J. Comparative Analysis of Large Language Models in Emergency Plastic Surgery Decision-Making: The Role of Physical Exam Data. J. Pers. Med. 2024, 14, 612.
- Gomez-Cabello, C.A.; Borna, S.; Pressman, S.M.; Haider, S.A.; Forte, A.J. Large Language Models for Intraoperative Decision Support in Plastic Surgery: A Comparison between ChatGPT-4 and Gemini. Medicina 2024, 60, 957.
- Haider, S.A.; Pressman, S.M.; Borna, S.; Gomez-Cabello, C.A.; Sehgal, A.; Leibovich, B.C.; Forte, A.J. Evaluating Large Language Model (LLM) Performance on Established Breast Classification Systems. Diagnostics 2024, 14, 1491.
- Liu, S.; Wright, A.P.; Patterson, B.L.; Wanderer, J.P.; Turer, R.W.; Nelson, S.D.; McCoy, A.B.; Sittig, D.F.; Wright, A. Assessing the value of ChatGPT for clinical decision support optimization. MedRxiv 2023.
- Benary, M.; Wang, X.D.; Schmidt, M.; Soll, D.; Hilfenhaus, G.; Nassir, M.; Sigler, C.; Knodler, M.; Keller, U.; Beule, D.; et al. Leveraging Large Language Models for Decision Support in Personalized Oncology. JAMA Netw. Open 2023, 6, e2343689.
- Wang, G.; Gao, K.; Liu, Q.; Wu, Y.; Zhang, K.; Zhou, W.; Guo, C. Potential and Limitations of ChatGPT 3.5 and 4.0 as a Source of COVID-19 Information: Comprehensive Comparative Analysis of Generative and Authoritative Information. J. Med. Internet Res. 2023, 25, e49771.
- Zhu, Y.; Yuan, H.; Wang, S.; Liu, J.; Liu, W.; Deng, C.; Dou, Z.; Wen, J.-R. Large language models for information retrieval: A survey. arXiv 2023, arXiv:2308.07107.
- Hasan, W.U.; Zaman, K.T.; Wang, X.; Li, J.; Xie, B.; Tao, C. Empowering Alzheimer’s caregivers with conversational AI: A novel approach for enhanced communication and personalized support. NPJ Biomed. Innov. 2024, 1, 3.
- Lakatos, R.; Pollner, P.; Hajdu, A.; Joo, T. Investigating the performance of Retrieval-Augmented Generation and fine-tuning for the development of AI-driven knowledge-based systems. arXiv 2024, arXiv:2403.09727.
- Miao, J.; Thongprayoon, C.; Suppadungsuk, S.; Garcia Valencia, O.A.; Cheungpasitporn, W. Integrating Retrieval-Augmented Generation with Large Language Models in Nephrology: Advancing Practical Applications. Medicina 2024, 60, 445.
- Zakka, C.; Shad, R.; Chaurasia, A.; Dalal, A.R.; Kim, J.L.; Moor, M.; Fong, R.; Phillips, C.; Alexander, K.; Ashley, E.; et al. Almanac—Retrieval-Augmented Language Models for Clinical Medicine. NEJM AI 2024, 1, AIoa2300068.
- BioRender. Available online: https://www.biorender.com/ (accessed on 16 May 2024).
- Mayo Foundation for Medical Education and Research COVID-19 navigator. AskMayoExpert Website. Available online: https://askmayoexpert.mayoclinic.org/navigator/covid-19 (accessed on 15 May 2024).
- Gansner, E.; Koutsofios, E.; North, S. Drawing Graphs with Dot; 2006. Available online: https://www.graphviz.org/pdf/dotguide.pdf (accessed on 15 May 2024).
- Parkulo, M.A.; Post, J.A.; Ristagno, E.H.; Tande, A.J.; Eggers, S.D.; Wald, M.K. COVID-19 Plus Seasonal Illness: Outpatient Testing (Adult). Available online: https://askmayoexpert.mayoclinic.org/topic/clinical-answers/prt-20503524/cpm-20522078 (accessed on 15 May 2024).
- Moxey, A.; Robertson, J.; Newby, D.; Hains, I.; Williamson, M.; Pearson, S.A. Computerized clinical decision support for prescribing: Provision does not guarantee uptake. J. Am. Med. Inform. Assoc. 2010, 17, 25–33.
- Harada, T.; Miyagami, T.; Kunitomo, K.; Shimizu, T. Clinical Decision Support Systems for Diagnosis in Primary Care: A Scoping Review. Int. J. Environ. Res. Public Health 2021, 18, 8435.
- Meunier, P.Y.; Raynaud, C.; Guimaraes, E.; Gueyffier, F.; Letrilliart, L. Barriers and Facilitators to the Use of Clinical Decision Support Systems in Primary Care: A Mixed-Methods Systematic Review. Ann. Fam. Med. 2023, 21, 57–69.
- Agrawal, M.; Hegselmann, S.; Lang, H.; Kim, Y.; Sontag, D. Large language models are few-shot clinical information extractors. arXiv 2022, arXiv:2205.12689.
- Yang, J.; Jin, H.; Tang, R.; Han, X.; Feng, Q.; Jiang, H.; Zhong, S.; Yin, B.; Hu, X. Harnessing the power of llms in practice: A survey on chatgpt and beyond. ACM Trans. Knowl. Discov. Data 2024, 18, 1–32.
- Oniani, D.; Wu, X.; Visweswaran, S.; Kapoor, S.; Kooragayalu, S.; Polanska, K.; Wang, Y. Enhancing Large Language Models for Clinical Decision Support by Incorporating Clinical Practice Guidelines. arXiv 2024, arXiv:2401.11120.
- Rau, A.; Rau, S.; Zoeller, D.; Fink, A.; Tran, H.; Wilpert, C.; Nattenmueller, J.; Neubauer, J.; Bamberg, F.; Reisert, M.; et al. A Context-based Chatbot Surpasses Trained Radiologists and Generic ChatGPT in Following the ACR Appropriateness Guidelines. Radiology 2023, 308, e230970.
- Ge, J.; Sun, S.; Owens, J.; Galvez, V.; Gologorskaya, O.; Lai, J.C.; Pletcher, M.J.; Lai, K. Development of a liver disease-specific large language model chat interface using retrieval-augmented generation. Hepatology 2024, 80, 1158–1168.
- Van Veen, D.; Van Uden, C.; Blankemeier, L.; Delbrouck, J.B.; Aali, A.; Bluethgen, C.; Pareek, A.; Polacin, M.; Reis, E.P.; Seehofnerova, A.; et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat. Med. 2024, 30, 1134–1142.
- Gomez-Cabello, C.A.; Borna, S.; Pressman, S.M.; Haider, S.A.; Sehgal, A.; Leibovich, B.C.; Forte, A.J. Artificial Intelligence in Postoperative Care: Assessing Large Language Models for Patient Recommendations in Plastic Surgery. Healthcare 2024, 12, 1083.
- Li, Y.; Li, Z.; Zhang, K.; Dan, R.; Jiang, S.; Zhang, Y. ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge. Cureus 2023, 15, e40895.
- Tripathi, S.; Sukumaran, R.; Cook, T.S. Efficient healthcare with large language models: Optimizing clinical workflow and enhancing patient care. J. Am. Med. Inform. Assoc. 2024, 31, 1436–1440.
- Leslie, D.; Mazumder, A.; Peppin, A.; Wolters, M.K.; Hagerty, A. Does “AI” stand for augmenting inequality in the era of covid-19 healthcare? BMJ 2021, 372, n304.
- Pressman, S.M.; Borna, S.; Gomez-Cabello, C.A.; Haider, S.A.; Haider, C.; Forte, A.J. AI and Ethics: A Systematic Review of the Ethical Considerations of Large Language Model Use in Surgery Research. Healthcare 2024, 12, 825.
- Zaidi, D.; Miller, T. Implicit Bias and Machine Learning in Health Care. South. Med. J. 2023, 116, 62–64.
- Shekelle, P.G. Clinical Practice Guidelines: What’s Next? JAMA 2018, 320, 757–758.
- Shaneyfelt, T.M.; Centor, R.M. Reassessment of Clinical Practice Guidelines: Go Gently Into That Good Night. JAMA 2009, 301, 868–869.
- Morris, Z.S.; Wooding, S.; Grant, J. The answer is 17 years, what is the question: Understanding time lags in translational research. J. R. Soc. Med. 2011, 104, 510–520.
- Guerra-Farfan, E.; Garcia-Sanchez, Y.; Jornet-Gibert, M.; Nunez, J.H.; Balaguer-Castro, M.; Madden, K. Clinical practice guidelines: The good, the bad, and the ugly. Injury 2023, 54 (Suppl. S3), S26–S29.
- Umapathi, L.K.; Pal, A.; Sankarasubbu, M. Med-halt: Medical domain hallucination test for large language models. arXiv 2023, arXiv:2307.15343.
- Harrer, S. Attention is not all you need: The complicated case of ethically using large language models in healthcare and medicine. EBioMedicine 2023, 90, 104512.
- Zhang, T.; Patil, S.G.; Jain, N.; Shen, S.; Zaharia, M.; Stoica, I.; Gonzalez, J.E. Raft: Adapting language model to domain specific rag. arXiv 2024, arXiv:2403.10131.
- Shuster, K.; Poff, S.; Chen, M.; Kiela, D.; Weston, J. Retrieval augmentation reduces hallucination in conversation. arXiv 2021, arXiv:2104.07567.
- Ong, J.C.L.; Jin, L.; Elangovan, K.; Lim, G.Y.S.; Lim, D.Y.Z.; Sng, G.G.R.; Ke, Y.; Tung, J.Y.M.; Zhong, R.J.; Koh, C.M.Y. Development and Testing of a Novel Large Language Model-Based Clinical Decision Support Systems for Medication Safety in 12 Clinical Specialties. arXiv 2024, arXiv:2402.01741.
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.-t.; Rocktäschel, T. Retrieval-augmented generation for knowledge-intensive nlp tasks. arXiv 2021, arXiv:2005.11401.
- Huang, Y.; Huang, J. A Survey on Retrieval-Augmented Text Generation for Large Language Models. arXiv 2024, arXiv:2404.10981.
- Afzal, A.; Kowsik, A.; Fani, R.; Matthes, F. Towards Optimizing and Evaluating a Retrieval Augmented QA Chatbot using LLMs with Human in the Loop. arXiv 2024, arXiv:2407.05925.
- Park, Y.J.; Pillai, A.; Deng, J.; Guo, E.; Gupta, M.; Paget, M.; Naugler, C. Assessing the research landscape and clinical utility of large language models: A scoping review. BMC Med. Inform. Decis. Mak. 2024, 24, 72.
- Mesko, B.; Topol, E.J. The imperative for regulatory oversight of large language models (or generative AI) in healthcare. NPJ Digit. Med. 2023, 6, 120.
Query | Response from LLM |
---|---|
A patient, female, 55 years old, visit her physician. Her chief complaints are fever and cough. She said that she had vaccinations two weeks ago. Please give me the next step based on the CPM and the given information. | Based on the context provided, the next steps for a patient with a chief complaint of cough and fever who has had COVID-19 vaccination within the past 14 days would be: |
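The query and response above follow a retrieval-augmented pattern: CPM content relevant to the query is retrieved and handed to the LLM as context, in line with the embedding and retrieval steps named in Sections 2.4 and 2.5. The following is a minimal sketch of that retrieval step, assuming each CPM node has already been encoded as a short text passage. The node passages, the stopword list, and the `embed`, `retrieve`, and `build_prompt` helpers are all hypothetical, and a toy bag-of-words vector stands in for a learned embedding model, so this illustrates the technique rather than the pipeline used in the study.

```python
import math
import re
from collections import Counter

# Hypothetical CPM node passages (placeholders, not actual CPM content).
CPM_NODES = {
    "n1": "Decision: fever or cough present?",
    "n2": "Decision: vaccinations received in the past two weeks?",
    "n3": "Decision: known exposure to a confirmed case?",
    "n4": "Action: continue to the next CPM step.",
}

# Small hypothetical stopword list so that function words do not drive similarity.
STOPWORDS = {"a", "an", "and", "are", "her", "in", "of", "or", "she", "that", "the", "to"}


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (term counts); a stand-in for a learned embedding model."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, nodes: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k node passages most similar to the query."""
    q = embed(query)
    ranked = sorted(nodes, key=lambda nid: cosine(q, embed(nodes[nid])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, nodes: dict[str, str], k: int = 2) -> str:
    """Place the retrieved passages ahead of the question so the LLM answers from them."""
    context = "\n".join(nodes[nid] for nid in retrieve(query, nodes, k))
    return f"Answer using only the CPM context below.\n{context}\n\nQuestion: {query}"


query = ("A patient, female, 55 years old, visit her physician. Her chief complaints are "
         "fever and cough. She said that she had vaccinations two weeks ago.")
print(retrieve(query, CPM_NODES))  # ['n2', 'n1']
```

Replacing `embed` with a real embedding model would change only that function; ranking passages by cosine similarity and passing the top k to the LLM stays the same.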
CPM Name | Node Accuracy | Edge Accuracy |
---|---|---|
Screening for anal dysplasia and cancer in people living with HIV | 0.93 | 1.00 |
Nonalcoholic fatty liver disease | 0.89 | 1.00 |
Mpox | 0.94 | 1.00 |
Symptomatic severe tricuspid regurgitation: Indications for referral | 0.92 | 0.93 |
Phenytoin or fosphenytoin order alert logic | 0.87 | 1.00 |
Pediatric pain management: First-line analgesics, adjunctive therapies, and opioid options | 0.90 | 0.73 |
COVID-19: Outpatient management (child) | 0.91 | 1.00 |
Diabetic ketoacidosis or hyperosmolar hyperglycemia state in pregnancy | 0.90 | 1.00 |
Postpartum hemorrhage (PPH) | 0.94 | 0.71 |
Differentiated thyroid cancer: Postoperative risk stratification | 0.97 | 0.62 |
Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) | 0.88 | 1.00 |
Emergency department and inpatient management of atrial fibrillation with rapid ventricular response | 0.95 | 0.79 |
Management of elevated anion gap acidosis | 0.94 | 0.94 |
Belzutifan alert logic | 0.85 | 1.00 |
Sacituzumab-govitecan order alert logic | 0.85 | 1.00 |
Differentiated thyroid cancer: Radioiodine whole body scan and guide to subsequent management | 0.96 | 0.83 |
Tricuspid regurgitation | 0.97 | 0.86 |
COVID-19: Postinfection return to physical activity and sports (child) | 0.89 | 1.00 |
Preoperative medication management | 0.80 | 1.00 |
Average Accuracy | 0.91 | 0.92 |
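The Average Accuracy row is consistent with the unweighted mean of the 19 per-CPM scores (17.26 / 19 ≈ 0.908 for nodes and 17.41 / 19 ≈ 0.916 for edges). The sketch below recomputes both means from the table and adds one hypothetical way to score a single CPM, assuming the CPM is represented as a set of node labels plus a set of directed (source, target) edges and that accuracy is the fraction of reference elements the LLM reproduces. The `set_accuracy` rule and the toy graph are illustrative assumptions, not necessarily the definition described in Section 2.6.

```python
from statistics import mean

# (node accuracy, edge accuracy) for the 19 CPMs, in the order of the table above.
SCORES = [
    (0.93, 1.00), (0.89, 1.00), (0.94, 1.00), (0.92, 0.93), (0.87, 1.00),
    (0.90, 0.73), (0.91, 1.00), (0.90, 1.00), (0.94, 0.71), (0.97, 0.62),
    (0.88, 1.00), (0.95, 0.79), (0.94, 0.94), (0.85, 1.00), (0.85, 1.00),
    (0.96, 0.83), (0.97, 0.86), (0.89, 1.00), (0.80, 1.00),
]


def set_accuracy(reference: set, predicted: set) -> float:
    """Hypothetical rule: fraction of reference elements that the LLM reproduced."""
    if not reference:
        return 1.0
    return len(reference & predicted) / len(reference)


# Hypothetical toy CPM: placeholder node labels and directed (source, target) edges.
ref_nodes = {"start", "decision A", "decision B", "action"}
ref_edges = {("start", "decision A"), ("decision A", "decision B"), ("decision B", "action")}
llm_nodes = {"start", "decision A", "action"}
llm_edges = {("start", "decision A")}

node_acc = set_accuracy(ref_nodes, llm_nodes)  # 3 of 4 reference nodes -> 0.75
edge_acc = set_accuracy(ref_edges, llm_edges)  # 1 of 3 reference edges -> 0.333...

# Unweighted means over the 19 CPMs, rounded to two decimals as in the table.
avg_node = round(mean(n for n, _ in SCORES), 2)
avg_edge = round(mean(e for _, e in SCORES), 2)
print(node_acc, round(edge_acc, 3), avg_node, avg_edge)  # 0.75 0.333 0.91 0.92
```

A recall-style rule like `set_accuracy` does not penalize spurious nodes or edges the LLM adds; if those matter, precision or F1 over the same sets would be the natural extension.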
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yu, Y.; Gomez-Cabello, C.A.; Makarova, S.; Parte, Y.; Borna, S.; Haider, S.A.; Genovese, A.; Prabha, S.; Forte, A.J. Using Large Language Models to Retrieve Critical Data from Clinical Processes and Business Rules. Bioengineering 2025, 12, 17. https://doi.org/10.3390/bioengineering12010017
Yu Y, Gomez-Cabello CA, Makarova S, Parte Y, Borna S, Haider SA, Genovese A, Prabha S, Forte AJ. Using Large Language Models to Retrieve Critical Data from Clinical Processes and Business Rules. Bioengineering. 2025; 12(1):17. https://doi.org/10.3390/bioengineering12010017
Chicago/Turabian Style: Yu, Yunguo, Cesar A. Gomez-Cabello, Svetlana Makarova, Yogesh Parte, Sahar Borna, Syed Ali Haider, Ariana Genovese, Srinivasagam Prabha, and Antonio J. Forte. 2025. "Using Large Language Models to Retrieve Critical Data from Clinical Processes and Business Rules" Bioengineering 12, no. 1: 17. https://doi.org/10.3390/bioengineering12010017
APA Style: Yu, Y., Gomez-Cabello, C. A., Makarova, S., Parte, Y., Borna, S., Haider, S. A., Genovese, A., Prabha, S., & Forte, A. J. (2025). Using Large Language Models to Retrieve Critical Data from Clinical Processes and Business Rules. Bioengineering, 12(1), 17. https://doi.org/10.3390/bioengineering12010017