Addressing Semantic Variability in Clinical Outcome Reporting Using Large Language Models
Abstract
:1. Introduction
2. Materials and Methods
2.1. Step 1: Connecting to the CTG Search API Endpoint
2.2. Step 2: Text Normalization and Outcome Alignment
2.2.1. Rule-Based Approach: Ontology Linkage
2.2.2. Machine Learning-Based Approach
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. arXiv:1706.03762v7. [Google Scholar]
- Liévin, V.; Hother, C.E.; Winther, O. Can large language models reason about medical questions? arXiv 2022, arXiv:2207.08143. [Google Scholar] [CrossRef] [PubMed]
- Deng, J.; Zhou, F.; Heybati, K.; Ali, S.; Zuo, Q.K.; Hou, W.; Dhivagaran, T.; Ramaraju, H.B.; Chang, O.; Wong, C.Y.; et al. Efficacy of chloroquine and hydroxychloroquine for the treatment of hospitalized COVID-19 patients: A meta-analysis. Future Virol. 2022, 17, 95–118. [Google Scholar] [CrossRef] [PubMed]
- Gautret, P.; Lagier, J.C.; Parola, P.; Meddeb, L.; Mailhe, M.; Doudier, B.; Courjon, J.; Giordanengo, V.; Vieira, V.E.; Dupont, H.T.; et al. Hydroxychloroquine and azithromycin as a treatment of COVID-19: Results of an open-label non-randomized clinical trial. Int. J. Antimicrob. Agents 2020, 56, 105949. [Google Scholar] [CrossRef] [PubMed]
- CDISC Library API Documentation. Available online: https://www.cdisc.org/cdisc-library/api-documentation (accessed on 12 December 2023).
- NCI REST API Documentation. Available online: https://api-evsrest.nci.nih.gov/swagger-ui/index.html#/ (accessed on 16 December 2023).
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. Palm: Scaling language modeling with pathways. arXiv 2022, arXiv:2204.02311. [Google Scholar]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar]
- Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 2022, 35, 27730–27744. [Google Scholar]
- Wei, J.; Bosma, M.; Zhao, V.Y.; Guu, K.; Yu, A.W.; Lester, B.; Du, N.; Dai, A.M.; Le, Q.V. Finetuned language models are zero-shot learners. arXiv 2021, arXiv:2109.01652. [Google Scholar]
- Schulman, J.; Zoph, B.; Kim, C.; Hilton, J.; Menick, J.; Weng, J.; Uribe, J.F.; Fedus, L.; Metz, L.; Pokorny, M.; et al. ChatGPT: Optimizing Language Models for Dialogue. OpenAI blog. 2022. Available online: https://autogpt.net/chatgpt-optimizing-language-models-for-dialogue/ (accessed on 12 December 2023).
- Anil, R.; Dai, A.M.; Firat, O.; Johnson, M.; Lepikhin, D.; Passos, A.; Shakeri, S.; Taropa, E.; Bailey, P.; Chen, Z.; et al. Palm 2 technical report. arXiv 2023, arXiv:2305.10403. [Google Scholar]
- OpenAI; Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; et al. GPT-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
- Jin, Q.; Leaman, R.; Lu, Z. Retrieve, Summarize, and Verify: How will ChatGPT impact information seeking from the medical literature? J. Am. Soc. Nephrol. 2023, 34, 1302–1304. [Google Scholar] [CrossRef] [PubMed]
- Jin, Q.; Yuan, Z.; Xiong, G.; Yu, Q.; Ying, H.; Tan, C.; Chen, M.; Huang, S.; Liu, X.; Yu, S. Biomedical question answering: A survey of approaches and challenges. ACM Comput. Surv. (CSUR) 2022, 55, 35. [Google Scholar] [CrossRef]
- Singhal, K.; Azizi, S.; Tu, T.; Mahdavi, S.S.; Wei, J.; Chung, H.W.; Scales, N.; Tanwani, A.; Cole-Lewis, H.; Pfohl, S.; et al. Large language models encode clinical knowledge. arXiv 2022, arXiv:2212.13138. [Google Scholar] [CrossRef]
- Wang, Z.; Xiao, C.; Sun, J. AutoTrial: Prompting Language Models for Clinical Trial Design. arXiv 2023, arXiv:2305.11366. [Google Scholar]
- Asudani, D.S.; Nagwani, N.K.; Singh, P. Impact of word embedding models on text analytics in deep learning environment: A review. Artif. Intell. Rev. 2023, 56, 10345–10425. [Google Scholar] [CrossRef] [PubMed]
- Oubenali, N.; Messaoud, S.; Filiot, A.; Lamer, A.; Andrey, P. Visualization of medical concepts represented using word embeddings: A scoping review. BMC Med. Inform. Decis. Mak. 2022, 22, 83. [Google Scholar] [CrossRef]
- Naseem, U.; Razzak, I.; Khan, S.K.; Prasad, M. A comprehensive survey on word representation models: From classical to state-of-the-art word representation language models. Trans. Asian Low-Resour. Lang. Inf. Process. 2021, 20, 74. [Google Scholar] [CrossRef]
- Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep contextualized word representations. arXiv 2018, arXiv:1802.05365. [Google Scholar]
- Devlin, J. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A robustly optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
- Reimers, N.; Gurevych, I. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv 2019, arXiv:1908.10084. [Google Scholar]
- Shah-Mohammadi, F.; Finkelstein, J. Contextualized Large Language Model-Based Architecture for Outcome Measure Alignment in Clinical Trials. In Proceedings of the 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Istanbul, Turkey, 5–8 December 2023; pp. 4411–4417. [Google Scholar] [CrossRef]
- OpenAI Chat Completion API. Available online: https://platform.openai.com/docs/api-reference/chat/create (accessed on 13 May 2024).
- Johnson, D.; Goodman, R.; Patrinely, J.; Stone, C.; Zimmerman, E.; Donald, R.; Chang, S.; Berkowitz, S.; Finn, A.; Jahangir, E.; et al. Assessing the accuracy and reliability of AI-generated medical responses: An evaluation of the Chat-GPT model. Res. Sq. 2023. [Google Scholar] [CrossRef]
- Finkelstein, J.; Chen, Q.; Adams, H.; Friedman, C. Automated Summarization of Publications Associated with Adverse Drug Reactions from PubMed. AMIA Jt. Summits Transl. Sci. Proc. 2016, 2016, 68–77. [Google Scholar] [PubMed] [PubMed Central]
- Elghafari, A.; Finkelstein, J. Automated Identification of Common Disease-Specific Outcomes for Comparative Effectiveness Research Using ClinicalTrials.gov: Algorithm Development and Validation Study. JMIR Med. Inform. 2021, 9, e18298. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
- Borziak, K.; Parvanova, I.; Finkelstein, J. ReMeDy: A platform for integrating and sharing published stem cell research data with a focus on iPSC trials. Database 2021, 2021, baab038. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
NCT Number | Outcome |
---|---|
NCT05766176 | Quantitative anti-SARS-CoV-2 |
NCT04902157 | Prognosis |
NCT04333550 | Mortality rate |
Number of trials considered in this study using the query terms “COVID-19” and “Hydroxychloroquine” | 261 | |
Number of unique primary outcomes | 475 | |
MeSH terms | ||
Percentage of outcomes assigned with a MeSH ID | 4% | |
Rule-based approach | NCI | |
Percentage of outcomes assigned with an NCI ID | 7% | |
CDISC | ||
Percentage of outcomes assigned with a CDISC ID (concept ID) | 10% | |
Machine learning-based approach | SBERT + GPT | |
Number of outcomes assigned with an at least one semantically similar outcome | 68% |
Length of Outcome | Percentage (%) | MeSHs (%) | NCI (%) | CDISC (%) |
---|---|---|---|---|
One word | 3 | 75 | 88 | 88 |
Two words | 6 | 13 | 60 | 67 |
Three words | 8 | 5 | 5 | 30 |
More than three words | 83 | 0 | 0 | 1 |
SBERT Output for the Outcome “Mortality” | MeSH Term |
Mortality outcome | ‘ ’ |
Overall mortality | ‘ ’ |
Mortality rate | ‘D009026’ |
Reduce mortality | ‘ ’ |
Mortality rates | ‘D009026’ |
SBERT output for the outcome “Time to clinical improvement” | MeSH Term |
Time of clinical improvements (days) | ‘ ’ |
Time to clinical improvement pivotal stage | ‘ ’ |
Days to clinical improvement | ‘ ’ |
Number of days to clinical improvement | ‘ ’ |
Time of clinical improvement | ‘ ’ |
One Word Outcome |
Answer: Yes, “hospitalization” and “hospitalization due to SARS-CoV-2 infection” are semantically the same. |
Answer: Yes, “mortality” and “death” are semantically the same. |
Two-words outcome |
Answer: Yes, “disease severity” and “severity of COVID-19 disease” are semantically the same. |
Answer: Yes, “clinical improvement” and “improvement of clinical status” are semantically the same. |
Three-words outcome |
Answer: Yes, “duration of hospitalization” and “duration of hospitalization (in days)” are semantically the same. |
Answer: Yes, “in hospital mortality” and “in hospital mortality (binary yes/no)” are semantically the same. |
More than Three-words outcome |
Answer: Yes, “proportion of patients requiring invasive mechanical ventilation or dying” and “invasive mechanical ventilation or mortality” are semantically the same. |
Answer: Yes, “length of total stay at the hospital” and “duration of hospitalization (in days)” are semantically the same. |
SBERT Output for the Outcome “Mortality” | GPT Response |
Mortality outcome | No, “mortality” and “mortality outcome” are not the same terms. |
Overall mortality | Yes, “mortality” and “overall mortality” are the same. |
Mortality rate | No, “mortality” and “mortality rate” are not the same things. |
Reduce mortality | No, “mortality” and “reduce mortality” are not the same. |
Mortality rates | No, “mortality” and “mortality rates” are not the same, although they are related concepts. |
SBERT output for the outcome “Time to clinical improvement” | GPT Response |
Time to clinical improvements (days) | Yes, “time to clinical improvement” and “time to clinical improvement (days)” generally refer to the same concept. |
Time to clinical improvement pivotal stage | No, “time to clinical improvement” and “time to clinical improvement pivotal stage” are not the same. |
Days to clinical improvement | Yes, “time to clinical improvement” and “days to clinical improvement” generally refer to the same concept—the duration or number of days it takes for a patient’s clinical condition to improve. |
Number of days to clinical improvement | Yes, “time to clinical improvement” and “number of days to clinical improvement” generally refer to the same concept. |
Time of clinical improvement | No, the phrases “time to clinical improvement” and “time of clinical improvement” have different meanings. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Shah-Mohammadi, F.; Finkelstein, J. Addressing Semantic Variability in Clinical Outcome Reporting Using Large Language Models. BioMedInformatics 2024, 4, 2173-2185. https://doi.org/10.3390/biomedinformatics4040116
Shah-Mohammadi F, Finkelstein J. Addressing Semantic Variability in Clinical Outcome Reporting Using Large Language Models. BioMedInformatics. 2024; 4(4):2173-2185. https://doi.org/10.3390/biomedinformatics4040116
Chicago/Turabian StyleShah-Mohammadi, Fatemeh, and Joseph Finkelstein. 2024. "Addressing Semantic Variability in Clinical Outcome Reporting Using Large Language Models" BioMedInformatics 4, no. 4: 2173-2185. https://doi.org/10.3390/biomedinformatics4040116
APA StyleShah-Mohammadi, F., & Finkelstein, J. (2024). Addressing Semantic Variability in Clinical Outcome Reporting Using Large Language Models. BioMedInformatics, 4(4), 2173-2185. https://doi.org/10.3390/biomedinformatics4040116