Large Language Models in Genomics—A Perspective on Personalized Medicine
Abstract
1. Introduction
1.1. Motivation
1.2. Contributions
- Present a primer on LLMs and their architectures.
- Examine the role of LLMs in genomics from the perspective of personalized medicine.
- Identify the limitations of LLMs and possible future research directions in this domain.
1.3. Related Work
2. Large Language Models: An Introduction
2.1. Encoder-Only Models
2.2. Decoder-Only Models
2.3. Hybrid Models
3. Personalized Medicine
3.1. Precision Medicine and Genomic Analysis
3.1.1. Genomic Data Analysis
3.1.2. Biomarker Identification
3.1.3. Pharmacogenomics
3.2. Key Enablers of Personalized Medicine
3.2.1. Next-Generation Sequencing
3.2.2. Single-Cell Genomics
3.2.3. Gene Editing Technology
3.2.4. Novel Computational Methods and Bioinformatics
4. Role of LLMs in Precision Medicine
4.1. Genomic Data Integration and Interpretation
4.2. Drug Development and Personalized Therapeutics
4.3. Integration of Multi-Omics Data
5. Challenges, Limitations, and Future of LLMs in Precision Medicine
5.1. Data Sparsity and Complexity
5.2. Interpretability and Model Transparency
5.3. Computational Resources
5.4. Relevance and Generalization Accuracy
5.5. Privacy and Security
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Molla, G.; Bitew, M. Revolutionizing personalized medicine: Synergy with multi-omics data generation, main hurdles, and future perspectives. Biomedicines 2024, 12, 2750. [Google Scholar] [CrossRef] [PubMed]
- Collins, F.S.; Varmus, H. A new initiative on precision medicine. N. Engl. J. Med. 2015, 372, 793–795. [Google Scholar] [CrossRef] [PubMed]
- Pritchard, D.E.; Moeckel, F.; Villa, M.S.; Housman, L.T.; McCarty, C.A.; McLeod, H.L. Strategies for integrating personalized medicine into healthcare practice. Pers. Med. 2017, 14, 141–152. [Google Scholar] [CrossRef] [PubMed]
- Marques, L.; Costa, B.; Pereira, M.; Silva, A.; Santos, J.; Saldanha, L.; Silva, I.; Magalhães, P.; Schmidt, S.; Vale, N. Advancing Precision Medicine: A Review of Innovative In Silico Approaches for Drug Development, Clinical Pharmacology and Personalized Healthcare. Pharmaceutics 2024, 16, 332. [Google Scholar] [CrossRef] [PubMed]
- Relling, M.V.; Dervieux, T. Pharmacogenetics and cancer therapy. Nat. Rev. Cancer 2001, 1, 99–108. [Google Scholar] [CrossRef]
- Alqahtani, T.; Badreldin, H.A.; Alrashed, M.; Alshaya, A.I.; Alghamdi, S.S.; bin Saleh, K.; Alowais, S.A.; Alshaya, O.A.; Rahman, I.; Al Yami, M.S.; et al. The emergent role of artificial intelligence, natural learning processing, and large language models in higher education and research. Res. Soc. Adm. Pharm. 2023, 19, 1236–1242. [Google Scholar] [CrossRef] [PubMed]
- Schaeffer, R.; Miranda, B.; Koyejo, S. Are emergent abilities of large language models a mirage? In Advances in Neural Information Processing Systems; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; Curran Associates, Inc.: San Jose, CA, USA, 2023; Volume 36, pp. 55565–55581. [Google Scholar]
- Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the Opportunities and Risks of Foundation Models. arXiv 2022, arXiv:2108.07258. [Google Scholar]
- Song, Y.; Liu, Y.; Lin, Z.; Zhou, J.; Li, D.; Zhou, T.; Leung, M.F. Learning From AI-Generated Annotations for Medical Image Segmentation. IEEE Trans. Consum. Electron. 2024, 70. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762. [Google Scholar]
- Fantozzi, P.; Naldi, M. The Explainability of Transformers: Current Status and Directions. Computers 2024, 13, 92. [Google Scholar] [CrossRef]
- Lin, T.; Wang, Y.; Liu, X.; Qiu, X. A survey of transformers. AI Open 2022, 3, 111–132. [Google Scholar] [CrossRef]
- Kasneci, E.; Sessler, K.; Küchemann, S.; Bannert, M.; Dementieva, D.; Fischer, F.; Gasser, U.; Groh, G.; Günnemann, S.; Hüllermeier, E.; et al. ChatGPT for good? On opportunities and challenges of large language models for education. Learn. Individ. Differ. 2023, 103, 102274. [Google Scholar] [CrossRef]
- Zhou, P.; Wang, L.; Liu, Z.; Hao, Y.; Hui, P.; Tarkoma, S.; Kangasharju, J. A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming. arXiv 2024, arXiv:2404.16038. [Google Scholar]
- NVIDIA: Large Language Models. Available online: https://www.nvidia.com/en-us/glossary/large-language-models/ (accessed on 30 November 2017).
- Derraz, B.; Breda, G.; Kaempf, C.; Baenke, F.; Cotte, F.; Reiche, K.; Köhl, U.; Kather, J.N.; Eskenazy, D.; Gilbert, S. New regulatory thinking is needed for AI-based personalised drug and cell therapies in precision oncology. NPJ Precis. Oncol. 2024, 8, 23. [Google Scholar] [CrossRef] [PubMed]
- Nazi, Z.A.; Peng, W. Large language models in healthcare and medical domain: A review. Informatics 2024, 11, 57. [Google Scholar] [CrossRef]
- Jablonka, K.M.; Schwaller, P.; Ortega-Guerrero, A.; Smit, B. Leveraging large language models for predictive chemistry. Nat. Mach. Intell. 2024, 6, 161–169. [Google Scholar] [CrossRef]
- Huang, K.; Fu, T.; Glass, L.M.; Zitnik, M.; Xiao, C.; Sun, J. DeepPurpose: A deep learning library for drug–target interaction prediction. Bioinformatics 2020, 36, 5545–5547. [Google Scholar] [CrossRef]
- Wang, C.; Li, M.; He, J.; Wang, Z.; Darzi, E.; Chen, Z.; Ye, J.; Li, T.; Su, Y.; Ke, J.; et al. A survey for large language models in biomedicine. arXiv 2024, arXiv:2409.00133. [Google Scholar]
- Zheng, Y.; Gan, W.; Chen, Z.; Qi, Z.; Liang, Q.; Yu, P.S. Large language models for medicine: A survey. Int. J. Mach. Learn. Cybern. 2025, 16, 1015–1040. [Google Scholar] [CrossRef]
- Zhou, H.; Liu, F.; Gu, B.; Zou, X.; Huang, J.; Wu, J.; Li, Y.; Chen, S.S.; Zhou, P.; Liu, J.; et al. A Survey of Large Language Models in Medicine: Progress, Application, and Challenge. arXiv 2024, arXiv:2311.05112. [Google Scholar]
- Liu, L.; Yang, X.; Lei, J.; Shen, Y.; Wang, J.; Wei, P.; Chu, Z.; Qin, Z.; Ren, K. A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions. arXiv 2024, arXiv:2406.03712. [Google Scholar]
- He, K.; Mao, R.; Lin, Q.; Ruan, Y.; Lan, X.; Feng, M.; Cambria, E. A survey of large language models for healthcare: From data, technology, and applications to accountability and ethics. Inf. Fusion 2025, 118, 102963. [Google Scholar] [CrossRef]
- AbuNasser, R.J.; Ali, M.Z.; Jararweh, Y.; Daraghmeh, M.; Ali, T.Z. Large language models in drug discovery: A comprehensive analysis of drug-target interaction prediction. In Proceedings of the 2024 2nd International Conference on Foundation and Large Language Models (FLLM), Dubai, United Arab Emirates, 26–29 November 2024; pp. 417–431. [Google Scholar] [CrossRef]
- Guan, S.; Wang, G. Drug discovery and development in the era of artificial intelligence: From machine learning to large language models. Artif. Intell. Chem. 2024, 2, 100070. [Google Scholar] [CrossRef]
- Valentini, G.; Malchiodi, D.; Gliozzo, J.; Mesiti, M.; Soto-Gomez, M.; Cabri, A.; Reese, J.; Casiraghi, E.; Robinson, P.N. The promises of large language models for protein design and modeling. Front. Bioinform. 2023, 3, 1304099. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Q.; Ding, K.; Lv, T.; Wang, X.; Yin, Q.; Zhang, Y.; Yu, J.; Wang, Y.; Li, X.; Xiang, Z.; et al. Scientific large language models: A survey on biological & chemical domains. ACM Comput. Surv. 2025, 57, 371531823. [Google Scholar]
- Stokel-Walker, C. ChatGPT listed as author on research papers: Many scientists disapprove. Nature 2023, 613, 620–621. [Google Scholar] [CrossRef] [PubMed]
- Biever, C. ChatGPT broke the Turing test—The race is on for new ways to assess AI. Nature 2023, 619, 686–689. [Google Scholar] [CrossRef] [PubMed]
- Bettayeb, M.; Halawani, Y.; Khan, M.U.; Saleh, H.; Mohammad, B. Efficient memristor accelerator for transformer self-attention functionality. Sci. Rep. 2024, 14, 24173. [Google Scholar] [CrossRef]
- Naik, N.; Jenkins, P.; Prajapat, S.; Grace, P. (Eds.) Contributions Presented at The International Conference on Computing, Communication, Cybersecurity and AI, London, UK, 3–4 July 2024; Lecture Notes in Networks and Systems; Springer: Cham, Switzerland, 2024; Volume 884. [Google Scholar] [CrossRef]
- Yang, J.; Jin, H.; Tang, R.; Han, X.; Feng, Q.; Jiang, H.; Zhong, S.; Yin, B.; Hu, X. Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond. ACM Trans. Knowl. Discov. Data 2024, 18, 3649506. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
- Jeong, C. Domain-specialized LLM: Financial fine-tuning and utilization method using Mistral 7B. J. Intell. Inf. Syst. 2024, 30, 93–120. [Google Scholar] [CrossRef]
- Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. arXiv 2020, arXiv:2005.14165. [Google Scholar]
- Bao, H.; Dong, L.; Wei, F.; Wang, W.; Yang, N.; Liu, X.; Wang, Y.; Piao, S.; Gao, J.; Zhou, M.; et al. UNILMv2: Pseudo-masked language models for unified language model pre-training. In Proceedings of the 37th International Conference on Machine Learning, Virtual Event, 13–18 July 2020. [Google Scholar]
- Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pretraining for Natural Language Generation, Translation, and Comprehension. arXiv 2019, arXiv:1910.13461. [Google Scholar]
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving language understanding by generative pre-training. OpenAI 2018. preprint. [Google Scholar]
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 Technical Report. arXiv 2024, arXiv:2303.08774v6. [Google Scholar]
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized autoregressive pretraining for language understanding. arXiv 2019, arXiv:1906.08237. [Google Scholar]
- Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv 2019, arXiv:1909.11942. [Google Scholar]
- Sun, Y.; Wang, S.; Li, Y.; Feng, S.; Tian, H.; Wu, H.; Wang, H. ERNIE: Enhanced Representation through Knowledge Integration. arXiv 2019, arXiv:1904.09223. [Google Scholar]
- Rosset, C.; De Freitas, A.; Smolensky, P.; Nakanishi, J.; John, K.; Bhatia, A.; Burch, E.; Riedl, C. Turing-NLG: A 17-Billion-Parameter Language Model by Microsoft; Microsoft Research Blog: Redmond, WA, USA, 2020. [Google Scholar]
- Liu, H.; Wang, H. GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians. arXiv 2024, arXiv:2406.15341. [Google Scholar]
- Zeng, Z.; Yin, B.; Wang, S.; Liu, J.; Yang, C.; Yao, H.; Sun, X.; Sun, M.; Xie, G.; Liu, Z. ChatMol: Interactive molecular discovery with natural language. Bioinformatics 2024, 40, 534. [Google Scholar] [CrossRef] [PubMed]
- Chithrananda, S.; Grand, G.; Ramsundar, B. ChemBERTa: Large-Scale Self Supervised Pretraining for Molecular Property Prediction. arXiv 2020, arXiv:2010.09885. [Google Scholar]
- Ahdritz, G.; Bouatta, N.; Floristean, C.; Kadyan, S.; Xia, Q.; Gerecke, W.; O’Donnell, T.J.; Berenberg, D.; Fisk, I.; Zanichelli, N.; et al. OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nat. Methods 2024, 21, 1514–1524. [Google Scholar] [CrossRef] [PubMed]
- Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef]
- Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J.; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500. [Google Scholar] [CrossRef]
- Makarious, M.B.; Leonard, H.L.; Vitale, D.; Iwaki, H.; Saffo, D.; Sargent, L.; Dadu, A.; Castaño, E.S.; Carter, J.F.; Maleknia, M.; et al. GenoML: Automated Machine Learning for Genomics. arXiv 2021, arXiv:2103.03221. [Google Scholar]
- Brandes, N.; Ofer, D.; Peleg, Y.; Rappoport, N.; Linial, M. ProteinBERT: A universal deep-learning model of protein sequence and function. Bioinformatics 2022, 38, 2102–2110. [Google Scholar] [CrossRef]
- Elnaggar, A.; Heinzinger, M.; Dallago, C.; Rehawi, G.; Wang, Y.; Jones, L.; Gibbs, T.; Feher, T.; Angerer, C.; Steinegger, M.; et al. ProtTrans: Towards cracking the language of life’s code through self-supervised learning. bioRxiv 2021. [Google Scholar] [CrossRef]
- Ji, Y.; Zhou, Z.; Liu, H.; Davuluri, R.V. DNABERT: Pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics 2021, 37, 2112–2120. [Google Scholar] [CrossRef]
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; Kim, C.; Kang, J. BioBERT: A pretrained biomedical language representation model for biomedical text mining. Bioinformatics 2020, 36, 1234–1240. [Google Scholar] [CrossRef] [PubMed]
- Gu, Y.; Tinn, R.; Cheng, H.; Lucas, M.; Usuyama, N.; Liu, X.; Naumann, T.; Gao, J.; Poon, H. Domain-specific language model pretraining for biomedical natural language processing. arXiv 2020, arXiv:2007.15779. [Google Scholar] [CrossRef]
- Huang, K.; Altosaar, J.; Ranganath, R. ClinicalBERT: Modeling clinical notes and predicting hospital readmission. arXiv 2019, arXiv:1904.05342. [Google Scholar]
- Rasmy, L.; Xiao, C.; Xin, Y.; Zhang, H.; Wang, F. Med-BERT: Pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. NPJ Digit. Med. 2021, 4, 1–13. [Google Scholar] [CrossRef] [PubMed]
- Huang, C.H. QuST-LLM: Integrating Large Language Models for Comprehensive Spatial Transcriptomics Analysis. arXiv 2024, arXiv:2406.14307. [Google Scholar]
- Mondal, D.; Inamdar, A. SeqMate: A Novel Large Language Model Pipeline for Automating RNA Sequencing. arXiv 2024, arXiv:2407.03381. [Google Scholar]
- Fishman, V.; Kuratov, Y.; Shmelev, A.; Petrov, M.; Penzar, D.; Shepelin, D.; Chekanov, N.; Kardymon, O.; Burtsev, M. GENA-LM: A family of open-source foundational DNA language models for long sequences. bioRxiv 2024. [Google Scholar] [CrossRef]
- Liu, T.; Xiao, Y.; Luo, X.; Xu, H.; Zheng, W.J.; Zhao, H. Geneverse: A collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research. arXiv 2024, arXiv:2406.15534. [Google Scholar]
- Sanabria, M.; Hirsch, J.; Joubert, P.M. DNA language model GROVER learns sequence context in the human genome. Nat. Mach. Intell. 2024, 6, 872–880. [Google Scholar] [CrossRef]
- Maier, M. Personalized medicine—A tradition in general practice! Eur. J. Gen. Pract. 2019, 25, 63–64. [Google Scholar] [CrossRef]
- Ginsburg, G.S.; Willard, H.F. Genomic and personalized medicine: Foundations and applications. Transl. Res. 2009, 154, 277–287. [Google Scholar] [CrossRef] [PubMed]
- Collins, F.S.; Morgan, M.; Patrinos, A. The Human Genome Project: Lessons from large-scale biology. Science 2003, 300, 286–290. [Google Scholar] [CrossRef]
- Pennisi, E. Reaching their goal early, sequencing labs celebrate. Science 2003, 300, 409. [Google Scholar] [CrossRef] [PubMed]
- McCombie, W.R.; McPherson, J.D.; Mardis, E.R. Next-generation sequencing technologies. Cold Spring Harb. Perspect. Med. 2019, 9, 036798. [Google Scholar] [CrossRef] [PubMed]
- Levine, D.A.; Network, C.G.A.R. Integrated genomic characterization of endometrial carcinoma. Nature 2013, 497, 67–73. [Google Scholar] [CrossRef]
- Holt, J.M.; Wilk, B.; Birch, C.L.; Brown, D.M.; Gajapathy, M.; Moss, A.C.; Sosonkina, N.; Wilk, M.A.; Anderson, J.A.; Harris, J.M.; et al. VarSight: Prioritizing clinically reported variants with binary classification algorithms. BMC Bioinform. 2019, 20, 1–10. [Google Scholar] [CrossRef] [PubMed]
- Ashley, E.A. Towards precision medicine. Nat. Rev. Genet. 2016, 17, 507–522. [Google Scholar] [CrossRef]
- Phillips, K.A.; Trosman, J.R.; Deverka, P.A.; Quinn, B.; Tunis, S.; Neumann, P.J.; Chambers, J.D.; Garrison, L.P., Jr.; Douglas, M.P.; Weldon, C.B. Insurance coverage for genomic tests. Science 2018, 360, 278–279. [Google Scholar] [CrossRef] [PubMed]
- Wudel, J.H.; Hlozek, C.C.; Smedira, N.G.; McCarthy, P.M. Extracorporeal life support as a post left ventricular assist device implant supplement. ASAIO J. 1997, 43, 444. [Google Scholar] [CrossRef]
- The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 2020, 578, 82–93. [Google Scholar] [CrossRef] [PubMed]
- Zhao, E.Y.; Jones, M.; Jones, S.J. Whole-genome sequencing in cancer. Cold Spring Harb. Perspect. Med. 2019, 9, 034579. [Google Scholar] [CrossRef]
- Jelin, A.C.; Vora, N. Whole exome sequencing: Applications in prenatal genetics. Obstet. Gynecol. Clin. 2018, 45, 69–81. [Google Scholar] [CrossRef] [PubMed]
- Norton, M.E.; Van Ziffle, J.; Lianoglou, B.R.; Hodoglugil, U.; Devine, W.P.; Sparks, T.N. Exome sequencing vs targeted gene panels for the evaluation of nonimmune hydrops fetalis. Am. J. Obstet. Gynecol. 2022, 226, 128.e1–128.e11. [Google Scholar] [CrossRef] [PubMed]
- Drugan, T.; Leucut, A.D. Evaluating novel biomarkers for personalized medicine. Diagnostics 2024, 14, 587. [Google Scholar] [CrossRef] [PubMed]
- Bodaghi, A.; Fattahi, N.; Ramazani, A. Biomarkers: Promising and valuable tools towards diagnosis, prognosis and treatment of COVID-19 and other diseases. Heliyon 2023, 9, 13323. [Google Scholar] [CrossRef]
- Slamon, D.J.; Leyland-Jones, B.; Shak, S.; Fuchs, H.; Paton, V.; Bajamonde, A.; Fleming, T.; Eiermann, W.; Wolter, J.; Pegram, M.; et al. Use of chemotherapy plus a monoclonal antibody against HER2 for metastatic breast cancer that overexpresses HER2. N. Engl. J. Med. 2001, 344, 783–792. [Google Scholar] [CrossRef] [PubMed]
- Tokunaga, E.; Oki, E.; Nishida, K.; Koga, T.; Egashira, A.; Morita, M.; Kakeji, Y.; Maehara, Y. Trastuzumab and breast cancer: Developments and current status. Int. J. Clin. Oncol. 2006, 11, 199–208. [Google Scholar] [CrossRef] [PubMed]
- Oates, J.T.; Lopez, D. Pharmacogenetics: An important part of drug development with a focus on its application. Int. J. Biomed. Investig. 2018, 1, 111. [Google Scholar] [CrossRef]
- Lezhava, T.; Kakauridze, N.; Jokhadze, T.; Buadze, T.; Gaiozishvili, M.; Gargulia, K.; Sigua, T. Frequency of VKORC1 and CYP2C9 genes polymorphism in Abkhazian population. Georgian Med. News 2023, 338, 96–101. [Google Scholar]
- Johnson, J.A.; Cavallari, L.H. Warfarin pharmacogenetics. Trends Cardiovasc. Med. 2015, 25, 33–41. [Google Scholar] [CrossRef]
- Khera, A.V.; Emdin, C.A.; Drake, I.; Natarajan, P.; Bick, A.G.; Cook, N.R.; Chasman, D.I.; Baber, U.; Mehran, R.; Rader, D.J.; et al. Genetic risk, adherence to a healthy lifestyle, and coronary disease. N. Engl. J. Med. 2016, 375, 2349–2358. [Google Scholar] [CrossRef] [PubMed]
- Wang, M.; Yeung, S.L.A.; Luo, S.; Jang, H.; Ho, H.S.; Sharp, S.J.; Wijndaele, K.; Brage, S.; Wareham, N.J.; Kim, Y. Adherence to a healthy lifestyle, genetic susceptibility to abdominal obesity, cardiometabolic risk markers, and risk of coronary heart disease. Am. J. Clin. Nutr. 2023, 118, 911–920. [Google Scholar] [CrossRef] [PubMed]
- McGuire, A.L.; Fisher, R.; Cusenza, P.; Hudson, K.; Rothstein, M.A.; McGraw, D.; Matteson, S.; Glaser, J.; Henley, D.E. Confidentiality, privacy, and security of genetic and genomic test information in electronic health records: Points to consider. Genet. Med. 2008, 10, 495–499. [Google Scholar] [CrossRef] [PubMed]
- Evans, J.P.; Burke, W. Genetic exceptionalism. Too much of a good thing? Genet. Med. 2008, 10, 500–501. [Google Scholar] [CrossRef] [PubMed]
- Huisjes, H.J. Problems in studying functional teratogenicity in man. Prog. Brain Res. 1988, 73, 51–58. [Google Scholar]
- Van Dijk, E.L.; Jaszczyszyn, Y.; Naquin, D.; Thermes, C. The third revolution in sequencing technology. Trends Genet. 2018, 34, 666–681. [Google Scholar] [CrossRef]
- Dorado, G.; Gálvez, S.; Rosales, T.E.; Vásquez, V.F.; Hernández, P. Analyzing modern biomolecules: The revolution of nucleic-acid sequencing–review. Biomolecules 2021, 11, 1111. [Google Scholar] [CrossRef]
- Cao, J.; Packer, J.S.; Ramani, V.; Cusanovich, D.A.; Huynh, C.; Daza, R.; Qiu, X.; Lee, C.; Furlan, S.N.; Steemers, F.J.; et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 2017, 357, 661–667. [Google Scholar] [CrossRef]
- Wang, W.; Min, L.; Qiu, X.; Wu, X.; Liu, C.; Ma, J.; Zhang, D.; Zhu, L. Biological function of long non-coding RNA (LncRNA) Xist. Front. Cell Dev. Biol. 2021, 9, 645647. [Google Scholar] [CrossRef]
- Lim, B.; Lin, Y.; Navin, N. Advancing cancer research and medicine with single-cell genomics. Cancer Cell 2020, 37, 456–470. [Google Scholar] [CrossRef]
- Erfanian, N.; Heydari, A.A.; Feriz, A.M.; Iãnez, P.; Derakhshani, A.; Ghasemigol, M.; Farahpour, M.; Razavi, S.M.; Nasseri, S.; Safarpour, H.; et al. Deep learning applications in single-cell genomics and transcriptomics data analysis. Biomed. Pharmacother. 2023, 165, 115077. [Google Scholar] [CrossRef] [PubMed]
- Aljabali, A.A.A.; El-Tanani, M.; Tambuwala, M.M. Principles of CRISPR-Cas9 technology: Advancements in genome editing and emerging trends in drug delivery. J. Drug Deliv. Sci. Technol. 2024, 92, 105338. [Google Scholar] [CrossRef]
- Boretti, A. The transformative potential of AI-driven CRISPR-Cas9 genome editing to enhance CAR T-cell therapy. Comput. Biol. Med. 2024, 182, 109137. [Google Scholar] [CrossRef] [PubMed]
- Sari, O.; Liu, Z.; Pan, Y.; Shao, X. Predicting CRISPR-Cas9 off-target effects in human primary cells using bidirectional LSTM with BERT embedding. Bioinform. Adv. 2024, 5, 184. [Google Scholar] [CrossRef] [PubMed]
- Abbasi, A.F.; Asim, M.N.; Dengel, A. Transitioning from wet lab to artificial intelligence: A systematic review of AI predictors in CRISPR. J. Transl. Med. 2025, 23, 153. [Google Scholar] [CrossRef] [PubMed]
- Gupta, D.; Bhattacharjee, O.; Mandal, D.; Sen, M.K.; Dey, D.; Dasgupta, A.; Kazi, T.A.; Gupta, R.; Sinharoy, S.; Acharya, K.; et al. CRISPR-Cas9 system: A new-fangled dawn in gene editing. Life Sci. 2019, 232, 116636. [Google Scholar] [CrossRef] [PubMed]
- Wang, J.Y.; Doudna, J.A. CRISPR technology: A decade of genome editing is only the beginning. Science 2023, 379, 8643. [Google Scholar] [CrossRef] [PubMed]
- Libbrecht, M.W.; Noble, W.S. Machine learning applications in genetics and genomics. Nat. Rev. Genet. 2015, 16, 321–332. [Google Scholar] [CrossRef]
- Koboldt, D.C.; Steinberg, K.M.; Larson, D.E.; Wilson, R.K.; Mardis, E.R. The next-generation sequencing revolution and its impact on genomics. Cell 2013, 155, 27–38. [Google Scholar] [CrossRef]
- Goble, C.; Stevens, R.; Hull, D.; Wolstencroft, K.; Lopez, R.; Parkinson, H.; McEntyre, J.; Sansone, S.A.; Brooksbank, C.; Smedley, D.; et al. Precision medicine needs pioneering clinical bioinformaticians. Brief. Bioinform. 2019, 20, 752–766. [Google Scholar] [CrossRef]
- Ferri, E.; Petosa, C.; McKenna, C.E. Bromodomains: Structure, function and pharmacology of inhibition. Biochem. Pharmacol. 2016, 106, 1–18. [Google Scholar] [CrossRef] [PubMed]
- Telenti, A.; Auli, M.; Hie, B.L.; Maher, C.; Saria, S.; Ioannidis, J.P.A. Large language models for science and medicine. Eur. J. Clin. Investig. 2024, 54, 14183. [Google Scholar] [CrossRef]
- Sarumi, O.A.; Heider, D. Large language models and their applications in bioinformatics. Comput. Struct. Biotechnol. J. 2024, 23, 3498–3505. [Google Scholar] [CrossRef]
- Ruprecht, N.A.; Kennedy, J.D.; Bansal, B.; Singhal, S.; Sens, D.; Maggio, A.; Doe, V.; Hawkins, D.; Campbel, R.; O’Connell, K.; et al. Transcriptomics and epigenetic data integration learning module on google cloud. Brief. Bioinform. 2024, 25 (Suppl. S1), 352. [Google Scholar] [CrossRef]
- Ma, A.; Wang, X.; Li, J.; Wang, C.; Xiao, T.; Liu, Y.; Cheng, H.; Wang, J.; Li, Y.; Chang, Y.; et al. Single-cell biological network inference using a heterogeneous graph transformer. Nat. Commun. 2023, 14, 964. [Google Scholar] [CrossRef] [PubMed]
- Liu, J.; Yang, M.; Yu, Y.; Xu, H.; Li, K.; Zhou, X. Large language models in bioinformatics: Applications and perspectives. arXiv 2024, arXiv:2401.04155. [Google Scholar]
- Borkakoti, N.; Thornton, J.M. AlphaFold2 protein structure prediction: Implications for drug discovery. Curr. Opin. Struct. Biol. 2023, 78, 102526. [Google Scholar] [CrossRef] [PubMed]
- Xiao, Y.; Sun, E.; Jin, Y.; Wang, Q.; Wang, W. ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding. arXiv 2024, arXiv:2408.11363v1. [Google Scholar]
- Li, T.; Shetty, S.; Kamath, A.; Jaiswal, A.; Jiang, X.; Ding, Y.; Kim, Y. CancerGPT for few shot drug pair synergy prediction using large pretrained language models. NPJ Digit. Med. 2024, 7, 40. [Google Scholar] [CrossRef]
- Regev, A.; Teichmann, S.A.; Lander, E.S.; Amit, I.; Benoist, C.; Birney, E.; Bodenmiller, B.; Campbell, P.; Carninci, P.; Clatworthy, M.; et al. The human cell atlas. eLife 2017, 6, 27041. [Google Scholar] [CrossRef] [PubMed]
- Kang, M.; Ko, E.; Mersha, T.B. A roadmap for multi-omics data integration using deep learning. Brief. Bioinform. 2022, 23, 454. [Google Scholar] [CrossRef]
- Tang, W.; Wen, H.; Liu, R.; Ding, J.; Jin, W.; Xie, Y.; Liu, H.; Tang, J. Single Cell Multimodal Prediction via Transformers. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 2422–2431. [Google Scholar]
- Davenport, T.; Kalakota, R. The potential for artificial intelligence in healthcare. Future Healthc. J. 2019, 6, 94. [Google Scholar] [CrossRef] [PubMed]
- Alowais, S.A.; Alghamdi, S.S.; Alsuhebany, N.; Alqahtani, T.; Alshaya, A.I.; Almohareb, S.N.; Aldairem, A.; Alrashed, M.; Bin Saleh, K.; Badreldin, H.A.; et al. Revolutionizing healthcare: The role of artificial intelligence in clinical practice. BMC Med. Educ. 2023, 23, 689. [Google Scholar] [CrossRef]
- Lähnemann, D.; Köster, J.; Szczurek, E.; McCarthy, D.J.; Hicks, S.C.; Robinson, M.D.; Vallejos, C.A.; Campbell, K.R.; Beerenwinkel, N.; Mahfouz, A.; et al. Eleven grand challenges in single-cell data science. Genome Biol. 2020, 21, 31. [Google Scholar] [CrossRef] [PubMed]
- Yang, F.; Wang, W.; Wang, F.; Fang, Y.; Tang, D.; Huang, J.; Lu, H. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 2022, 4, 852–866. [Google Scholar] [CrossRef]
- Pan, B.; Shen, Y.; Liu, H.; Mishra, M.; Zhang, G.; Oliva, A.; Raffel, C.; Panda, R. Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models. arXiv 2024, arXiv:2404.05567. [Google Scholar] [CrossRef]
- Gupta, J.; Seeja, K.R. A comparative study and systematic analysis of XAI models and their applications in healthcare. Healthc. Inform. J. 2024, 31, 3977–4002. [Google Scholar] [CrossRef]
- Salih, A.M.; Galazzo, I.B.; Gkontra, P.; Rauseo, E.; Lee, A.M.; Lekadir, K.; Radeva, P.; Petersen, S.E.; Menegaz, G. A review of evaluation approaches for explainable AI with applications in cardiology. Artif. Intell. Rev. 2024, 57, 240. [Google Scholar] [CrossRef] [PubMed]
- Cheng, H.; Zhang, M.; Shi, J.Q. A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10558–10578. [Google Scholar] [CrossRef]
- Moslemi, A.; Briskina, A.; Dang, Z.; Li, J. A survey on knowledge distillation: Recent advancements. Mach. Learn. Appl. 2024, 18, 100605. [Google Scholar] [CrossRef]
- Zhang, X.; Yang, S.; Duan, L.; Lang, Z.; Shi, Z.; Sun, L. Transformer-XL with graph neural network for source code summarization. In Proceedings of the 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Melbourne, Australia, 17–20 October 2021; pp. 3436–3441. [Google Scholar] [CrossRef]
- Wei, X.; Moalla, S.; Pascanu, R.; Gulcehre, C. Investigating low-rank training in transformer language models: Efficiency and scaling analysis. arXiv 2024, arXiv:2407.09835v2. [Google Scholar] [CrossRef]
- Zhou, Z.; Ji, Y.; Li, W.; Dutta, P.; Davuluri, R.; Liu, H. DNABERT-2: Efficient foundation model and benchmark for multi-species genome. arXiv 2023, arXiv:2306.15006. [Google Scholar]
- Liu, Q.; Hu, Q.; Liu, S.; Hutson, A.; Morgan, M. ReUseData: An R/Bioconductor tool for reusable and reproducible genomic data management. BMC Bioinform. 2024, 25, 8. [Google Scholar] [CrossRef] [PubMed]
- Barker, A.D.; Alba, M.M.; Mallick, P.; Agus, D.B.; Lee, J.S. An inflection point in cancer protein biomarkers: What was and what’s next. Mol. Cell. Proteom. 2023, 22, 100569. [Google Scholar] [CrossRef] [PubMed]
- Guttmacher, A.E.; McGuire, A.L.; Ponder, B.; Stefánsson, K. Personalized genomic information: Preparing for the future of genetic medicine. Nat. Rev. Genet. 2010, 11, 161–165. [Google Scholar] [CrossRef]
- Williamson, S.M.; Prybutok, V. Privacy dilemmas and opportunities in large language models: A brief review. Front. Comput. Sci. 2024, 19, 1910356. [Google Scholar] [CrossRef]
- Williamson, S.M.; Prybutok, V. Balancing privacy and progress: A review of privacy challenges, systemic oversight, and patient perceptions in AI-driven healthcare. Appl. Sci. 2024, 14, 675. [Google Scholar] [CrossRef]
Model | Developer | Key Features | Applications | Reference
---|---|---|---|---
BERT | Google | | Text classification, named entity recognition, chatbots, language translation | [34]
GPT | OpenAI | | Text generation, language modeling, chatbots, creative writing | [41]
GPT-3 | OpenAI | | Text generation, code generation, language translation, text summarization, chatbots | [37]
GPT-4 | OpenAI | | Text generation, code generation, language translation, text summarization, chatbots | [42]
Text-To-Text Transfer Transformer (T5) | Google Research | | Text translation and summarization, chatbots, text classification | [40]
RoBERTa | Meta AI | | Text classification, named entity recognition, chatbots, sentiment analysis | [35]
XLNet | Google/Carnegie Mellon University | | Text classification, sentiment analysis, chatbots | [43]
A Lite BERT (ALBERT) | Google and Toyota Technological Institute | | Text classification, natural language inference, chatbots | [44]
BART | Meta AI | | Text generation and summarization, machine translation, chatbots | [39]
ERNIE (Enhanced Representation through Knowledge Integration) | Baidu | | Text classification, chatbots, natural language understanding, language generation | [45]
Turing-NLG | Microsoft | | Text generation, chatbots, text summarization, dialogue systems | [46]
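Most of the general-purpose models listed above are distributed as pretrained checkpoints through the Hugging Face `transformers` library. The snippet below is a minimal illustrative sketch, not part of the surveyed work, assuming the public `bert-base-uncased` checkpoint; it queries an encoder-only model on its masked-language-modeling pre-training task, the same mechanism that underlies the text-classification and entity-recognition applications in the table.

```python
# Minimal sketch: querying an encoder-only model (BERT) via Hugging Face transformers.
# Assumption: the public "bert-base-uncased" checkpoint; any compatible encoder with a
# masked-language-modeling head could be substituted.
from transformers import pipeline

# Fill-mask is BERT's pre-training objective: predict the token behind [MASK].
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

predictions = fill_mask("Pharmacogenomics tailors drug [MASK] to a patient's genotype.")
for p in predictions[:3]:
    # Each prediction is a dict with the proposed token and its probability score.
    print(f"{p['token_str']:>12s}  score={p['score']:.3f}")
```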
Model | Developer | Key Features | Applications | Reference |
---|---|---|---|---|
ChemBERTa | Industry-Academic Collaboration | Self-supervised learning on SMILES strings | Lead identification, drug optimization | [49] |
AlphaFold | DeepMind | DL for 3D protein structure prediction | Protein structure prediction, function understanding | [51] |
GenoML | GenoML | ML for automated variant analysis | Variant annotation and prioritization in genomics | [53] |
ProteinBERT | Industry-Academic Collaboration | BERT-based pre-trained on about 106M proteins from UniRef90 | Protein function prediction, protein–protein interaction, drug discovery | [54] |
ProtBERT | Industry-Academic Collaboration | BERT applied to protein sequences | Protein classification, function prediction, interaction analysis | [55] |
DNABERT | Northwestern/Stony Brook University | Transformer models for DNA sequences | Genomic variant identification, gene function prediction | [56]
MedBERT | Stanford University | BERT-based model pre-trained on electronic health records | Patient diagnosis prediction, treatment recommendation, medical image analysis | [60] |
BioBERT | Naver/Korea University | BERT model pre-trained on biomedical literature from PubMed and PMC | Biomedical text mining, named entity recognition, relation extraction, interactive systems | [57] |
PubMedBERT | Microsoft Research | BERT-based, pre-trained specifically on PubMed abstracts and full-text articles | Biomedical text mining, information retrieval, named entity recognition, relationship extraction | [58] |
ClinicalBERT | MIT | BERT-based, pre-trained on clinical notes from electronic health records | Clinical text mining, patient outcome prediction, medical information extraction | [59] |
GenoTEX | Collaborative Genomics Group | Benchmarking and LLM integration for gene expression data | Evaluation and benchmarking of LLMs in gene expression data analysis | [47] |
QuST-LLM | QuPath and Bioinformatics | Spatial transcriptomics enhanced by LLMs | Analysis and interpretation of spatial transcriptomics | [61] |
SeqMate | RNA-Seq Analysis Initiative | Automated RNA sequencing analysis pipeline with LLM support | RNA sequencing data preparation and differential expression analysis | [62] |
GENA-LM | AIRI | Foundational DNA language model | Long DNA sequence handling | [63] |
Geneverse | T. Liu et al. | Multimodal LLM | Genomics and proteomics research | [64]
GROVER | German Cancer Research Center | DNA language model | Human genome context learning | [65] |
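DNA language models such as DNABERT expose the same `transformers` interface as the general-purpose encoders, but operate on k-mer tokenized nucleotide sequences. The sketch below shows, under the assumption of a DNABERT-style checkpoint (the identifier is a placeholder to be replaced with the one published by the model's authors), how a raw DNA string can be k-merized and embedded for downstream tasks such as variant or promoter classification.

```python
# Minimal sketch: extracting sequence embeddings from a DNABERT-style DNA language model.
# Assumption: CHECKPOINT is a placeholder identifier; use the one from the model's repository.
import torch
from transformers import AutoModel, AutoTokenizer

CHECKPOINT = "zhihan1996/DNA_bert_6"  # placeholder; verify on the Hugging Face Hub

def to_kmers(seq: str, k: int = 6) -> str:
    """Convert a raw DNA string into the space-separated k-mers DNABERT-style tokenizers expect."""
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModel.from_pretrained(CHECKPOINT)

dna = "ATGGCGTACGATCGTACGATCGGATCCATGCA"
inputs = tokenizer(to_kmers(dna), return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the last hidden states into one fixed-length vector for a downstream classifier.
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)  # (1, hidden_size)
```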
Dataset Name | Description | Source/Website |
---|---|---|
1000 Genomes Project | A comprehensive resource of human genetic variation, supporting studies on genetic variation, health, and disease. It includes data from diverse populations worldwide. | https://www.internationalgenome.org/data/ (accessed on 20 April 2025) |
ENCODE Project | Provides functional genomic data, including ChIP-seq, RNA-seq, and epigenomic data, to identify all functional elements in the human genome. | https://www.encodeproject.org/ (accessed on 20 April 2025) |
Genotype-Tissue Expression (GTEx) | Offers data on gene expression and regulation across 54 tissue sites from nearly 1000 individuals, enabling studies on tissue-specific gene expression. | https://www.gtexportal.org/home (accessed on 20 April 2025) |
The Cancer Genome Atlas (TCGA) | Contains genomic, epigenomic, transcriptomic, and proteomic data for over 20,000 primary cancer and matched normal samples across 33 cancer types. | https://www.cancer.gov/ccg/research/genome-sequencing/tcga (accessed on 20 April 2025) |
Human Microbiome Project | Provides data on microbial communities in the human body, including metagenomic and 16S sequencing data. | https://www.hmpdacc.org/resources/data_browser.php (accessed on 20 April 2025) |
UniProt | A comprehensive database of protein sequences and functional information, supporting studies in proteomics and genomics. | https://www.uniprot.org/ (accessed on 20 April 2025) |
dbSNP | A database of single-nucleotide polymorphisms (SNPs) and other genetic variations, facilitating studies on genetic associations and population genetics. | https://www.ncbi.nlm.nih.gov/snp/ (accessed on 20 April 2025) |
Gene Expression Omnibus (GEO) Database | A repository for gene expression and other functional genomics data, supporting MIAME-compliant submissions and analysis tools. | https://www.ncbi.nlm.nih.gov/geo/ (accessed on 20 April 2025) |
Catalogue of Somatic Mutations in Cancer (COSMIC) | An expert-curated database of somatic mutations in cancer, including mutation distributions and effects. | https://cancer.sanger.ac.uk/cosmic (accessed on 20 April 2025) |
ClinVar | Archives information about genomic variations and their relationships to human health, including disease associations and drug responses. | https://www.ncbi.nlm.nih.gov/clinvar/ (accessed on 20 April 2025) |
PharmGKB | A pharmacogenomics knowledge base that links genetic variations to drug responses, aiding in personalized medicine. | https://www.pharmgkb.org/ (accessed on 20 April 2025) |
UK Biobank | A large-scale biomedical database containing genetic, lifestyle, and health data from 500,000 participants, supporting research in personalized medicine. | https://www.ukbiobank.ac.uk/ (accessed on 20 April 2025) |
Medical Information Mart for Intensive Care (MIMIC) | A critical care database with de-identified health data, including clinical notes, lab results, and prescriptions, for personalized healthcare research. | https://mimic.mit.edu/ (accessed on 20 April 2025) |
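Several of the resources listed above can be queried programmatically; for example, ClinVar and dbSNP are reachable through NCBI's E-utilities REST endpoints. The snippet below is a hedged sketch of such a query using the `requests` library; the search term, JSON field names, and result schema shown are assumptions to be checked against the E-utilities documentation.

```python
# Minimal sketch: querying ClinVar via NCBI E-utilities (esearch + esummary).
# Assumptions: the search term and JSON fields below are illustrative; consult the
# E-utilities documentation for the authoritative parameters and schema.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Step 1: search ClinVar for records matching a gene/clinical-significance term.
search = requests.get(
    f"{EUTILS}/esearch.fcgi",
    params={"db": "clinvar", "term": "BRCA1[gene] AND pathogenic", "retmode": "json", "retmax": 5},
    timeout=30,
).json()
ids = search["esearchresult"]["idlist"]

# Step 2: fetch summaries for the matching ClinVar record identifiers.
summary = requests.get(
    f"{EUTILS}/esummary.fcgi",
    params={"db": "clinvar", "id": ",".join(ids), "retmode": "json"},
    timeout=30,
).json()

for uid in ids:
    record = summary["result"][uid]
    print(uid, record.get("title", ""))
```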
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ali, S.; Qadri, Y.A.; Ahmad, K.; Lin, Z.; Leung, M.-F.; Kim, S.W.; Vasilakos, A.V.; Zhou, T. Large Language Models in Genomics—A Perspective on Personalized Medicine. Bioengineering 2025, 12, 440. https://doi.org/10.3390/bioengineering12050440