Generative AI Models (2018–2024): Advancements and Applications in Kidney Care
Abstract
1. Introduction
- Understand Key Technologies: Provide an overview of GenAI, LLMs, and LVMs as of 2024. This includes explaining the core principles of each technology, their advancements over recent years, and their current capabilities.
- Examine Applications in Kidney Care: Describe current applications of these technologies that improve diagnostics, treatment, and patient management in kidney care.
- Present Common Use Cases: Highlight use cases where these technologies can be effectively implemented.
- Address Limitations and Future Directions: Analyze current limitations of these technologies and propose future research areas for advancing their applications in the field.
2. Research Methodology
- Inclusion Criteria:
  - Articles in English, published between January 2020 and December 2024, discussing the application of GenAI, LLMs, and LVMs in kidney care, including peer-reviewed journal articles, conference papers, and reputable preprints.
- Exclusion Criteria:
  - Non-English articles.
  - Publications without full-text access.
  - Studies not specifically focused on the application of GenAI, LLMs, and LVMs in kidney care.
  - Opinion pieces, editorials, and commentaries lacking empirical data.
3. Background
3.1. The Era of Pretrained Language Models (2018–2020)
3.1.1. Generative Pretrained Transformer (GPT)
3.1.2. Bidirectional Encoder Representations from Transformers (BERT)
- Masked Language Modeling (MLM): predicts randomly masked tokens from their surrounding bidirectional context (a minimal code illustration follows this list).
- Next Sentence Prediction (NSP): determines whether the second sentence logically follows the first.
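To make the masked-token objective concrete, the sketch below fills a masked word with an off-the-shelf BERT checkpoint through the Hugging Face transformers library. The library, the checkpoint name, and the clinical-style sentence are illustrative assumptions on our part, not components used in the studies reviewed here.

```python
from transformers import pipeline

# Masked Language Modeling: BERT predicts the token behind [MASK]
# from both the left and the right context of the sentence.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
predictions = fill_mask("The patient was diagnosed with chronic [MASK] disease.")
for p in predictions[:3]:
    print(f"{p['token_str']:>10}  score={p['score']:.3f}")
```

In this setup the model typically ranks plausible completions such as "kidney" or "lung" highly, which is exactly the bidirectional context use that distinguishes BERT from left-to-right generators.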
3.1.3. XLNet
3.1.4. Text-to-Text Transfer Transformer (T5)
3.1.5. GPT-3
3.2. Multimodal Models: Text, Image, and Video (2021–2023)
3.2.1. DALL·E
3.2.2. Contrastive Language-Image Pretraining (CLIP)
3.2.3. Neural Visual World Creation (NÜWA)
3.2.4. CogView2
3.2.5. Imagen
3.2.6. DALL·E 2
3.2.7. Stable Diffusion
3.2.8. Make-A-Video
3.2.9. MidJourney
3.2.10. DreamFusion
3.2.11. Make-A-Scene
3.2.12. Large Language Model Meta AI (Llama)
3.2.13. Bard (Google)
3.2.14. Phenaki
3.2.15. GPT-4
3.2.16. Mistral
3.3. 2024—Advanced Video and Multimodal Systems
3.3.1. SORA
3.3.2. Gemini 1.5
3.3.3. Big Sleep
3.3.4. ChatGPT 4o and Variants (2024)
3.4. Improvements in GenAI, LLMs, and LVMs
3.4.1. Prompt Engineering
3.4.2. Retrieval-Augmented Generation (RAG)
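As a minimal sketch of the retrieve-then-generate pattern, the snippet below embeds a tiny nephrology "corpus", retrieves the passage closest to a query by cosine similarity, and packs it into a prompt for a downstream LLM. The sentence-transformers library, the checkpoint name, and the toy passages are illustrative assumptions, not parts of any system evaluated in this review.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Minimal retrieve-then-generate flow (illustrative corpus and model choice).
corpus = [
    "An eGFR below 60 mL/min/1.73 m2 for three months or longer indicates CKD.",
    "Dietary potassium is often restricted in late-stage chronic kidney disease.",
    "Acute kidney injury is staged by serum creatinine rise and urine output.",
]
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = encoder.encode(corpus, normalize_embeddings=True)   # dense passage embeddings

query = "Which laboratory value defines chronic kidney disease?"
q_vec = encoder.encode([query], normalize_embeddings=True)[0]
scores = doc_vecs @ q_vec            # cosine similarity (embeddings are unit-normalized)
best_passage = corpus[int(np.argmax(scores))]

# The retrieved passage is prepended to the question before it is sent to an LLM.
prompt = f"Context: {best_passage}\n\nQuestion: {query}\nAnswer using only the context."
print(prompt)
```

Grounding the generator in retrieved passages is what lets RAG reduce the fabricated citations and hallucinated facts reported for stand-alone chatbots later in this review.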
3.4.3. Dense Retrieval and Fusion-in-Decoder (FiD)
3.4.4. Sparse Retrieval Models
3.4.5. Vector Databases
3.4.6. Contrastive Learning
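To illustrate the contrastive objective underlying SimCLR, MoCo, and SimCSE, the PyTorch sketch below computes an InfoNCE-style loss on random toy embeddings; the batch size, embedding dimensionality, and temperature are arbitrary placeholder values rather than settings from any cited work.

```python
import torch
import torch.nn.functional as F

# InfoNCE / NT-Xent-style loss on toy embeddings: each anchor should be
# most similar to its own positive among all candidates in the batch.
anchors = F.normalize(torch.randn(4, 128), dim=1)    # e.g., augmented view 1
positives = F.normalize(torch.randn(4, 128), dim=1)  # e.g., augmented view 2
temperature = 0.07
logits = anchors @ positives.T / temperature          # pairwise cosine similarities
labels = torch.arange(4)                              # matching pairs lie on the diagonal
loss = F.cross_entropy(logits, labels)
print(loss.item())  # near-chance value here, since the toy embeddings are random
```

Minimizing this loss pulls matching text-image or sentence pairs together in the shared embedding space while pushing mismatched pairs apart, which is the mechanism behind CLIP-style multimodal retrieval.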
4. Recent Applications of GenAI, LLMs, and LVMs in Kidney Care
4.1. 2018–2023: Early Developments
4.2. 2024: Recent Breakthroughs
5. Discussion
6. Use Case
6.1. Text-to-Image (2D)
6.2. Text-to-Image (3D)
6.3. Text-to-Text
6.4. Text-to-Video
7. Challenges of GenAI, LLMs, and LVMs in Kidney Healthcare
8. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Nair, M. The renal system. In Fundamentals of Anatomy and Physiology: For Nursing and Healthcare Students; Wiley-Blackwell: Hoboken, NJ, USA, 2016. [Google Scholar]
- Neha, F. Kidney Localization and Stone Segmentation from a CT Scan Image. In Proceedings of the 2023 7th International Conference on Computing, Communication, Control And Automation (ICCUBEA), Pune, India, 18–19 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar]
- Kovesdy, C.P. Epidemiology of chronic kidney disease: An update 2022. Kidney Int. Suppl. 2022, 12, 7–11. [Google Scholar] [CrossRef] [PubMed]
- Luyckx, V.A.; Tonelli, M.; Stanifer, J.W. The global burden of kidney disease and the sustainable development goals. Bull. World Health Organ. 2018, 96, 414. [Google Scholar] [PubMed]
- Osman, M.A.; Alrukhaimi, M.; Ashuntantang, G.E.; Bellorin-Font, E.; Gharbi, M.B.; Braam, B.; Courtney, M.; Feehally, J.; Harris, D.C.; Jha, V.; et al. Global nephrology workforce: Gaps and opportunities toward a sustainable kidney care system. Kidney Int. Suppl. 2018, 8, 52–63. [Google Scholar]
- Bello, A.K.; McIsaac, M.; Okpechi, I.G.; Johnson, D.W.; Jha, V.; Harris, D.C.; Saad, S.; Zaidi, D.; Osman, M.A.; Ye, F.; et al. International Society of Nephrology Global Kidney Health Atlas: Structures, organization, and services for the management of kidney failure in North America and the Caribbean. Kidney Int. Suppl. 2021, 11, e66–e76. [Google Scholar]
- Levey, A.S.; Eckardt, K.U.; Dorman, N.M.; Christiansen, S.L.; Cheung, M.; Jadoul, M.; Winkelmayer, W.C. Nomenclature for kidney function and disease—Executive summary and glossary from a Kidney Disease: Improving Global Outcomes (KDIGO) consensus conference. Eur. Heart J. 2020, 41, 4592–4598. [Google Scholar]
- Seaborg, E. What’s in a Name, and Who’s the Audience? “Kidney” vs. “Renal”. Kidney News 2021, 13, 24–25. [Google Scholar]
- Fnu, N.; Bansal, A. Understanding the architecture of vision transformer and its variants: A review. In Proceedings of the 2024 1st International Conference on Innovative Engineering Sciences and Technological Research (ICIESTR), Muscat, Oman, 14–15 May 2024; pp. 1–6. [Google Scholar] [CrossRef]
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training; Technical Report; OpenAI: San Francisco, CA, USA, 2018. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186. [Google Scholar]
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. arXiv 2019, arXiv:1906.08237. [Google Scholar]
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Ramesh, A.; Pavlov, M.; Goh, G.; Gray, S.; Voss, C.; Radford, A.; Chen, M.; Sutskever, I. Zero-shot text-to-image generation. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 8821–8831. [Google Scholar]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 8748–8763. [Google Scholar]
- Wu, C.; Liang, J.; Ji, L.; Yang, F.; Fang, Y.; Jiang, D.; Duan, N. Nüwa: Visual synthesis pre-training for neural visual world creation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 720–736. [Google Scholar]
- Ding, M.; Yang, Z.; Hong, W.; Zheng, W.; Zhou, C.; Yin, D.; Lin, J.; Zou, X.; Shao, Z.; Yang, H.; et al. Cogview: Mastering text-to-image generation via transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 19822–19835. [Google Scholar]
- Ding, M.; Zheng, W.; Hong, W.; Tang, J. Cogview2: Faster and better text-to-image generation via hierarchical transformers. Adv. Neural Inf. Process. Syst. 2022, 35, 16890–16902. [Google Scholar]
- Saharia, C.; Chan, W.; Saxena, S.; Li, L.; Whang, J.; Denton, E.L.; Ghasemipour, K.; Gontijo Lopes, R.; Karagol Ayan, B.; Salimans, T.; et al. Photorealistic text-to-image diffusion models with deep language understanding. Adv. Neural Inf. Process. Syst. 2022, 35, 36479–36494. [Google Scholar]
- Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical text-conditional image generation with clip latents. arXiv 2022, arXiv:2204.06125. [Google Scholar]
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695. [Google Scholar]
- Singer, U.; Polyak, A.; Hayes, T.; Yin, X.; An, J.; Zhang, S.; Hu, Q.; Yang, H.; Ashual, O.; Gafni, O.; et al. Make-a-video: Text-to-video generation without text-video data. arXiv 2022, arXiv:2209.14792. [Google Scholar]
- MidJourney Team. MidJourney Explore: Top Images. Available online: https://www.midjourney.com/explore?tab=top (accessed on 22 August 2024).
- Poole, B.; Jain, A.; Barron, J.T.; Mildenhall, B. Dreamfusion: Text-to-3d using 2d diffusion. arXiv 2022, arXiv:2209.14988. [Google Scholar]
- Gafni, O.; Polyak, A.; Ashual, O.; Sheynin, S.; Parikh, D.; Taigman, Y. Make-a-scene: Scene-based text-to-image generation with human priors. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 89–106. [Google Scholar]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar]
- Google. Bard: A Generative AI Text-to-Text Model Powered by LaMDA and PaLM 2. 2023. Available online: https://bard.google.com (accessed on 22 December 2024).
- Gemini Team; Anil, R.; Borgeaud, S.; Alayrac, J.B.; Yu, J.; Soricut, R.; Schalkwyk, J.; Dai, A.M.; Hauth, A.; Millican, K.; et al. Gemini: A family of highly capable multimodal models. arXiv 2023, arXiv:2312.11805. [Google Scholar]
- Thoppilan, R.; De Freitas, D.; Hall, J.; Shazeer, N.; Kulshreshtha, A.; Cheng, H.T.; Jin, A.; Bos, T.; Baker, L.; Du, Y.; et al. Lamda: Language models for dialog applications. arXiv 2022, arXiv:2201.08239. [Google Scholar]
- Anil, R.; Dai, A.M.; Firat, O.; Johnson, M.; Lepikhin, D.; Passos, A.; Shakeri, S.; Taropa, E.; Bailey, P.; Chen, Z.; et al. Palm 2 technical report. arXiv 2023, arXiv:2305.10403. [Google Scholar]
- Villegas, R.; Babaeizadeh, M.; Kindermans, P.J.; Moraldo, H.; Zhang, H.; Saffar, M.T.; Castro, S.; Kunze, J.; Erhan, D. Phenaki: Variable length video generation from open domain textual descriptions. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 Technical Report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
- Mistral AI. Introducing Mistral: Advancing Open-Weight Transformer Models. 2023. Available online: https://www.mistral.ai (accessed on 22 December 2024).
- OpenAI. Sora: A Text-to-Video Generation Model Using Transformer-Based Architecture and Pre-Trained Diffusion Models. 2024. Available online: https://openai.com (accessed on 22 December 2024).
- Gemini Team; Georgiev, P.; Lei, V.I.; Burnell, R.; Bai, L.; Gulati, A.; Tanzer, G.; Vincent, D.; Pan, Z.; Wang, S.; et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv 2024, arXiv:2403.05530. [Google Scholar]
- Big Sleep Team. From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code. 2024. Available online: https://googleprojectzero.blogspot.com/2024/10/from-naptime-to-big-sleep.html (accessed on 22 December 2024).
- OpenAI. Hello GPT-4o. 2024. Available online: https://openai.com/index/hello-gpt-4o/ (accessed on 1 December 2024).
- OpenAI. GPT-4o Mini: Advancing Cost-Efficient Intelligence. 2024. Available online: https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence (accessed on 1 December 2024).
- OpenAI. Introducing OpenAI o1: A Model Designed for Enhanced Reasoning. Available online: https://openai.com/index/introducing-openai-o1-preview (accessed on 1 December 2024).
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474. [Google Scholar]
- Karpukhin, V.; Oğuz, B.; Min, S.; Lewis, P.; Wu, L.; Edunov, S.; Chen, D.; Yih, W.t. Dense passage retrieval for open-domain question answering. arXiv 2020, arXiv:2004.04906. [Google Scholar]
- Izacard, G.; Grave, E. Leveraging passage retrieval with generative models for open domain question answering. arXiv 2020, arXiv:2007.01282. [Google Scholar]
- Manning, C.D.; Raghavan, P.; Schütze, H. Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2008. [Google Scholar]
- Johnson, J.; Douze, M.; Jégou, H. Billion-scale similarity search with GPUs. IEEE Trans. Big Data 2019, 7, 535–547. [Google Scholar]
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 1597–1607. [Google Scholar]
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738. [Google Scholar]
- Gao, T.; Yao, X.; Chen, D. Simcse: Simple contrastive learning of sentence embeddings. arXiv 2021, arXiv:2104.08821. [Google Scholar]
- Mao, C.; Yao, L.; Luo, Y. A Pre-trained Clinical Language Model for Acute Kidney Injury. In Proceedings of the 2020 IEEE International Conference on Healthcare Informatics (ICHI), Oldenburg, Germany, 30 November–3 December 2020; pp. 1–2. [Google Scholar] [CrossRef]
- Ma, D.; Li, X.; Mou, S.; Cheng, Z.; Yan, X.; Lu, Y.; Yan, R.; Cao, S. Prediction of chronic kidney disease risk using multimodal data. In Proceedings of the 2021 5th International Conference on Compute and Data Analysis, Sanya, China, 2–4 February 2021; pp. 20–25. [Google Scholar]
- Qarajeh, A.; Tangpanithandee, S.; Thongprayoon, C.; Suppadungsuk, S.; Krisanapan, P.; Aiumtrakul, N.; Garcia Valencia, O.A.; Miao, J.; Qureshi, F.; Cheungpasitporn, W. AI-Powered Renal Diet Support: Performance of ChatGPT, Bard AI, and Bing Chat. Clin. Pract. 2023, 13, 1160–1172. [Google Scholar] [CrossRef]
- Aiumtrakul, N.; Thongprayoon, C.; Suppadungsuk, S.; Krisanapan, P.; Miao, J.; Qureshi, F.; Cheungpasitporn, W. Navigating the landscape of personalized medicine: The relevance of ChatGPT, BingChat, and Bard AI in nephrology literature searches. J. Pers. Med. 2023, 13, 1457. [Google Scholar] [CrossRef]
- Suppadungsuk, S.; Thongprayoon, C.; Krisanapan, P.; Tangpanithandee, S.; Garcia Valencia, O.; Miao, J.; Mekraksakit, P.; Kashani, K.; Cheungpasitporn, W. Examining the validity of ChatGPT in identifying relevant nephrology literature: Findings and implications. J. Clin. Med. 2023, 12, 5550. [Google Scholar] [CrossRef] [PubMed]
- Garcia Valencia, O.A.; Thongprayoon, C.; Jadlowiec, C.C.; Mao, S.A.; Miao, J.; Cheungpasitporn, W. Enhancing kidney transplant care through the integration of chatbot. Healthcare 2023, 11, 2518. [Google Scholar] [CrossRef]
- Szczesniewski, J.J.; Tellez Fouz, C.; Ramos Alba, A.; Diaz Goizueta, F.J.; García Tello, A.; Llanes González, L. ChatGPT and most frequent urological diseases: Analysing the quality of information and potential risks for patients. World J. Urol. 2023, 41, 3149–3153. [Google Scholar] [CrossRef] [PubMed]
- Moore, A.; Orset, B.; Yassaee, A.; Irving, B.; Morelli, D. HEalthRecordBERT (HERBERT): Leveraging Transformers on Electronic Health Records for Chronic Kidney Disease Risk Stratification. ACM Trans. Comput. Healthc. 2024, 5, 1–18. [Google Scholar] [CrossRef]
- Tan, Y.; Dede, M.; Mohanty, V.; Dou, J.; Hill, H.; Bernstam, E.; Chen, K. Forecasting Acute Kidney Injury and Resource Utilization in ICU patients using longitudinal, multimodal models. J. Biomed. Inform. 2024, 154, 104648. [Google Scholar] [CrossRef] [PubMed]
- Liang, R.; Zhao, A.; Peng, L.; Xu, X.; Zhong, J.; Wu, F.; Yi, F.; Zhang, S.; Wu, S.; Hou, J. Enhanced artificial intelligence strategies in renal oncology: Iterative optimization and comparative analysis of GPT 3.5 versus 4.0. Ann. Surg. Oncol. 2024, 31, 3887–3893. [Google Scholar] [CrossRef]
- Talyshinskii, A.; Juliebø-Jones, P.; Hameed, B.Z.; Naik, N.; Adhikari, K.; Zhanbyrbekuly, U.; Tzelves, L.; Somani, B.K. ChatGPT as a Clinical Decision Maker for Urolithiasis: Compliance with the Current European Association of Urology Guidelines. Eur. Urol. Open Sci. 2024, 69, 51–62. [Google Scholar] [CrossRef]
- Bersano, J. Exploring the role of Microsoft’s Copilot in visual communication: Current use and considerations through science communicators’ lens. Virus 2024, 6, 11. [Google Scholar] [CrossRef]
- Safadi, M.F.; Zayegh, O.; Hawoot, Z. Advancing Innovation in Medical Presentations: A Guide for Medical Educators to Use Images Generated With Artificial Intelligence. Cureus 2024, 16, e74978. [Google Scholar] [CrossRef]
- Goparaju, N. Picture This: Text-to-Image Models Transforming Pediatric Emergency Medicine. Ann. Emerg. Med. 2024, 84, 651–657. [Google Scholar] [CrossRef]
- Lin, S.Y.; Jiang, C.C.; Law, K.M.; Yeh, P.C.; Kuo, H.L.; Ju, S.W.; Kao, C.H. Comparative Analysis of Generative AI in Clinical Nephrology: Assessing ChatGPT-4, Gemini Pro, and Bard in Patient Interaction and Renal Biopsy Interpretation. SSRN 2024. Available online: https://ssrn.com/abstract=4711596 (accessed on 18 February 2025).
- Hsueh, J.Y.; Nethala, D.; Singh, S.; Linehan, W.M.; Ball, M.W. Investigating the clinical reasoning abilities of large language model GPT-4: An analysis of postoperative complications from renal surgeries. Urol. Oncol. Semin. Orig. Investig. 2024, 42, 292.e1–292.e7. [Google Scholar] [CrossRef] [PubMed]
- Sexton, D.J.; Judge, C. Assessments of Generative Artificial Intelligence as Clinical Decision Support Ought to be Incorporated Into Randomized Controlled Trials of Electronic Alerts for Acute Kidney Injury. Mayo Clin. Proc. Digit. Health 2024, 2, 606–610. [Google Scholar] [CrossRef]
- Halawani, A.; Almehmadi, S.G.; Alhubaishy, B.A.; Alnefaie, Z.A.; Hasan, M.N. Empowering patients: How accurate and readable are large language models in renal cancer education. Front. Oncol. 2024, 14, 1457516. [Google Scholar] [CrossRef]
- Neha, F.; Bhati, D.; Shukla, D.K.; Amiruzzaman, M. Chatgpt: Transforming healthcare with AI. AI 2024, 5, 2618–2650. [Google Scholar] [CrossRef]
- Wikimedia Commons. The Kidney. 2024. Available online: https://commons.wikimedia.org/wiki/File:2610_The_Kidney.jpg (accessed on 23 November 2024).
Model | Release Date | Architecture | Pretraining Task | Main Strength | Best Use Cases | Limitations |
---|---|---|---|---|---|---|
GPT (OpenAI) | 2018 | Unidirectional (left-to-right) | Autoregressive Language Modeling | Text generation and continuation | Language generation, conversation | Struggles with long-range dependencies and context, lacks bidirectional understanding |
BERT (Google) | 2018 | Bidirectional (left-to-right + right-to-left) | Masked Language Modeling (MLM), Next Sentence Prediction (NSP) | Text understanding (e.g., question answering, sentiment analysis) | Contextual tasks, classification, named entity recognition (NER) | Computationally expensive during inference, not designed for text generation |
XLNet (Google/CMU) | 2019 | Bidirectional with autoregressive (permutation-based) | Permutation-based Language Modeling | Combines autoregressive and bidirectional learning for improved performance | Question answering, text classification, sentiment analysis, and tasks benefiting from both context and generation | More complex training procedure, requires substantial computational resources |
T5 (Google) | 2019 | Text-to-Text Transfer Transformer (Bidirectional) | Span-Corruption (Denoising) Objective | Unified architecture for diverse tasks (translation, summarization, question answering) | Translation, summarization, question answering, text classification | Pretraining requires large-scale data and computation, struggles with very specific tasks requiring domain expertise |
GPT-3 (OpenAI) | 2020 | Unidirectional (left-to-right) | Autoregressive Language Modeling | Few-shot learning, scalable to a variety of tasks | Text generation, translation, summarization, question answering, conversation | Computationally expensive, lacks true understanding, may generate biased or harmful content |
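To contrast the autoregressive objective of the GPT family in the table above with BERT's masked-token prediction shown earlier, the snippet below greedily continues a prompt with a small, publicly available GPT-2 checkpoint via the Hugging Face transformers library; the model choice, decoding settings, and prompt are illustrative only.

```python
from transformers import pipeline

# Autoregressive (left-to-right) generation: each new token is predicted
# from the tokens that precede it, with no access to future context.
generator = pipeline("text-generation", model="gpt2")
result = generator("Chronic kidney disease is", max_new_tokens=30, do_sample=False)
print(result[0]["generated_text"])
```

The same left-to-right factorization scales up to GPT-3 and GPT-4; the few-shot behavior noted in the table comes from conditioning this generation on worked examples placed in the prompt rather than from any change to the objective.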
Model | Release Date | Architecture | Pretraining Task | Main Strength | Best Use Cases | Limitations |
---|---|---|---|---|---|---|
DALL·E (OpenAI) | 2021 | Transformer-based, VQ-VAE-2 for image generation | Trained on text-image pairs for generating images from text | Generates novel, high-quality images from text descriptions | Creative fields (design, art, visualization) | Image quality can vary, struggles with detailed prompts, computationally expensive |
CLIP (OpenAI) | 2021 | Vision transformer for images, transformer for text, shared feature space | Contrastive learning on text-image pairs for matching | Matches images with text, zero-shot learning for classification | Image classification, multimodal search, text-to-image matching | Limited by biases, struggles with abstract queries |
NUWA (Microsoft) | 2021 | 3D transformers for spatiotemporal data, VQ-VAE for encoding/decoding | Trained on text-image and text-video pairs for multimodal generation | Generates high-quality videos, animations, and images with temporal coherence | Video animation, text-to-video, creative industries, education | Limited resolution and video length, computationally expensive |
CogView2 (Tsinghua University) | 2022 | Transformer-based, masked autoregressive model for image generation | Trained on text-image pairs for high-quality image generation | Generates high-resolution, detailed images from text prompts, improved over previous versions | Creative industries, design, conceptual art, high-quality visual generation | Struggles with very abstract or highly detailed prompts, computationally intensive |
Imagen (Google) | 2022 | Two-stage diffusion process with T5 transformer encoder | Trained on text-image pairs with large-scale data for photorealistic generation | Generates high-resolution, photorealistic images from text | Design, art, visualization, photorealistic content | Computationally intensive, struggles with complex scenes |
DALL·E 2 (OpenAI) | 2022 | Transformer-based, Diffusion model for image generation, CLIP-based encoder | Trained on text-image pairs with added inpainting capability | Higher quality, more photorealistic images, image inpainting | Design, concept art, image editing | Struggles with very detailed or abstract prompts, computationally intensive |
Stable Diffusion (Stability AI) | 2022 | Latent diffusion model for image generation, transformer-based text encoder | Trained on text-image pairs with latent space modeling | High-quality, photorealistic images, fast and efficient generation | Creative industries, personalized content, rapid prototyping | Potential biases, requires computational resources for high-quality outputs |
Make-A-Video (Meta Platforms) | 2022 | Transformer-based, spatiotemporal generative model | Trained on text-video pairs with temporal coherence modeling | Generates short video clips from text prompts, temporal coherence | Content creation, advertising, creative industries, education | Limited video length, resolution, and frame rate; computationally expensive |
MidJourney (MidJourney, Inc.) | 2022 | Transformer-based, latent diffusion model | Trained on text-image pairs with a focus on artistic content | Produces highly artistic, stylized images from text descriptions | Digital artwork, conceptual designs, visual art projects | Primarily focused on artistic outputs, not suitable for photorealism or highly detailed images |
DreamFusion (Google) | 2022 | 2D diffusion model (like Imagen) + Neural Radiance Field (NeRF) | Trained on text-image pairs with 3D optimization | Generates photorealistic 3D models from text prompts, does not require 3D data | 3D model generation, game design, virtual/augmented reality | Requires substantial computational resources, limitations with complex scene generation |
Make-A-Scene (Meta Platforms) | 2022 | Transformer-based, text-to-image with interactive layout control | Trained on text-image pairs with spatial layout specification | Interactive image generation, user control over scene composition | Concept art, design mockups, visual storytelling, interactive advertising | Limited by complexity of layout inputs, challenges in highly detailed scene generation |
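To give a rough sense of how the text-to-image systems in the table above (for example, the Stable Diffusion entry) are invoked in practice, the sketch below calls a publicly released Stable Diffusion checkpoint through the diffusers library; the checkpoint identifier, the prompt, and the GPU assumption are illustrative and do not reproduce any experiment from the reviewed studies.

```python
import torch
from diffusers import StableDiffusionPipeline

# Latent-diffusion text-to-image: the text prompt conditions the denoising
# steps that decode an image from latent space (a CUDA GPU is assumed).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipe("labeled cross-section illustration of a human kidney").images[0]
image.save("kidney_illustration.png")
```

As the Limitations column suggests, anatomical accuracy in such outputs typically requires iterative prompt refinement and expert review before any educational or clinical use.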
Model | Release Date | Architecture | Pretraining Task | Main Strength | Best Use Cases | Limitations |
---|---|---|---|---|---|---|
Llama (Meta) | 2023 | Transformer-based architecture, open-weight models | Trained on diverse large-scale text datasets | Efficient scaling, high performance with open-source accessibility | Language modeling, text generation, multi-task learning | Limited for very large tasks, scaling challenges in certain applications |
Bard/Gemini (Google) | 2023 | Transformer-based, multimodal architecture (text, image, and video) | Trained on large-scale text datasets, multimodal data (text, images) | Multimodal understanding, high-quality text responses, and creative generation across various formats | Conversational AI, content generation, code generation, customer support, creative industries | Computationally expensive, potential biases, limited by training data complexity |
Phenaki (Google) | 2023 | Transformer-based, Causal Vision Transformer (C-ViViT), masked bidirectional transformer | Trained on text-video pairs with spatiotemporal modeling | Generates long-form, coherent videos from text prompts, maintains temporal coherence | Storytelling, advertising, content creation, creative industries | Requires substantial computational resources, potential challenges with complex prompts |
GPT-4 (OpenAI) | 2023 | Transformer-based, multimodal architecture (text and image inputs) | Trained on large text and image datasets | Multimodal understanding and generation, high-quality text responses from both image and text inputs | Visual Question Answering (VQA), image captioning, multimodal tasks | Computationally expensive, potential biases, and limited by training data complexity |
Mistral (Mistral AI) | 2023 | Sparse mixture-of-experts (MoE), transformer-based architecture | Trained on large-scale text datasets | Efficient scaling with dynamic expert selection, reduces computational costs | Language modeling, text generation, multi-task learning | Limited scalability for very large tasks, expert activation challenges |
Model | Release Date | Architecture | Pretraining Task | Main Strength | Best Use Cases | Limitations |
---|---|---|---|---|---|---|
SORA (OpenAI) | 2024 | Transformer-based architecture with pre-trained diffusion model | Text-to-video generation from textual descriptions | Efficient video generation with spatial and temporal consistency | Video generation for creative industries, educational content, advertising, and prototyping | Struggles with complex scenes, accurate human motion rendering, and high-quality image generation |
Gemini 1.5 (Google) | 2024 | PaLM-based transformer architecture with Pathways framework and MoE activation | Multimodal integration, including text, image, and video generation | Advanced multimodal integration and efficient scaling with large context windows | Creative and analytical applications across text, image, and video generation, including large-scale AI tasks | Challenges with fine-grained temporal coherence in video generation and computational complexity in some tasks |
Big Sleep (Google) | 2024 | Multimodal architecture with LLMs and specialized tools (code browser, Python tool, debugger) | Trained on codebases for vulnerability detection, variant analysis, and proactive defense | Detects and analyzes vulnerabilities, proactive defense, variant analysis | Vulnerability research, cybersecurity, automated security analysis, root-cause analysis | Requires substantial computational resources, challenges with complex vulnerabilities and large codebases |
ChatGPT (OpenAI) | 2024 | Transformer-based architecture with multimodal capabilities (text, image, video) | Trained on large-scale text and multimodal datasets, reinforcement learning with human feedback (RLHF) | Multimodal understanding, high-quality responses, and creative generation | Conversational AI, content generation, customer support, creative tasks | Computationally expensive, potential biases, limited by training data complexity |
Year | Focus | Contribution | Architecture & Details | Result | Strengths | Limitations |
---|---|---|---|---|---|---|
2020 [50] | Acute Kidney Injury (AKI) Prediction | AKI-BERT: a domain-specific language model for early AKI prediction using clinical notes. | Based on Clinical BioBERT; pre-trained with MLM and NSP tasks on AKI-specific corpus (77,160 notes, 49 M tokens). Fine-tuned with stratified sampling strategies. | Achieved AUC of 76.4% (best with upsampling + pooling) | Enhanced domain-specific representation; effective handling of imbalanced datasets; improved early AKI prediction accuracy. | Limited to textual data without incorporating structured clinical measurements |
2021 [51] | CKD Prediction | MD-BERT-LGBM: a multimodal model integrating BERT with LightGBM for CKD risk prediction using structured and unstructured data. | MD-BERT for unstructured clinical notes extraction, coupled with LightGBM for nonlinear classification using lab results and other structured data. | Accuracy: 78.1%, recall: 75.7%, AUC: 85.2%. | Superior accuracy and effective integration of structured and unstructured data; robust for small datasets. | Requires high computational resources; limited interpretability; validation needed on larger datasets. |
2023 [52] | CKD Dietary Support | Evaluated ChatGPT 3.5, ChatGPT 4, Bard AI, and Bing Chat in identifying mineral (potassium and phosphorus) content in foods for CKD patients. | Transformer-based generative AI models tailored for text-to-information tasks; ChatGPT 4 offers improved precision and complexity handling over 3.5. | ChatGPT 4: 81.0% accuracy for potassium; Bard AI: 100.0% accuracy for phosphorus; Bing Chat: 89.0% for phosphorus. | High potential for assisting healthcare providers in CKD diet planning; efficiency in categorizing nutritional content. | Inconsistent performance across food categories; requires human oversight to ensure reliability in clinical application. |
2023 [53] | Nephrology Literature Search | Evaluated ChatGPT-3.5, Bing Chat, and Bard AI for citation accuracy in nephrology literature searches. | GPT-3.5, GPT-4 (ChatGPT), Bing Chat (GPT-4), and Bard AI (PaLM2); tested with 12 nephrology topics and 20 references per topic. | ChatGPT-3.5: 38.0% accurate references, Bing Chat: 30.0% accurate, Bard AI: 3.0% accurate. Bard exhibited the highest rate of fabricated references (63.0%). | Highlights potential for automating citation generation; ChatGPT showed highest accuracy. | High rate of fabricated and inaccurate references; inconsistent performance across platforms; requires human verification. |
2023 [54] | Nephrology Citation Validation | Assessed ChatGPT-3.5’s ability to identify and validate references in nephrology literature. | GPT-3.5 | Of 610 references, 62.0% existed, 20.0% accurate, 31.0% fabricated, and 7.0% incomplete; only 20.0% included correct links. | Time-efficient for generating initial reference lists; identifies relevant references quickly. | High rate of fabricated references; poor accuracy in DOIs/links (68.0%); weak reliability in specialized domains like peritoneal dialysis. |
2023 [55] | Kidney Transplant Care | Proposed integration of AI-powered chatbots to enhance kidney transplant care in patient education, clinical decision support, and medication management. | Leveraged NLP in GenAI chatbots like ChatGPT; integrated with EHR systems for real-time insights and personalized patient education. | Demonstrated potential for improving adherence to treatment plans and addressing disparities in kidney transplant care. | Personalizes patient education; supports treatment compliance; enhances healthcare provider efficiency. | Risks of over-reliance on chatbot recommendations; challenges with AI biases, integration into existing EHR systems, and ethical concerns. |
2023 [56] | Patient Education in Urology | Evaluated ChatGPT-4.0 for providing information on bladder cancer, prostate cancer, renal cancer, benign prostatic hypertrophy (BPH), and urinary stones. | ChatGPT-4.0 evaluated using DISCERN and informed consent quality assessments; analyzed for correctness and word count. | DISCERN score of 4/5 for most conditions; accurate but lacked sources and comprehensive coverage. | Clear explanations of pathologies and treatments; emphasized patient-doctor consultation; addressed risks and recovery for surgeries. | Missing bibliographic references; lacked depth for complex treatments; omitted alternative options unless explicitly queried. |
Year | Focus | Contribution | Architecture & Details | Result | Strengths | Limitations |
---|---|---|---|---|---|---|
2024 [57] | CKD Risk Stratification | HERBERT: A BERT-based model tailored for EHR data with custom embeddings (age, sex, visit) and next visit prediction task. | Transformer architecture with MLM; 2025-token vocabulary. | ROC AUC: 91.0% (1-year), 86.0% (2-year), 82.0% (5-year). | Superior temporal context handling and improved long-term prediction. | Limited dataset size; requires validation on diverse cohorts; lacks integration of lab results. |
2024 [58] | AKI and Continuous Renal Replacement Therapy (CRRT) Prediction in ICU | Multimodal model combining LSTM (time series) and BioMedBERT (clinical notes) to predict AKI and CRRT needs 12 h in advance. | LSTM for structured time-series data and BioMedBERT for unstructured clinical notes; integrated using a multimodal encoder; SHAP used for interpretability. | AUROC: 88.8% (AKI), 99.7% (CRRT); AUPRC: 72.7% (AKI), 84.0% (CRRT). | Effective integration of structured and unstructured data; enhanced prediction accuracy and interpretability. | Imbalanced dataset for CRRT; computationally intensive multimodal architecture. |
2024 [59] | Renal Cell Carcinoma (RCC) Diagnosis | Compared GPT-3.5 and GPT-4.0 for addressing RCC-related clinical queries. | GPT-3.5 Turbo and GPT-4.0; fine-tuned GPT-3.5 with RCC-specific clinical questions using iterative training. | GPT-3.5: 67.1% accuracy; GPT-4.0: 77.5%; Fine-tuned GPT-3.5: 93.8%, achieving 100.0% accuracy with iterative training. | Enhanced accuracy with iterative fine-tuning; improved handling of RCC-specific clinical queries. | Initial instability in responses due to limited RCC-specific data; requires iterative optimization for significant improvements. |
2024 [60] | Urolithiasis (Kidney Stones) Diagnosis and Treatment | Investigated GPT-4 for urolithiasis care, including diagnosis and urgent care. | GPT-4 evaluated using 11 clinical scenarios; responses assessed for compliance with guidelines by experienced urologists. | Correct in initial diagnostic scenarios; 73.0% partial compliance; errors in 6/11 responses. | Accurate diagnostic suggestions; demonstrated potential for patient communication and initial treatment planning. | Poor adherence to surgical guidelines; lacked detailed drug recommendations; inconsistent response accuracy. |
2024 [61] | Visual Communication in Nephrology | Use of Microsoft Copilot with DALL-E to create educational kidney-related visuals for diagnostics and patient education. | DALL-E 3 integrated with Microsoft Copilot. | Produced accurate, engaging visuals for nephrology education and diagnostics, boosting communication efficiency. | Easy to use; accessible integration with Microsoft 365; supports nephrology-specific visual content creation. | Challenges in precise anatomical accuracy; iterative refinement needed for medical-grade visuals; advanced features require premium subscriptions. |
2024 [62] | Medical Education and Presentations | Comprehensive guide for medical educators to create AI-generated images for presentations, focusing on practical and cost-effective methods. | Utilized Text-to-Image models (DALL-E); emphasized prompt engineering and multimedia principles. | Generated accurate, culturally sensitive, and diverse visuals tailored to educational needs, enhancing engagement. | Cost-effective, customizable tools for creating compelling educational presentations. | Current Text-to-Image models lack precision for complex medical visuals; ethical concerns with patient representation and copyright issues. |
2024 [63] | Pediatric Emergency Medicine | Use of DALL-E to create child-friendly visuals for medical procedures to reduce anxiety and improve understanding. | DALL-E 3; text encoder and generative diffusion model for creating illustrations of medical procedures. | Reduced patient anxiety and improved compliance during procedures; increased patient satisfaction. | Enhanced patient engagement and communication, particularly in pediatric settings; helpful for non-English-speaking families. | Dependency on precise text input; potential for generating anatomically inaccurate or confusing images. |
2024 [64] | Clinical Nephrology | Comparative analysis of ChatGPT-4, Gemini Pro, and Bard in nephrology-related queries and renal biopsy interpretations. | Evaluated using 21 nephrology-related questions and 3 renal biopsy reports; metrics included TF-IDF, BertScore, ROUGE, and empathy ratings. | ChatGPT-4: Highest in empathy (79.9%); Gemini Pro: Best for biopsy appropriateness; Bard: Best in dialysis-related helpfulness. | Comprehensive performance assessment across empathy, appropriateness, and helpfulness; improved reference accuracy. | Limited to predefined questions and reports; small evaluator group; dependency on model-specific strengths limits generalizability. |
2024 [65] | Postoperative Complication Analysis for Renal Surgeries | Evaluated GPT-4’s ability to detect and classify postoperative complications using discharge summaries from renal surgeries. | GPT-4 with zero-shot and few-shot prompts. | 79.6% match rate with human-curated data for complication detection; accuracy: GPT-4 (86.7%), human (92.9%). | Demonstrated potential for automating complication detection; reduced manual workload for initial analysis. | Poor performance in advanced categorization; prone to false positives/negatives; requires human oversight. |
2024 [66] | Clinical Decision Support in AKI | Evaluated GPT-4 and GenAI integration into AKI e-alert systems for early detection and management. | Multimodal generative AI leveraging EHR inputs like labs, vitals, and clinical notes. | Hypothetical trials indicated improved detection accuracy and management recommendations for AKI cases. | Scalable solution for real-time AKI management; improves physician efficiency and patient safety. | Challenges with specificity in complex cases; risk of alert fatigue; interpretability issues limit full adoption. |
2024 [67] | Renal Cancer Education | Evaluated ChatGPT-4, Gemini AI, and Perplexity AI against authoritative patient education materials for accuracy and readability. | ChatGPT-4, Gemini AI, and Perplexity AI | ChatGPT-4: Highest accuracy; Gemini AI: Best readability | High accuracy and improved readability in patient education; adaptable to simplifying text upon request. | Variability across topics; struggled to achieve uniform 6th-grade readability; minor inaccuracies in treatment-related queries. |
Prompt | ChatGPT Response |
---|---|
Does frequent urination mean kidneys are functioning normally? | Frequent urination indicates active urine production but not necessarily normal kidney function. It can result from high fluid intake, diuretics, medications, or conditions like UTIs, diabetes, or overactive bladder. Consult a healthcare professional for concerns. |
How can it be determined if someone has kidney disease (KD)? | Symptoms of KD include fatigue, swelling, changes in urination, foamy urine, back pain, high blood pressure, decreased appetite, nausea, and difficulty concentrating. Persistent symptoms require medical evaluation through blood tests, urine tests, and imaging. |
Can kidney problems occur without pain? | Yes, kidney issues such as CKD may not cause pain in the early stages. Symptoms to watch for include changes in urination, fatigue, swelling, high blood pressure, and anemia. Regular check-ups and tests are essential for early detection. Consult a healthcare professional if kidney problems are suspected. |
Is anemia related to kidney failure? | Anemia is common in kidney failure due to reduced red blood cell production, leading to fatigue and weakness. Managing anemia is crucial when treating patients with KD. |
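Prompts like those in the table above can also be issued programmatically; the sketch below uses the OpenAI Python client as one possible route, with the model name, system message, and temperature chosen purely for illustration (an API key is assumed to be configured in the environment).

```python
from openai import OpenAI

# Text-to-text use case: send a patient-education style question to a chat model.
# Model name and settings are illustrative; outputs should be reviewed by a clinician.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer kidney-health questions in plain language."},
        {"role": "user", "content": "Can kidney problems occur without pain?"},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```

A low temperature keeps the answers relatively deterministic, which is generally preferable when the output feeds patient-facing educational material.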