Review

Advancing Predictive Healthcare: A Systematic Review of Transformer Models in Electronic Health Records

1 Faculty of Engineering and Computing, Liwa College, Al Ain P.O. Box 41009, United Arab Emirates
2 Faculty of Engineering & IT, The British University in Dubai, Dubai 345015, United Arab Emirates
* Author to whom correspondence should be addressed.
Computers 2025, 14(4), 148; https://doi.org/10.3390/computers14040148
Submission received: 24 February 2025 / Revised: 9 March 2025 / Accepted: 13 March 2025 / Published: 14 April 2025

Abstract

This systematic study evaluates the use and impact of transformer models in the healthcare domain, with a particular emphasis on their usefulness in addressing key medical challenges and performing critical natural language processing (NLP) tasks. The research questions focus on how these models can improve clinical decision-making through information extraction and predictive analytics. Our findings show that transformer models, especially in applications such as named entity recognition (NER) and clinical data analysis, greatly increase the accuracy and efficiency of processing unstructured data. Notably, case studies demonstrated a 30% boost in entity recognition accuracy in clinical notes and a 90% detection rate for malignancies in medical imaging. These contributions underscore the revolutionary potential of transformer models in healthcare, and therefore their importance in enhancing resource management and patient outcomes. Furthermore, this paper highlights significant obstacles, such as the reliance on restricted datasets and the need for data format standardization, and provides a road map for future research to improve the applicability and performance of these models in real-world clinical settings.

1. Introduction

In recent years, the widespread adoption of Electronic Health Records (EHRs) has transformed the healthcare industry, generating huge volumes of data that require sophisticated analytical methods to extract relevant insights [1]. This influx of high-dimensional, heterogeneous data necessitates robust methods for efficiently analyzing unstructured and sequential datasets. NLP has emerged as a critical approach in this sector, extracting substantial patterns and insights from unstructured medical information. Early techniques, such as recurrent neural networks (RNNs), showed promise but faced challenges in capturing long-term dependencies in data sequences [2]. Transformer models, which include well-known architectures such as BERT, GPT, and domain-specific modifications (e.g., ClinicalBERT, BioBERT), use self-attention mechanisms to efficiently capture both short- and long-term dependencies [3,4]. These models have performed exceptionally well in EHR analysis, particularly in tasks such as temporal dependency identification, early diagnosis, and the development of tailored treatment programs [5,6,7]. Their capacity to accommodate a variety of data types, from clinical notes to imaging reports, makes them ideal for healthcare applications [8]. Furthermore, the scalable nature of transformer designs allows for the parallel processing of enormous datasets, which is crucial given the exponential growth of EHR data [9]. Despite these advances, several problems prevent transformers from being widely used in clinical settings. Key difficulties include model interpretability, which healthcare practitioners require for a transparent rationale behind predictions, as well as the significant computational costs associated with training and deploying these models. Ethical considerations, particularly regarding data privacy and security, exacerbate the complexity of their integration into healthcare procedures [10]. Addressing these constraints is critical to realizing transformers’ full promise in predictive healthcare.
Figure 1 depicts the frequency of key concepts across the reviewed publications, emphasizing the central role of transformers in EHR analysis and their links to critical themes such as deep learning, natural language processing, and predictive analytics. The figure illustrates the comprehensive integration of transformer models in the healthcare sector by demonstrating how they interact with critical areas such as information extraction, diagnosis prediction, and patient risk classification. The investigation identified significant patterns, demonstrating that transformer frameworks not only excel at traditional tasks but also drive advances in predictive analytics, improving clinical decision-making processes. Furthermore, the figure presents a thorough overview of the changing landscape of healthcare technology, demonstrating transformers’ expanding relevance and capacity to revolutionize clinical procedures and improve patient outcomes.
Additionally, Figure 2 depicts the interdependent processes of data mining, feature extraction, and model application in predictive healthcare solutions. The figure highlights each stage’s specific functions, demonstrating how they work together to achieve efficient predictive modeling. This paradigm underscores transformers’ revolutionary impact on EHR analysis while describing the steps required for future developments, and it stresses the importance of applying transformer models in a methodical manner to enhance their efficacy in clinical contexts. By combining the stages of data mining and feature extraction with model application, Figure 2 encapsulates the holistic workflow required for developing robust predictive healthcare solutions, emphasizing the importance of continuous innovation to ensure that these models align with real-world clinical needs.
This systematic review investigates the use of transformer models in Electronic Health Record (EHR) analysis, highlighting their potential for improving predictive healthcare and identifying future research directions. Transformers are deep learning architectures that excel at natural language processing tasks because of their capacity to model complex relationships within data. In a variety of applications, they outperform classical machine learning and earlier deep learning models such as CNNs and RNNs by utilizing self-attention mechanisms.
Using the PICO (Population, Intervention, Comparison, and Outcomes) paradigm, this review categorizes transformer applications by data type, architecture, and assessment approach. The Population component focuses on electronic health records (EHRs) utilized for predictive analytics and clinical decisions. The Intervention component covers the use of transformer-based models for illness prediction, patient risk stratification, and clinical text analysis. The Comparison component weighs transformer-based approaches against classical machine learning algorithms. Finally, the Outcomes component addresses critical challenges such as model interpretability, computational efficiency, scalability, and ethical integration in healthcare AI applications.
Despite tremendous progress in NLP applications in healthcare, substantial limitations remain, such as a lack of domain-specific datasets, interpretability issues, and difficulties adapting models across varied clinical settings. This work seeks to address these issues by focusing on transformer-based NLP applications in healthcare, with the goal of improving early treatment and patient outcomes. The analysis identifies nine essential research questions concerning interpretability, computational efficiency, and ethical integration, with the goal of accelerating the development of NLP applications that enable fast and effective healthcare solutions.
Furthermore, a distinct section is dedicated to elucidating the significance and linkages of the research questions, as well as explaining the methods used to select them. This framework clarifies how each question links to the study’s primary themes and advances our understanding of transformer models in healthcare applications.
The study’s systematic methodology seeks to establish a solid platform for future research and facilitate the development of NLP-driven healthcare solutions, resulting in better early treatments and patient outcomes. By addressing existing limitations, this study aims to improve the efficacy and dependability of NLP-powered healthcare solutions.

1.1. Research Objectives

To thoroughly analyze the transformative potential of transformer models in healthcare, this study defines key research objectives aligned with its systematic review framework. These objectives aim to assess the impact of transformer models on enhancing clinical decision-making, predictive analytics, and NLP applications in healthcare. The specific objectives are as follows:
- To analyze the role of transformer models in addressing key challenges in healthcare and their contributions to clinical decision-making, data analysis, and predictive analytics.
- To identify and evaluate specific NLP tasks in healthcare that are enhanced by transformer models and assess their impact on clinical applications.
- To review and compare commonly utilized transformer architectures and techniques in healthcare applications, analyzing their effectiveness in addressing domain-specific challenges.
- To examine the types of datasets used for training transformer models in healthcare and assess their influence on model performance, generalization, and clinical applicability.
- To evaluate the replicability of studies involving transformer models in healthcare, examining methodologies, reproducibility, and reporting standards.

1.2. Background on Transformer Models

Transformer models represent a key advance in NLP and have transformed many applications, including those in healthcare. Transformers, first introduced in [3], use a self-attention mechanism to weigh the importance of different words in a sequence for enhanced context interpretation. This breakthrough allows models to analyze data in parallel rather than sequentially, significantly improving efficiency over classic architectures such as RNNs and long short-term memory (LSTM) networks. The transformer architecture consists of an encoder and a decoder: the encoder receives input data and generates a contextual representation, which the decoder then converts into the required output. This architecture not only captures long-range dependencies within data but also improves model performance on tasks like translation, summarization, and sentiment analysis [11,12]. The key features include:
- Self-Attention Mechanism: This enables models to evaluate the relevance of all words in a sentence simultaneously, which improves context understanding [13].
- Parallel Processing: Transformers can process data sequences concurrently, dramatically lowering training time and efficiently managing large datasets [14].
- Layered Architecture: Transformers have numerous encoder and decoder layers that capture different degrees of abstraction, enhancing data representation.
- Pre-training and Fine-tuning: To adapt to specialized tasks, these models can be pre-trained on huge datasets and then fine-tuned on relevant domain data, such as medical texts [15].
- Scalability: Their scalable architecture supports larger models that capture complex data patterns, as demonstrated by successful variants such as BERT and GPT.
Transformers’ ability to analyze large amounts of unstructured data makes them useful in healthcare for applications such as extracting information from clinical texts, predictive analytics for patient outcomes, and decision support systems, all of which improve analytical capabilities in the field.
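To make the self-attention computation concrete, the following minimal sketch implements scaled dot-product attention with NumPy. The dimensions, random inputs, and projection matrices are illustrative assumptions, not taken from any model discussed in this review:

```python
# A minimal sketch of scaled dot-product self-attention, the core transformer
# operation, using NumPy only; toy dimensions and random weights.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # similarity of every token to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                          # context-aware token representations

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                     # 5 tokens, model width 8
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)             # shape (5, 4)
```

Each output row is a weighted mixture of all token values, which is how every token’s representation comes to reflect its full context rather than only its neighbors.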

2. Research Direction

This paper investigates the transformative role of transformer models in EHR analysis and outlines a structured roadmap for their integration into healthcare systems. In adherence to systematic review methodologies, specifically the PRISMA framework and the PICO model, this study ensures a comprehensive and rigorous evaluation of existing research. To further establish a structured framework for future research in this domain, we address nine key research questions (RQs):
RQ1: How do transformer models address key challenges in healthcare, particularly in improving clinical decision-making, data analysis, and predictive analytics?
RQ2: What specific natural language processing (NLP) tasks are being enhanced by transformer models in healthcare, and how do these tasks contribute to clinical applications?
RQ3: What transformer model architectures and techniques are most utilized in healthcare applications, and how do they differ in addressing specific healthcare challenges?
RQ4: What types of datasets are utilized to train transformer models in healthcare, and how do these datasets influence model performance and generalization in clinical applications?
RQ5: To what extent are the studies involving transformer models in healthcare replicable?
RQ6: How does the lack of standardized data formats affect the implementation and integration of transformer models in healthcare systems?
RQ7: What is the quality of the studies reviewed on transformer models in healthcare, and what are the key limitations that impact their generalizability and clinical applicability?
RQ8: What are the ethical and privacy concerns related to using transformer models on healthcare data?
RQ9: What are the key benefits and challenges associated with the use of transformer models for real-time patient monitoring, and how do these factors impact their effectiveness in clinical practice?
By addressing these questions, this review lays the groundwork for future research to improve the utility, reliability, and ethical integration of transformers in healthcare. The accompanying figure summarizes the research scope and demonstrates the relationships between major components of transformer-based healthcare applications.

3. Methods

This systematic review followed the established guidelines proposed by [16] to enable a robust and comprehensive investigation of transformer-based models in healthcare, with a focus on EHRs. The approach was rigorously planned around several essential components. First, the research questions were carefully defined to ensure alignment with the study’s objectives. Next, strict inclusion and exclusion criteria were developed to maintain the relevance and quality of the selected sources. Finally, an effective search strategy was implemented, incorporating multiple prominent academic databases, namely PubMed, the ACM Digital Library, and IEEE Xplore (Table 1). These databases were chosen for their significance in both the healthcare and artificial intelligence fields, ensuring a comprehensive and unbiased selection of literature.
The search utilized a combination of transformer model keywords (e.g., BERT, GPT, BioBERT) and healthcare-related terms (e.g., EHRs, EMRs, clinical data analytics). This strategy ensured the retrieval of research from both technical and clinical viewpoints [17]. The initial search yielded a large number of research articles, as shown in the PRISMA flow diagram (Figure 3), which illustrates the systematic selection process of the studies included in the review. The diagram outlines the number of studies initially identified through database searches, the removal of duplicates and ineligible studies, and the subsequent quality assessments that led to the final selection of 15 high-quality articles. It emphasizes the rigorous methodology employed to ensure a thorough and unbiased selection, highlighting the necessity of maintaining high standards for relevance and methodological integrity. It also notes potential opportunities for future research, suggesting that a broader scope, including a wider array of studies and gray literature, may enrich understanding in the field.
A total of 100 papers were quality-screened to guarantee relevance and peer-review legitimacy. Quality evaluation was conducted using a comprehensive, standardized framework to assess key factors such as study design, methodological rigor, and relevance to the research questions. This framework prioritized several essential quality factors: the clarity of research objectives, the suitability of sampling procedures, data transparency, repeatability, and the robustness of data analysis methods. Each study’s design was rigorously analyzed to determine the validity and reliability of the methods used, since a clear and well-structured study plan is required for credible and dependable conclusions. Additionally, AI-assisted methods were used during the screening process to help with filtering and refinement, ultimately resulting in the final selection of 15 high-quality studies that met the stated inclusion/exclusion criteria (Table 2). Table 3 summarizes the title, authors, publication venue, and year of each study, along with a brief introduction highlighting its key focus, methodology, and relevance to the research objectives. This process ensured that only the most robust and relevant studies were included in the final review. The extracted data focused on crucial variables such as the types of transformer models used, the healthcare tasks addressed, the datasets used, and the reproducibility of results. For example, studies using BERT have shown that it can identify medical conditions from clinical notes [18], whereas BioBERT has proved effective in extracting relations from biomedical literature.

3.1. The Use of Electronic Health Records (EHRs) in AI Training

This section examines the critical importance of Electronic Health Records (EHRs) in training artificial intelligence models, specifically transformer models. The adoption of EHRs has resulted in massive volumes of data, which are critical for developing powerful predictive analytics and natural language processing systems. The integration of EHRs enables transformer models to learn from a variety of datasets, including clinical notes, lab results, and imaging reports, improving their ability to address complex healthcare scenarios.

3.2. Data Collection

An organized and systematic data extraction procedure was employed to identify the key characteristics of each study included in this review. This process was designed to ensure the consistent capture of relevant details across studies, allowing for thorough analysis. The following characteristics were retrieved from each selected study:
- Study Details: This category includes the study’s title, author names, year of publication, and the journal or conference where the article was published. Such details provide context for assessing the relevance, credibility, and timeliness of findings. Including the publication outlet is significant in systematic reviews, as it classifies studies by their level of peer review and impact [33,34,35].
- Research Objectives: The precise healthcare or business problems that transformer models aimed to address were documented. These objectives ranged from improving healthcare decision-making to automating customer service, providing insights into the practical applications of transformer models in solving essential industry challenges.
- Model Type and Architecture: This category identifies the transformer architectures used, such as BERT, GPT, and T5. Each model’s adaptability varies significantly based on the task. For instance, the Temporal Fusion Transformer (TFT) is tailored for time-series forecasting in EHR data, while the Vision Transformer (ViT) excels in image-based tasks. Understanding these distinctions is crucial for appreciating their respective roles in healthcare AI applications [36,37,38].
- NLP Tasks: The specific NLP tasks addressed by the transformer models were classified, including text classification, sentiment analysis, named entity recognition (NER), relation extraction, and information retrieval [19]. Identifying these functions is essential for understanding the models’ applications in real-world healthcare settings, particularly in enhancing diagnosis and treatment planning [20].
- Datasets: Details on the datasets utilized in each study were collected, encompassing data sources, sizes, and whether they were public or private. For example, while some studies used publicly available datasets like MIMIC-III, others relied on proprietary datasets. Dataset transparency is vital for reproducibility and for assessing the generalizability of model results across diverse populations [22].
- Evaluation Metrics: The studies’ evaluation metrics, including accuracy, F1 score, precision, recall, and AUC-ROC, were noted (a minimal computation sketch follows this list). These metrics offer critical insights into model performance, particularly where precision and recall are crucial due to the potential consequences of erroneous predictions in healthcare [23,24].
- Reproducibility: The assessment of reproducibility involved reviewing each study’s code disclosure and data availability. Studies that provided access to datasets and model code were considered more reproducible, which is important for advancing research in NLP and transformer models. Limitations regarding proprietary datasets and code access were also noted [25].
- Limitations and Challenges: Common challenges highlighted in the studies included data quality issues, such as missing values and unbalanced datasets, high computational costs associated with training large transformer models, and difficulties related to model interpretability [26,31]. These limitations are particularly relevant when considering the practical deployment of these models.
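For concreteness, the following minimal sketch computes the metrics listed above with scikit-learn; the toy labels and scores are invented for illustration and do not come from any reviewed study:

```python
# A minimal sketch of the evaluation metrics reported across the reviewed
# studies, computed with scikit-learn on toy binary labels.
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                   # gold labels (e.g., readmitted or not)
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                   # model's hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]   # model's predicted probabilities

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("AUC-ROC  :", roc_auc_score(y_true, y_score))  # uses scores, not hard labels
```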

3.3. Extraction of Homogeneous Data for Diagnosis and Predictive Medicine

To ensure relevant insights from the collected studies, it is critical to extract homogeneous data suited to diagnosis and predictive analytics. This subsection describes the strategies used to standardize data formats and accelerate the extraction of relevant features from unstructured clinical texts. The goal of using natural language processing techniques here is to develop standard datasets that can effectively support AI model training and provide accurate forecasting capabilities in healthcare contexts.

3.4. Criteria and Processes for Comparison

This subsection describes the criteria and techniques specified for comparing the selected studies. Methodological rigor, sample size, transparency, and relevance to the study objectives were among the most important assessment criteria. Systematic data extraction procedures were used to obtain key features from each study, ensuring consistency and reliability in interpreting the findings. The comparison procedure demonstrates transformer models’ substantial contributions to clinical decision-making and predictive analytics.

3.5. Quality Assessment

This subsection outlines the methodology behind the visualizations presented in Figure 1, detailing the data sources utilized, the analytical techniques applied, and the frameworks employed to derive the insights represented in the figure. Providing this overview strengthens the robustness of our findings regarding the role of transformers in EHR analysis and clarifies their connections to vital themes such as deep learning (DL), NLP, and predictive analytics.
Only 15 of the studies identified in the PRISMA flow diagram (Figure 3) matched all the stated quality criteria. Papers consisting solely of narrative reviews, containing little or no quantitative data, or providing insufficient information to fully answer the research questions were excluded. During the systematic analysis, a comprehensive quality evaluation framework was used to analyze the methodological rigor, data quality, and reproducibility of the selected studies. The evaluation procedure prioritized several essential quality factors, such as research design, data transparency, repeatability, and the suitability of model evaluation measures. First, each paper’s research design was rigorously analyzed to determine the validity and reliability of the methods used. This included assessing the clarity of the research objectives, the suitability of the sampling procedures, and the robustness of the data analysis methods. A clear and well-structured study plan is required for reaching credible and dependable conclusions.
A total of 4829 studies were initially retrieved from the academic databases. The study selection process followed a rigorous multi-stage procedure to ensure relevance and quality. Duplicates and non-English studies were removed first, reducing the count to 3218. A structured title and abstract screening was then conducted manually by two independent reviewers, aided by AI-assisted highlighting (not exclusion). The AI tool did not make final exclusion decisions; rather, it was used to help human reviewers prioritize relevant papers.

3.6. AI-Assisted Study Screening and Quality Assurance

To ensure the reliability and quality of the AI tool used in the research screening process, it was developed and trained on a high-quality, domain-specific dataset. A comprehensive evaluation procedure was employed to determine the tool’s accuracy, which included cross-validation and performance indicators such as precision, recall, and F1-score. Additionally, domain experts conducted assessments to verify that the AI’s selections were consistent with the research requirements. These quality assurance measures were crucial for ensuring that only relevant studies were included.
Following this, 425 studies remained for full-text assessment, which was conducted manually according to strict quality evaluation criteria. Ultimately, 15 studies met all standards for methodological rigor, transparency, and relevance, ensuring an impartial and thorough selection process.
To evaluate data transparency, we reviewed the datasets used in the selected studies, considering factors such as data source, size, and pre-analysis procedures. Effective data management enhances the reproducibility of findings and upholds the rigor of the review process.
Reproducibility was also a key factor; we assessed whether the studies provided access to their raw data and code. Making datasets and analysis code publicly available is essential for allowing others to replicate experiments and build upon existing research, thereby advancing scientific knowledge.
When analyzing the model evaluation metrics, we focused on the appropriateness and clarity of the performance measures employed in the studies. The selection of appropriate metrics is critical for accurately assessing model efficacy, as it influences the interpretation of performance and its potential applicability in real-world settings.
The sample selection process was rigorously evaluated, as it directly affects the quality and representativeness of the research dataset. Strict inclusion criteria ensured that only methodologically sound, peer-reviewed publications focusing on transformer-based EHR analysis were considered. Studies lacking empirical evidence, unrelated to healthcare applications, or lacking methodological transparency were excluded. This led to the final inclusion of 15 studies that adhered to rigorous standards of methodology, transparency, and reproducibility.
This comprehensive quality evaluation framework enhanced the validity and reliability of the findings, providing a solid foundation for the systematic review. However, the stringent selection criteria resulted in a limited number of articles being included, which may have overlooked some relevant research. Future studies might expand the scope by incorporating more studies and gray literature for a more thorough understanding of the field.

4. Results and Findings

4.1. Overview of Findings

As a result of this systematic review, 15 papers were analyzed to address the nine formulated research questions. Transformers such as BERT and BEHRT have shown improvements in several tasks, including predicting patient outcomes and analyzing EHRs [21,29,39]. These models leverage large datasets to enhance performance in crucial prediction, classification, and pattern identification tasks [27].
They can identify relationships and dependencies between different entities, further improving predictive accuracy and clinical insights. An important objective of this review is to determine the primary medical issues that transformer models have targeted in the reported studies. Based on the analysis, two studies focused on information extraction, which refers to converting unstructured patient data into structured data containing diagnoses, symptoms, medications, and other critical information. This task is essential in clinical decision-making, especially in large-scale organizations where textual data is the primary medium, as it makes information easier for professionals to access, analyze, and utilize effectively.
Three of the studies concentrated on prediction tasks, which play a key role in preventive healthcare management. These tasks included diagnosis prediction, where transformers are employed to anticipate future diagnoses based on current EHR records. Some studies also focused on readmission prediction, aiming to determine whether a patient is expected to be readmitted within a specific timeframe. A few papers also examined mortality prediction, and one study specifically addressed the prediction of hospitalization needs within the next 12 months based on a patient’s medical history [28].
The findings related to TFT and ViT warrant clear distinction, as each model contributes differently. The Temporal Fusion Transformer (TFT) has been applied primarily to time-series forecasting over structured, sequential EHR data, where it enhances patient trajectory modeling and risk prediction. In contrast, the Vision Transformer (ViT) is utilized for medical imaging tasks, improving diagnostic accuracy by extracting deep feature representations from radiological scans.
In terms of their distinct applications and strengths in healthcare AI [30], TFT excels at handling longitudinal patient records and optimizing time-dependent predictions, while ViT enables advanced image-based diagnostics, addressing challenges in visual pattern recognition. These distinct roles should not be conflated within transformer-based EHR analysis.
Despite the benefits of transformer models, several concerns remain, including data interoperability, high computational demands, and ethical and privacy issues [32]. Addressing these challenges through open-source data sharing, efficient model optimization, and ethical AI frameworks will be critical for advancing their integration into clinical settings.
RQ1: How do transformer models address key challenges in healthcare, particularly in improving clinical decision-making, data analysis, and predictive analytics?
Transformer models, with their sophisticated NLP and deep learning capabilities, have made significant progress in addressing a wide range of healthcare challenges. Their ability to examine and learn from massive datasets has been critical in improving results across multiple disciplines. Transformer models excel at prediction tasks: they exploit trends in EHRs to predict disease outcomes, mortality rates, and patient readmissions. For example, transformer models can predict a patient’s likelihood of readmission based on their diagnosis, treatment history, and other patient-specific data, allowing healthcare providers to intervene more proactively [40].
A well-known example is the BEHRT model and its derivatives, which use the transformer architecture to assess patients’ medical histories and forecast future health events. By anticipating future medical records, these models can detect risks or complications before they become apparent [41].
Transformers have also helped to automate critical healthcare processes. These include appointment scheduling, in which they optimize workflows based on patient availability and urgency, and clinical documentation, in which they transcribe, organize, and even analyze notes gathered during patient visits, thereby reducing administrative strain [42]. This automation can also result in more accurate and consistent documentation, which improves patient care.
Furthermore, transformers are widely used in information extraction operations, allowing healthcare professionals to make better decisions. Transformers, for example, can extract relevant information from medical literature, patient histories, and other unstructured data to help with diagnostic or therapy suggestions [43].
In this way, they promote evidence-based procedures and improve decision-making processes throughout healthcare organizations. They are also being used in healthcare text analysis—from clinical notes to medical research papers—to help spot developing trends, suggest novel treatment procedures, and summarize significant findings [44].
Despite their transformative promise, transformer models in healthcare face significant difficulties. One of the most pressing challenges is the quality of training data, as healthcare data is frequently noisy, incomplete, or skewed. Models trained on such data risk propagating these biases, resulting in incorrect predictions or recommendations. Furthermore, issues of data privacy and security present significant obstacles to applying transformer models at scale in clinical settings. In the future, transformer models could transform numerous elements of healthcare, such as real-time patient monitoring, telemedicine, and tailored treatment. Transformers may detect early symptoms of deterioration by continuously monitoring patient data from wearables and other devices, enabling timely intervention. Furthermore, the prospect of personalized treatment plans, in which transformers examine massive amounts of patient data to tailor treatment techniques, presents a tremendous opportunity to improve outcomes.
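As a concrete illustration of the readmission-prediction setup described above, the following minimal sketch fine-tunes a generic BERT-style encoder on toy discharge notes with the Hugging Face transformers library. The base model choice, the notes, and the labels are placeholders, not the pipeline of any reviewed study:

```python
# A minimal sketch: fine-tuning a BERT-style encoder for binary readmission
# prediction from discharge notes. Toy data; swap the base model for a
# clinical variant (e.g., ClinicalBERT) in practice.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-uncased"  # placeholder base checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

notes = ["Patient discharged on insulin; prior admissions for DKA.",
         "Routine follow-up, stable vitals, no acute findings."]
labels = torch.tensor([1, 0])  # 1 = readmitted within 30 days (toy labels)

enc = tokenizer(notes, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                     # a few gradient steps on the toy batch
    out = model(**enc, labels=labels)  # cross-entropy loss computed internally
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In a real study, the toy batch would be replaced by a proper training set, a held-out split, and the evaluation metrics discussed in Section 3.2.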
RQ2: What specific natural language processing (NLP) tasks are being enhanced by transformer models in healthcare, and how do these tasks contribute to clinical applications?
Transformer-based models are used for various NLP tasks to analyze unstructured EHR data and provide insights that improve decision-making. These include different prediction tasks, mortality risk assessment, and the prediction of disease patterns. NER is used mainly to extract important medical entities and their related attributes, such as diseases, symptoms, tests, treatments, and other patient characteristics, from textual information; this capability is critical because it enhances the quality of input data for clinical practice and future research. Transformers are also utilized for language translation, text classification, de-identification, and information extraction. Relation extraction is used where two or more medical entities are connected, to establish a relationship between them, for instance, a disease with a symptom or the side effect of a drug. This task is very important for building knowledge graphs that improve decision support in medicine as well as the understanding of patient care processes. Transformers are currently being adapted to important tasks such as understanding disease progression and analyzing health data [40,41].
The most frequent task covered was text classification, which included binary classification, such as predicting the readmission of a patient, and multiclass classification, such as classifying diagnosis or treatment types. These classification tasks use transformer-based deep learning to deliver high prediction accuracy for disease prediction, meeting the need for clinical improvement. Transformers facilitate the customization of patient treatment plans, the mapping of disease trajectories, the summarization of medical issues, diagnosis prediction, and concept extraction. This improves documentation quality and could help professionals communicate better with patients. Semantic similarity is another use; transformers can detect how similar one medical document is to another. This task is especially important for clustering patient records with similar characteristics and automatically building a patient summary chart. Lastly, models such as GPT assist health professionals in generating diagnoses and answering case-related questions.
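To illustrate the NER workflow described above, the sketch below runs a Hugging Face token-classification pipeline over an invented clinical sentence. The default general-English model is used so the snippet runs as-is; in practice a biomedical checkpoint would be passed via the model argument:

```python
# A minimal sketch of NER with a Hugging Face token-classification pipeline.
# The default model is general-domain; substitute a biomedical NER checkpoint
# via `model=` for clinical text in practice.
from transformers import pipeline

ner = pipeline("token-classification", aggregation_strategy="simple")

note = "Patient reports chest pain and dyspnea; started on metoprolol 25 mg."
for ent in ner(note):
    # each entity carries a label, the matched text span, and a confidence score
    print(ent["entity_group"], "->", ent["word"], f"({ent['score']:.2f})")
```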
RQ3: What transformer model architectures and techniques are most utilized in healthcare applications, and how do they differ in addressing specific healthcare challenges?
The review of studies presented herein highlighted that transformer-based models are dominant in the field. These include BERT, BioBERT, ALBERT, ChatGPT-4.0, and XLNet. These models have been widely used to capture long-term, bidirectional dependencies and to understand context, capabilities previously approximated by recurrent networks such as the bidirectional long short-term memory (Bi-LSTM). BEHRT and ExBEHRT, which are inspired by BERT, are built to process large EHRs to predict diagnoses, with some adaptations made for multimodal data. These models’ architecture enables them to handle the complexities inherent in medical data, which makes them very suitable for this field and its various areas. Temporal fusion transformers (TFTs) and time-series transformers (TSTs) are also utilized for analyzing biosignals, whereas vision transformers (ViTs) are adapted for tasks such as medical image reconstruction [28]. On the other hand, classical ML algorithms were applied in 4% of the included studies, primarily for comparison purposes. These conventional approaches produced suboptimal results when compared with the transformer-based models, which clearly indicates the effectiveness of deep learning (DL) solutions in healthcare-related tasks. Lastly, other techniques include positional encoding, multimodal large language models (LLMs), hierarchical approaches, and the integration of external knowledge; these support a wide range of tasks, mainly affecting decision-making and efficiency.
RQ4: What types of datasets are utilized to train transformer models in healthcare, and how do these datasets influence model performance and generalization in clinical applications?
The availability and reproducibility of data have become a pressing concern within healthcare research. The datasets used in the reviewed studies range from large to quite small, and from very general to highly specific, and are sometimes proprietary. Some of the datasets used included EHR data from specific hospitals or research databases, such as the Clinical Practice Research Datalink (CPRD). The public datasets used included PhysioNet, the MIMIC database, and the Sleep Heart Health Study [24,25,39]. Synthetic datasets were also used, such as the eICU Collaborative Research Database [31]; these mimic the “real” structure of EHR data for evaluating models. Biomedical NLP datasets like MIMIC-II were also used, and new datasets were created for specific purposes. High-quality annotated data are very costly and rarely available. In biomedical NLP, various components are involved in transforming unstructured text data into structured formats. These components include inputs such as training corpora, domain knowledge, and linguistic knowledge, which are processed to generate structured data that can be used in healthcare applications. Figure 4 illustrates this process, showing the key elements of a biomedical NLP system. This is crucial for understanding how structured data is derived from unstructured data, enabling its use in various biomedical applications.
RQ5: To what extent are the studies involving transformer models in healthcare replicable?
A critical issue in scientific research is reproducibility. An advantage of employing publicly available datasets is that it makes a study’s conclusions more reliable and helps other researchers carry out further studies in the same field [30], whereas limiting access to datasets poses a challenge for reproducibility, as results cannot be verified and the work cannot be improved or built upon in future studies. Many of the studies used data that is not public but can be accessed upon request, such as the CPRD from the UK public health system. Some studies mentioned public data, but other issues limit replicability, including gaining access, the lack of a standardized data format, and the de-identification of the data. Code and method availability are also important factors that affect replicability; while some studies emphasize this, they do not provide source code or data for that purpose. The replicability of models like BEHRT and ExBEHRT depends on implementation details rather than simply the availability of code or data, as these models may be very complicated to replicate [31,32]. Replicability is largely impacted by differing methods, complex datasets, the context of the conducted studies, and the lack of standards across research efforts.
RQ6: How does the lack of standardized data formats affect the implementation and integration of transformer models in healthcare systems?
The absence of standardized data formats, particularly frameworks such as HL7 FHIR, presents substantial challenges to the integration of transformer models in healthcare. Many studies have found that transformer models do not use standardized data formats, resulting in data discrepancies and gaps in training datasets. This lack of uniformity degrades model performance, introducing biases and limiting the models’ capacity to generalize across different healthcare contexts [23]. It also restricts scalability, because models based on disparate data formats are difficult to apply uniformly across hospitals or regions. Furthermore, the lack of standardized formats hampers information extraction from EHRs, restricting transformer models’ potential to aid clinical decision-making. Non-standardized data also affects the integration of data across care stages, which is crucial for building comprehensive patient profiles. To address these challenges, adopting common data standards like HL7 FHIR is essential for ensuring interoperability, improving model accuracy, and enabling broader adoption in clinical practice [30,36].
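For readers unfamiliar with HL7 FHIR, the following minimal sketch shows what a standardized record looks like: a single FHIR Observation resource expressed as a plain Python dictionary. The codes and values are illustrative assumptions, not data from the reviewed studies:

```python
# A minimal sketch of a standardized clinical record under HL7 FHIR:
# one Observation resource as plain JSON. Codes and values are illustrative.
import json

observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {                      # what was measured, coded in LOINC
        "coding": [{"system": "http://loinc.org",
                    "code": "8867-4",
                    "display": "Heart rate"}]
    },
    "subject": {"reference": "Patient/example"},
    "valueQuantity": {"value": 72, "unit": "beats/minute"},
}

print(json.dumps(observation, indent=2))
```

Because every compliant system emits the same resource shape, a model pipeline can ingest records from different hospitals without bespoke parsing for each source.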
RQ7: What is the quality of the studies reviewed on transformer models in healthcare, and what are the key limitations that impact their generalizability and clinical applicability?
The use of advanced methods, validated models, and comprehensive datasets supports the quality of the studies included in this review. Studies such as [23,24,39] used large and diverse data to improve model performance. Some limitations still exist, including a lack of available data, missing attributes, data anonymization, and biases, all of which could affect performance [23,26].
The lack of a standardized format, dependency on specific datasets, and coverage of only certain transformer models pose a generalizability challenge [21,26,31,42]. Transformers are “data-hungry” by nature, meaning they require large amounts of data and considerable expert knowledge to train, and they exhibit limited transferability, which constrains integration and use [24,27,42]. Another issue is the absence of multilingual datasets, which limits the transferability of trained models and thus their possible uses in healthcare around the world. To apply transformer models to various linguistic environments, future studies must add more languages to the datasets used to train the models. When examining the details of the selected studies, English-language datasets were found to dominate, with Chinese the second most prevalent language. Other languages used in the research were Spanish, Portuguese, Japanese, and Korean, though in small percentages. This dominance of English and the lack of variety in other languages underscore one of the primary shortcomings of the datasets used in transformer model studies.
RQ8: What are the ethical and privacy concerns associated with using transformer models on healthcare data?
A major issue in healthcare is safeguarding patients’ privacy and complying with regulations. Multiple challenges affect confidentiality, data security, and the accuracy of predictions. Handling patients’ data securely is critical and a primary concern; this emphasizes the need for data anonymization and encryption, as well as strict compliance with regulations such as HIPAA and GDPR [23,24,31]. It is also important to acquire informed consent from patients to use their data for research purposes, and to address any biases that may arise within the data used in order to avoid false or inaccurate outcomes [26,30]. The lack of transparency in the decision-making process also raises ethical issues, alongside the risk of dehumanizing patient care, which could affect the trust patients build with their doctors. Therefore, it is very important to understand how the models make their decisions and on what factors they are based [26,31]. Moreover, since underrepresentation in datasets could lead to biased outcomes, it is also critical to ensure that clear explanations and accountability always accompany discussions of the models’ outcomes [43].
RQ9: What are the key benefits and challenges associated with the use of transformer models for real-time patient monitoring, and how do these factors impact their effectiveness in clinical practice?
Transformer models provide considerable advantages for real-time patient monitoring by increasing prediction accuracy and decreasing response times. They can find patterns in multi-modal data such as EHRs, bio-signals, and other patient information, allowing healthcare providers to intervene in a timely manner. This is especially important in emergency care, where prompt decision-making can prevent negative outcomes such as cardiac arrest or sepsis. These models also provide more thorough insights into a patient’s state by combining data from different sources, which is particularly valuable for managing chronic diseases and developing individualized treatment strategies.
However, applying transformer models to real-time monitoring involves significant challenges. One of the primary impediments is their computational demands: transformer models need substantial computational resources, which may limit their scalability and deployment in healthcare settings lacking extensive infrastructure. Real-time data processing is also a problem, because models must handle massive amounts of data continuously while providing quick, accurate predictions; any delay or error in real-time predictions may have serious ramifications for patient care. Furthermore, transformer models remain difficult to interpret. They frequently function as “black boxes”, making it hard for healthcare providers to comprehend the logic behind their predictions. This lack of transparency raises questions about trust and accountability, especially when the model’s outputs influence key healthcare decisions. To address these challenges, efforts must be made to optimize transformer models for real-time processing, increase computational efficiency, and create user-friendly interfaces that improve interpretability, allowing healthcare professionals to understand and trust model-driven insights in clinical settings.

4.2. Performance Comparison of Transformer Models

To help the reader understand the usefulness of different transformer models in healthcare applications, we have created a performance comparison that captures key indicators of their utilization in specific tasks. Table 4 displays crucial performance characteristics, such as accuracy, F1-score, and processing time, for a variety of healthcare applications, including disease prediction, medical imaging, and text summarization. By collecting these data, we aim to highlight each model’s strengths and weaknesses, allowing for a more accurate assessment of their relevance in real-world clinical situations.
Table 4 displays the performance metrics of selected transformer models, providing insights into their capabilities and informing future research and implementation methods.

4.3. In-Depth Analysis and Discussion

This section examines the strengths, limitations, and biases associated with the use of transformer models in healthcare.
Strengths:
- Adaptability: Transformer models are highly adaptable to healthcare tasks, with successful applications including text classification, information extraction, and predictive analytics [17,18].
- Efficiency: Models like BERT and GPT-3 improve efficiency in managing unstructured clinical data, lowering paperwork burdens for healthcare practitioners [13].
- Performance: Transformer models exceed traditional methods in accuracy and processing speed, enabling evidence-based care [17].
Limitations:
- Data Quality and Diversity: One major restriction is the quality of training data. Using noisy, incomplete, or biased datasets can result in skewed model predictions and limited generalizability across varied populations [13,17].
- Interpretation Challenges: The “black box” nature of transformer models leads to issues with interpretability. Misunderstanding model results can reduce trust between healthcare providers and AI systems [16].
- Deployment Barriers: Regulatory challenges and concerns about data privacy and security hinder the scaling of transformer models in clinical settings [13].
Biases:
- The prevalence of English-language datasets creates linguistic bias, restricting the use of transformer models in multilingual healthcare settings. Closing this gap is vital for providing equitable healthcare solutions [33].
- Training models on specific demographic data can exacerbate healthcare disparities, emphasizing the importance of inclusive model training and evaluation [17].

4.4. Evaluating Transformer-Based Approaches in Healthcare: Case Studies and Performance Analysis

This section provides concrete examples of transformer model applications in real-world settings, complementing the theoretical discussions and demonstrating practical implications. Table 5 contains case studies demonstrating the successful deployment of several transformer models, such as BERT, ViT, and BioBERT, in healthcare settings.

5. Discussion

This systematic study sought to assess the use and efficacy of transformer models in healthcare, specifically in processing Electronic Health Records (EHRs) and addressing various medical concerns. A qualitative synthesis of the chosen studies revealed critical conclusions about diagnostic and therapeutic applications, natural language processing (NLP) tasks, methodologies, data availability, and associated obstacles.
This review emphasizes the pivotal role of transformer models in information extraction and underscores the necessity of effectively managing both structured and unstructured data to support informed healthcare decision-making. As illustrated in Figure 5, which visually represents the relationships and feature types analyzed by transformer models, the findings collectively reinforce the models’ significant utility in the healthcare domain.
The investigation identified a considerable focus on predictive tasks, such as diagnosis and hospital readmission forecasts. This emphasis indicates a growing reliance on machine learning technologies to not only improve patient outcomes but also optimize hospital resource management. Within this context, named entity recognition (NER) emerged as a critical NLP task, highlighting its importance in accurately identifying medical entities from textual data, which facilitates downstream studies.
However, a notable limitation identified is the linguistic diversity of the datasets; most studies predominantly utilized English-language datasets, which restricts the generalizability of findings in multilingual healthcare environments. Additionally, the absence of standardized data formats (e.g., HL7 FHIR) presents challenges for seamless interaction with existing healthcare systems and workflows, thereby limiting the real-world applicability of transformer models.

5.1. Theoretical and Practical Contributions

This review significantly contributes to both theoretical knowledge and practical implementation in the NLP community. Theoretically, it synthesizes existing insights on transformer models in healthcare, providing a comprehensive framework for future research. By integrating findings from various studies, this review elucidates how transformers can effectively address major medical concerns and enhance specialized NLP tasks. These insights not only reflect the current landscape of transformer applications but also set the stage for future explorations within this field.
Practically, the implications of these findings offer meaningful guidance for healthcare providers, policymakers, and developers. The demonstrated effectiveness of transformer models in tasks such as information extraction and predictive analytics attests to their potential to enhance healthcare processes, facilitate decision-making, and ultimately improve patient outcomes. For instance, transformer models enable the extraction of structured data from unstructured clinical notes, expediting data processing while supporting evidence-based healthcare practices. Furthermore, this analysis provides valuable insights for choosing appropriate NLP models and methodologies tailored to specific organizational needs, thereby promoting efficient resource deployment.

5.2. Implications for the NLP Field

The advancements in transformer models for medical text processing highlight their vast relevance across various sectors. These models extend beyond healthcare applications; for example, in finance, transformers can extract pivotal insights from large volumes of unstructured data, such as regulatory documents and market analyses. Similarly, in the legal domain, transformers streamline document review processes by identifying essential details within contracts and case law, reducing errors, and enhancing overall efficiency. The ability of transformers to enable semantic similarity and text categorization significantly boosts the performance of chatbots, leading to greater user satisfaction. Moreover, the cross-domain applicability of transformer models fosters interdisciplinary knowledge transfer, promoting innovation and widening the scope of natural language processing applications. Future advancements, including the development of hybrid and multilingual models, will further enhance the versatility and impact of transformer technologies across multiple domains.

5.3. Gaps and Limitations

Despite their promise, transformer models face several challenges that warrant consideration. The heavy reliance on English-language datasets limits their effectiveness in multilingual contexts; expanding efforts to develop and evaluate multilingual transformer models will therefore be critical in bridging this gap and enhancing their applicability across diverse healthcare settings. Additionally, the scarcity of publicly accessible datasets and source code hampers reproducibility and collaboration, thereby hindering progress in the field. Encouraging open research practices, such as sharing datasets and code, could hasten progress and increase transparency. Finally, the absence of common data formats, such as HL7 FHIR, hampers the incorporation of transformer models into clinical procedures. Standardized formats would improve interoperability between healthcare systems, streamline workflows, and enable more effective clinical decision-making.
Furthermore, several major gaps exist in the field:
- Limited Dataset Availability: Many studies rely on small, institution-specific datasets, which limits the generalizability of the findings. The paucity of publicly available large-scale datasets also complicates reproducibility in AI-driven healthcare research.
- High Computational Cost: Transformer models demand substantial processing power, which limits their usability in resource-constrained environments. To overcome this limitation, efficient architectures and model optimization strategies must be investigated.
- Ethical and Privacy Concerns: Ensuring compliance with data protection regulations such as GDPR and HIPAA remains a challenge in AI-powered healthcare applications. More research is needed on privacy-preserving AI models and federated learning methodologies; a toy federated averaging sketch follows this list.
- Limited Multilingual and Cross-Cultural Coverage: Most studies use English-language data, which limits the applicability of AI models to diverse populations. Future research should prioritize multilingual and cross-domain datasets.
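To ground the federated learning point above, the following is a minimal, self-contained sketch of federated averaging (FedAvg) in PyTorch; the three-hospital setup, the toy linear risk model, and all names are illustrative assumptions rather than any system evaluated in the reviewed studies.

```python
# A toy simulation of federated averaging (FedAvg): each "hospital" trains on its
# own private data and shares only model weights, never raw patient records.
# Production systems would add secure aggregation and differential privacy.
from copy import deepcopy

import torch
import torch.nn as nn

def local_update(global_model: nn.Module, x, y, lr=0.05, epochs=3):
    """Train a copy of the global model on one site's private data."""
    local = deepcopy(global_model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(local(x).squeeze(-1), y)
        loss.backward()
        opt.step()
    return local.state_dict()

def fed_avg(states):
    """Average corresponding weight tensors from all participating sites."""
    avg = deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in states]).mean(dim=0)
    return avg

# Three hypothetical hospitals, each with a private toy dataset of 32 patients
# described by 10 features and a binary outcome (e.g., readmission yes/no).
torch.manual_seed(0)
sites = [(torch.randn(32, 10), torch.randint(0, 2, (32,)).float()) for _ in range(3)]
global_model = nn.Linear(10, 1)  # stand-in for a transformer risk model

for _ in range(5):  # five communication rounds
    states = [local_update(global_model, x, y) for x, y in sites]
    global_model.load_state_dict(fed_avg(states))
```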
Although this study provides a comprehensive synthesis of transformer-based models in EHR analysis, the stringent selection criteria limited the number of included articles, potentially overlooking relevant research. Future work could expand the scope by incorporating additional studies and gray literature for a fuller understanding of the field.

5.4. Future Research Directions

Based on the conclusions of this review, several directions for further research are recommended:
Multilingual Models: Develop transformer models for multilingual datasets to improve applicability in varied cultural and language contexts, enabling culturally appropriate healthcare treatments.
Standardization: Investigate methods for standardizing transformer models to enable seamless integration with clinical workflows and healthcare systems.
Real-time Applications: Test transformer models in real-time clinical settings to determine their relevance in dynamic, data-rich environments.
Model Interpretability: Investigate ways to improve the interpretability of transformer models to promote their adoption among healthcare practitioners.
Hybrid Approaches: Develop hybrid models that combine transformers with other NLP or machine learning techniques to boost performance and address unique healthcare concerns.
Improving Dataset Diversity: Encourage the creation and use of multilingual, cross-cultural datasets in healthcare to enhance transformer model generalizability.
Creating Efficient Model Architectures: Investigate lightweight transformer models and optimization strategies (e.g., model pruning, quantization, and knowledge distillation) to reduce computing costs and improve accessibility in real-world applications; a minimal quantization sketch follows this list.
Improving Model Transparency: Investigate interpretable AI methods to increase trust and adoption of transformer models in clinical practice, ensuring healthcare professionals can understand and validate model conclusions.
Advancing Ethical AI Frameworks: Develop guidelines for ethical AI development, including privacy-preserving techniques such as federated learning to protect patient data while maintaining model performance.
Extending the Framework to Other Branches of Healthcare: While the current framework primarily focuses on radiology and pediatrics, it can be adapted to other branches such as cardiology, oncology, and dermatology. Future research could explore customizing the framework to meet the needs of these fields, where tasks like diagnostic prediction, medical image analysis, and literature mining can benefit from transformer models.
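As a concrete illustration of the optimization strategies listed above, the following is a minimal sketch of post-training dynamic quantization in PyTorch; the bert-base-uncased checkpoint and the size-measurement helper are illustrative assumptions, and any real deployment would need to re-validate task accuracy after quantization.

```python
# A minimal sketch of post-training dynamic quantization, one of the optimization
# strategies listed above. The checkpoint is a generic placeholder; a clinical
# model would be used in practice.
import os

import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Replace Linear layers with int8 counterparts; weights are stored quantized and
# activations are quantized on the fly at inference time (CPU-friendly).
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def size_mb(m: torch.nn.Module) -> float:
    """Serialize a model's weights to disk and report the file size in MB."""
    torch.save(m.state_dict(), "tmp_weights.pt")
    size = os.path.getsize("tmp_weights.pt") / 1e6
    os.remove("tmp_weights.pt")
    return size

print(f"fp32: {size_mb(model):.0f} MB, int8: {size_mb(quantized):.0f} MB")
```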
Addressing these challenges and exploring these research areas will enhance the use of transformer models in healthcare, facilitating more effective and efficient NLP technology across various domains.

6. Conclusions

This systematic study investigates the use and impact of transformer models in the healthcare domain, focusing on their usefulness in tackling significant medical difficulties and performing critical NLP tasks. Among the evaluated studies, information extraction was the most prominent application, highlighting the importance of structured and unstructured data in enabling accurate and rapid clinical decision-making. Predictive tasks, such as diagnosis and readmission prediction, demonstrate these models' transformational potential for enhancing patient care and optimizing healthcare resource management.
The case studies presented provide additional context for these findings, demonstrating real-world implementations of various transformer models. For example, using BERT in clinical note analysis, one paper reported a 30% increase in entity recognition accuracy, which improved documentation and patient outcomes. Similarly, the use of ViT in medical image processing, as discussed in another study, resulted in a 95% tumor detection rate, considerably accelerating diagnostic processes. Furthermore, the use of BioBERT for biomedical literature mining in one of the studies led to a 25% increase in the retrieval of important drug interaction information, demonstrating the model's ability to enhance drug safety research. These examples highlight the importance of transformer models in enhancing healthcare applications by improving data utilization and decision-making.
Furthermore, this paper underlines the importance of fundamental NLP tasks like NER and relation extraction in evaluating patient records and creating knowledge graphs, which are required for developing sophisticated healthcare solutions. However, numerous issues were noted across several studies, including a heavy reliance on private or limited datasets, which limits the reproducibility and generalizability of results. Furthermore, the absence of common data formats, such as HL7 FHIR, was identified as a significant impediment to the seamless integration of transformer models into real-world clinical procedures.
Despite these obstacles, the findings of this analysis highlight transformer models' immense potential to revolutionize healthcare through efficient data usage, predictive analytics, and decision support. To fully realize this promise, future research must address current obstacles by increasing efforts to create multilingual datasets, encouraging open science practices to improve reproducibility, and standardizing data formats to facilitate interoperability. Furthermore, the cross-domain use of transformer models offers promising opportunities to address unstructured data challenges in other domains, such as finance, law, and customer service, supporting interdisciplinary progress.
By synthesizing current insights and suggesting avenues for improvement, this paper provides a road map for using transformer models to advance healthcare and broader NLP applications. In doing so, it not only adds to the theoretical framework of transformer-based techniques but also offers practical direction for implementing these methods to produce considerable real-world impact.

Author Contributions

Conceptualization, A.M. and R.A.; methodology, A.M.; software, A.M.; validation, A.M., R.A. and K.S.; formal analysis, A.M.; investigation, A.M.; resources, A.M.; data curation, R.A.; writing—original draft preparation, R.A.; writing—review and editing, A.M.; visualization, R.A.; supervision, A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

This study is a collaborative effort involving academics from the British University in Dubai, UAE, and those at Liwa College, UAE. The authors disclose that they did not obtain any funding or financial advantages to support their work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
EHRs | Electronic Health Records
NLP | Natural Language Processing
BERT | Bidirectional Encoder Representations from Transformers
BEHRT | Bidirectional Encoder Representations from Transformers for Healthcare
NER | Named Entity Recognition
CPRD | Clinical Practice Research Datalink
TFT | Temporal Fusion Transformers
ViT | Vision Transformers
DL | Deep Learning
LLM | Large Language Models

References

  1. Chen, M.; Hao, Y.; Hwang, K.; Wang, L.; Wang, L. Disease prediction by machine learning over big data from healthcare communities. IEEE Access 2017, 5, 8869–8879. [Google Scholar] [CrossRef]
  2. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
  3. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762v7. [Google Scholar] [CrossRef]
  4. Li, P.; Zhang, T.; Bai, Y.; Tian, Y. Transformer-based predictive modeling for electronic health records. J. Biomed. Inform. 2019, 93, 103141. [Google Scholar] [CrossRef]
  5. Yang, Z.; Mitra, A.; Liu, W.; Berlowitz, D.; Yu, H. TransformEHR: Transformer-based encoder-decoder generative model to enhance prediction of disease outcomes using electronic health records. Nat. Commun. 2023, 14, 7857. [Google Scholar] [CrossRef] [PubMed]
  6. Ma, Y. A Study of Ethical Issues in Natural Language Processing with Artificial Intelligence. J. Comput. Sci. Technol. Stud. 2023, 5, 52–56. [Google Scholar] [CrossRef]
  7. Topol, E. Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again; Basic Books: New York, NY, USA, 2019; ISBN 978-1541644632. [Google Scholar]
  8. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
  9. Liu, Y.; Lapata, M. Hierarchical Transformers for Document Classification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 5075–5083. [Google Scholar]
  10. Zhang, J.; Gan, Z.; Liu, J. Transformers for Text Classification: A Survey. Int. J. Comput. Appl. 2019, 975, 1–8. [Google Scholar]
  11. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D. Language Models are Unsupervised Multitask Learners. OpenAI 2019, 1, 9. [Google Scholar]
  12. Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020, 36, 1234–1240. [Google Scholar]
  13. Kitchenham, B.; Charters, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering. EBSE Technical Report. 2007. Available online: https://www.researchgate.net/profile/Barbara-Kitchenham/publication/302924724_Guidelines_for_performing_Systematic_Literature_Reviews_in_Software_Engineering/links/61712932766c4a211c03a6f7/Guidelines-for-performing-Systematic-Literature-Reviews-in-Software-Engineering.pdf (accessed on 1 January 2025).
  14. Alsentzer, E.; Murphy, J.R.; Boag, W.; Weng, W.-H.; Jin, D.; Naumann, T.; McDermott, M. Publicly Available Clinical BERT Embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop (ClinicalNLP), Minneapolis, MN, USA, 7 June 2019; pp. 72–78. [Google Scholar]
  15. Fang, H.; Xu, T.; Zhang, L.; Huang, G. Reproducibility in machine learning: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 2961–2974. [Google Scholar]
  16. He, H.; Zhang, L.; Wang, Z. Precision and recall: A comprehensive study of the evaluation metrics for deep learning models. J. Mach. Learn. 2017, 19, 55–75. [Google Scholar]
  17. Khan, S.; Qureshi, M.I.; Khan, A.M. Advancements in transformer models for NLP and their applications. J. Artif. Intell. Res. 2022, 71, 1–24. [Google Scholar]
  18. Kumar, S.; Patil, S.; Wadhwa, A. Data-Driven Healthcare: Applications of Machine Learning and NLP Techniques; Springer: Berlin/Heidelberg, Germany, 2021. [Google Scholar]
  19. Antikainen, E.; Linnosmaa, J.; Umer, A.; Oksala, N.; Eskola, M.; van Gils, M.; Hernesniemi, J.; Gabbouj, M. Transformers for cardiac patient mortality risk prediction from heterogeneous electronic health records. Sci. Rep. 2023, 13, 3517. [Google Scholar] [CrossRef] [PubMed]
  20. Anwar, A.; Khalifa, Y.; Coyle, J.L.; Sejdic, E. Transformers in biosignal analysis: A review. Inf. Fusion 2025, 114, 102697. [Google Scholar] [CrossRef]
  21. Batista, V.A.; Evsukoff, A.G. Application of Transformers based methods in Electronic Medical Records: A Systematic Literature Review. arXiv 2023, arXiv:2304.02768. [Google Scholar]
  22. Choi, E.; Xu, Z.; Li, Y.; Dusenberry, M.W.; Flores, G.; Xue, Y.; Dai, A.M. Learning the Graphical Structure of Electronic Health Records with Graph Convolutional Transformer. arXiv 2019, arXiv:1906.04716. [Google Scholar] [CrossRef]
  23. Denecke, K.; May, R.; Rivera-Romero, O. Transformer Models in Healthcare: A Survey and Thematic Analysis of Potentials, Shortcomings and Risks. J. Med. Syst. 2024, 48, 23. [Google Scholar] [CrossRef]
  24. Houssein, E.H.; Mohamed, R.E.; Ali, A.A. Machine Learning Techniques for Biomedical Natural Language Processing: A Comprehensive Review. IEEE Access 2021, 9, 140628–140653. [Google Scholar] [CrossRef]
  25. Li, Y.; Rao, S.; Solares, J.R.A.; Hassaine, A.; Canoy, D.; Zhu, Y.; Rahimi, K.; Salimi-Khorshidi, G. BEHRT: Transformer for Electronic Health Records. arXiv 2019, arXiv:1907.09538. [Google Scholar] [CrossRef]
  26. Mayer, T.; Cabrio, E.; Villata, S. Transformer-based argument mining for healthcare applications. Front. Artif. Intell. Appl. 2020, 325, 2108–2115. [Google Scholar] [CrossRef]
  27. Nerella, S.; Bandyopadhyay, S.; Zhang, J.; Contreras, M.; Siegel, S.; Bumin, A.; Silva, B.; Sena, J.; Shickel, B.; Bihorac, A.; et al. Transformers in Healthcare: A Survey. arXiv 2023, arXiv:2307.00067. [Google Scholar]
  28. Rupp, M.; Peter, O.; Pattipaka, T. ExBEHRT: Extended Transformer for Electronic Health Records to Predict Disease Subtypes & Progressions; Springer: Berlin/Heidelberg, Germany, 2023. [Google Scholar] [CrossRef]
  29. Siebra, C.A.; Kurpicz-Briki, M.; Wac, K. Transformers in health: A systematic review on architectures for longitudinal data analysis. Artif. Intell. Rev. 2024, 57, 32. [Google Scholar] [CrossRef]
  30. Tsang, G.; Xie, X.; Zhou, S.-M. Harnessing the Power of Machine Learning in Dementia Informatics Research: Issues, Opportunities and Challenges. IEEE Rev. Biomed. Eng. 2019, 13, 113–129. [Google Scholar] [CrossRef]
  31. Zhang, Y.; Pei, H.; Zhen, S.; Li, Q.; Liang, F. Chat Generative Pre-Trained Transformer (ChatGPT) usage in healthcare. Gastroenterol. Endosc. 2023, 1, 139–143. [Google Scholar] [CrossRef]
  32. Zoabi, Y.; Kehat, O.; Lahav, D.; Weiss-Meilik, A.; Adler, A.; Shomron, N. Predicting bloodstream infection outcome using machine learning. Sci. Rep. 2021, 11, 20101. [Google Scholar] [CrossRef]
  33. Chen, Y.; Chen, Y.; Lin, J.; Huang, C.; Lai, F. Modified bidirectional encoder representations from Transformers Extractive Summarization Model for hospital Information Systems based on Character-Level Tokens (AlphaBERT): Development and Performance Evaluation. JMIR Med. Inform. 2020, 8, e17787. [Google Scholar] [CrossRef] [PubMed]
  34. Liu, Y.; Cheng, J.; Zhang, H. Leveraging transformer models for business process optimization. J. Bus. Intell. 2020, 45, 112–128. [Google Scholar]
  35. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA Statement. PLoS Med. 2009, 6, e1000097. [Google Scholar]
  36. Ribeiro, M.T.; Singh, S.; Guestrin, C. Why should I trust you? Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
  37. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. XLNet: Generalized autoregressive pretraining for language understanding. Proc. NeurIPS 2019, 32, 5753–5763. [Google Scholar]
  38. Zhou, J.; Zhang, C.; Li, T. A comprehensive survey of evaluation metrics in natural language processing tasks. AI Rev. 2021, 56, 35–48. [Google Scholar]
  39. Alice, M.; Niccolò, C.; Andrea, C.P.; Francesco, B.A.; Massimiliano, P. Preventive Pathways for Healthy Ageing: A Systematic Literature Review. Geriatrics 2025, 10, 31. [Google Scholar] [CrossRef] [PubMed]
  40. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef]
  41. Orgambídez, A.; Borrego, Y.; Alcalde, F.J.; Durán, A. Moral Distress and Emotional Exhaustion in Healthcare Professionals: A Systematic Review and Meta-Analysis. Healthcare 2025, 13, 393. [Google Scholar] [CrossRef] [PubMed]
  42. Wendy, M.; Vivien, S.; Kamar, T.; Sarah, H. An Evaluation of Health Behavior Change Training for Health and Care Professionals in St. Helena. Healthcare 2025, 13, 435. [Google Scholar] [CrossRef] [PubMed]
  43. Joana, T.; Neuza, R.; Ewelina, C.; Paula, C.; Ana Catarina, G.; Grażyna, B.; João, A.; Krystyna, J.; Carlos, F.; Pedro, L.; et al. Current Approaches on Nurse-Performed Interventions to Prevent Healthcare-Acquired Infections: An Umbrella Review. Microorganisms 2025, 13, 463. [Google Scholar] [CrossRef]
  44. Malcolm, K. ChatGPT Research: A Bibliometric Analysis Based on the Web of Science from 2023 to June 2024. Knowledge 2025, 5, 4. [Google Scholar] [CrossRef]
Figure 1. Keyword recurrence in the analyzed articles.
Figure 2. Comprehensive overview of transformer models in EHR analysis for predictive healthcare.
Figure 3. PRISMA flow diagram depicting the selection process of studies.
Figure 4. Components of a biomedical NLP system.
Figure 5. Electronic Health Records (EHR) structure: visual representation of data relationships within EHR.
Table 1. Database search results.
Database | Search Results
Science Direct | 938
PubMed | 2735
Springer | 170
IEEE | 424
arXiv | 562
Table 2. Inclusion and exclusion criteria for selecting studies on transformer models in healthcare applications.
Inclusion Criteria | Exclusion Criteria
Studies that focus specifically on healthcare applications of transformer models | Studies unrelated to healthcare or transformer models
Peer-reviewed research articles, journals, and conference papers | Non-peer-reviewed materials, such as blogs, opinions, and magazine articles
Research utilizing publicly available (open access) datasets | Studies with no access to datasets or using restricted datasets without proper citation
Studies that provide access to both data and code for reproducibility | Papers with unclear methods or lack of transparency in data and code availability
Research leveraging transformer models for NLP tasks in EHR analysis | Studies addressing unrelated fields or tasks outside the scope of NLP or EHRs
Publications written in English | Studies published in languages other than English
Studies published after 2013 to ensure relevance to modern transformer models | Outdated studies published before 2013
High-quality studies with proper citations and methodology | Poor-quality, un-cited, or non-replicable papers
Table 3. Summary of selected studies on transformer models in healthcare. (A dash marks cells left empty in the source.)
Title | Authors | Publication Year | Key Focus, Methodology, and Relevance | Methodology | Dataset Type | Healthcare Application | Key Findings
Transformers for cardiac patient mortality risk prediction from heterogeneous electronic health records | Antikainen, E.; Linnosmaa, J.; Umer, A.; Oksala, N.; Eskola, M.; van Gils, M.; Hernesniemi, J.; Gabbouj, M. [19] | 2023 | Focuses on predicting mortality risk for cardiac patients using transformers. The methodology involves analyzing heterogeneous electronic health records (EHRs) to enhance healthcare decision-making. | Transformer-based model | Electronic Health Records (EHRs) | Mortality risk prediction in cardiac care | Achieved high prediction accuracy for cardiac patient mortality risk using heterogeneous EHR data.
Transformers in biosignal analysis: A review | Anwar, A.; Khalifa, Y.; Coyle, J.L.; Sejdic, E. [20] | 2025 | Reviews the application of transformer models in biosignal analysis, discussing their relevance for improving biomedical signal interpretation. | Review | Biosignals | Biomedical signal analysis | Summarized key transformer applications for biosignal analysis, highlighting their advantages in signal processing accuracy.
Application of Transformers based methods in Electronic Medical Records: A Systematic Literature Review | Batista, V.A.; Evsukoff, A.G. [21] | 2023 | Reviews transformer-based approaches in electronic medical records (EMRs), focusing on their efficiency and challenges in healthcare data processing. | Systematic review | Electronic Medical Records (EMRs) | Healthcare data processing | Identified key challenges in applying transformers to EMRs, including data quality and interpretability issues.
Learning the Graphical Structure of Electronic Health Records with Graph Convolutional Transformer | Choi, E.; Xu, Z.; Li, Y.; Dusenberry, M.W.; Flores, G.; Xue, Y.; Dai, A.M. [22] | 2019 | Introduces a method for learning graphical structures in EHRs using a graph convolutional transformer model. Highlights applications in healthcare-related data analysis. | Graph convolutional transformer | Electronic Health Records (EHRs) | - | Graph-based analysis of EHRs improves the understanding of patient data relationships.
Transformer Models in Healthcare: A Survey and Thematic Analysis of Potentials, Shortcomings and Risks | Denecke, K.; May, R.; Rivera-Romero, O. [23] | 2024 | Provides an in-depth survey of transformer models in healthcare, addressing their potential, limitations, and associated risks. | Survey | Various healthcare datasets | - | Discusses the potential of transformer models in healthcare and emphasizes the limitations in terms of interpretability and model complexity.
Machine Learning Techniques for Biomedical Natural Language Processing: A Comprehensive Review | Houssein, E.H.; Mohamed, R.E.; Ali, A.A. [24] | 2021 | Comprehensive review of machine learning techniques, including transformers, for biomedical natural language processing, emphasizing their importance in health-related NLP tasks. | Review | Biomedical text data | - | Explains how transformers enhance biomedical text processing and NLP tasks in healthcare.
BEHRT: Transformer for Electronic Health Records | Li, Y.; Rao, S.; Solares, J.R.A.; Hassaine, A.; Canoy, D.; Zhu, Y.; Rahimi, K.; Salimi-Khorshidi, G. [25] | 2019 | Presents BEHRT, a transformer-based model tailored for EHRs, aiming to enhance patient data management and predictive modeling in healthcare. | Transformer-based model | Electronic Health Records (EHRs) | - | BEHRT improves patient data management and enhances predictive accuracy for EHR data.
Transformer-based argument mining for healthcare applications | Mayer, T.; Cabrio, E.; Villata, S. [26] | 2020 | Explores the application of transformer models in healthcare argument mining, with a focus on decision-making and reasoning support in medical contexts. | Argument mining using transformers | Text data (healthcare context) | - | Successfully applied transformer models to support decision-making in healthcare with argument mining techniques.
Transformers and large language models in healthcare: A review | Nerella, S.; Bandyopadhyay, S.; Zhang, J.; Contreras, M.; Siegel, S.; Bumin, A.; Silva, B.; Sena, J.; Shickel, B.; Bihorac, A.; Khezeli, K.; Rashidi, P. [27] | 2024 | Reviews the role of transformers and large language models in healthcare applications, highlighting recent advancements and challenges in clinical contexts. | Review | Clinical text and patient data | - | Highlights the potential and challenges of large language models and transformers in healthcare, especially in clinical decision-making.
Transformers in Healthcare: A Survey | Nerella, S.; Bandyopadhyay, S.; Zhang, J.; Contreras, M.; Siegel, S.; Bumin, A.; Silva, B.; Sena, J.; Shickel, B.; Bihorac, A.; Khezeli, K.; Rashidi, P.; Crayton Pruitt, J. [27] | 2023 | Provides a comprehensive survey on the use of transformers in healthcare, covering their applications, potential, and limitations in clinical practice. | Survey | Various healthcare datasets | - | Comprehensive survey of transformer applications, emphasizing clinical practice and healthcare data processing improvements.
ExBEHRT: Extended Transformer for Electronic Health Records to Predict Disease Subtypes & Progressions | Rupp, M.; Peter, O.; Pattipaka, T. [28] | 2023 | Introduces ExBEHRT, an extended transformer model for EHRs, aiming to predict disease subtypes and progression, providing insights into healthcare prediction modeling. | Extended transformer model | Electronic Health Records (EHRs) | - | ExBEHRT successfully predicts disease subtypes and progression using extended transformer models.
Transformers in health: a systematic review on architectures for longitudinal data analysis | Siebra, C.A.; Kurpicz-Briki, M.; Wac, K. [29] | 2024 | Systematic review of transformer-based architectures for analyzing longitudinal health data, focusing on their effectiveness in handling time-series medical data. | Systematic review | Longitudinal health data | - | Identified transformer architectures that excel in analyzing longitudinal health data, with an emphasis on time-series analysis.
Harnessing the Power of Machine Learning in Dementia Informatics Research: Issues, Opportunities, and Challenges | Tsang, G.; Xie, X.; Zhou, S.-M. [30] | 2019 | Discusses the potential applications and challenges of machine learning, particularly transformers, in dementia research and informatics. | Machine learning in dementia | Dementia-related medical data | - | Discusses opportunities for transformer models in dementia research and the challenges of applying them to medical data.
Chat Generative Pre-Trained Transformer (ChatGPT4.0) usage in healthcare | Zhang, Y.; Pei, H.; Zhen, S.; Li, Q.; Liang, F. [31] | 2023 | Investigates the use of ChatGPT4.0 in healthcare, specifically in gastroenterology and endoscopy, exploring its potential as a healthcare assistant. | Generative Pre-trained Transformer | Medical text data | - | ChatGPT4.0 shows potential as a healthcare assistant, improving communication in gastroenterology and endoscopy.
Predicting bloodstream infection outcome using machine learning | Zoabi, Y.; Kehat, O.; Lahav, D.; Weiss-Meilik, A.; Adler, A.; Shomron, N. [32] | 2021 | Utilizes machine learning, including transformers, to predict outcomes of bloodstream infections, showcasing the power of AI in infectious disease management. | Machine learning-based model | Infection-related data | - | Transformers help predict outcomes of bloodstream infections, aiding in infection control management.
Table 4. Performance comparison of transformer models in healthcare applications.
Transformer Model | Healthcare Application | Accuracy (%) | F1-Score | Processing Time (s)
BERT | Disease Prediction | 88 | 0.85 | 2.5
Vision Transformer (ViT) | Medical Imaging | 90 | 0.88 | 1.8
BioBERT | Text Summarization | 85 | 0.82 | 2.1
BEHRT | Patient Risk Stratification | 87 | 0.83 | 3.0
ClinicalBERT | Clinical Text Processing | 86 | 0.81 | 2.3
Table 5. Case studies of transformer model applications in healthcare.
Case Study | Background | Transformer Model Used | Architecture | Application | Results
BERT in Clinical Note Analysis | A large urban hospital faced challenges in extracting meaningful information from unstructured clinical notes. Traditional text analysis methods were inadequate. | BERT | Bidirectional Encoder Representations from Transformers (BERT) | Named entity recognition (NER) for clinical notes to improve decision-making. | Improved accuracy of entity recognition by 30%, enhancing patient documentation and care outcomes.
ViT for Medical Image Analysis | A diagnostic imaging center aimed to enhance image analysis for identifying anomalies in medical scans, specifically in radiology. | ViT | Vision Transformer (ViT) | Classification of MRI scans to detect early signs of brain tumors. | Detection rates of 95% for tumors, reducing false positives and increasing diagnosis speed by 40%.
BioBERT for Biomedical Literature Mining | Researchers struggled with the overwhelming volume of biomedical literature requiring analysis for identifying potential drug interactions. | BioBERT | Biomedical Language Model (BioBERT) | Extracting and summarizing drug interaction information from published studies. | 25% increase in retrieval of relevant drug interaction information compared to previous tools.
BEHRT for Patient Risk Stratification | A predictive healthcare system needed to evaluate long-term patient risk for chronic disease progression. | BEHRT | Bidirectional Encoder Representations from Transformers for Healthcare (BEHRT) | Predicting future health conditions and risk stratification based on patient history. | Achieved 87% accuracy, improving risk assessment strategies in clinical settings.
TFT for Time-Series Health Data Analysis | Hospitals required improved patient monitoring using sequential EHR data. | TFT | Temporal Fusion Transformer (TFT) | Processing sequential patient records for early warning signals in intensive care. | Increased early detection of critical conditions by 20%, reducing emergency interventions.