Article

An Ensemble Patient Graph Framework for Predictive Modelling from Electronic Health Records and Medical Notes

1 Department of Computer Science and Engineering, CEG Campus, Anna University, Chennai 600025, India
2 Centre for Cyber Security, Department of Computer Science and Engineering, CEG Campus, Anna University, Chennai 600025, India
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Diagnostics 2025, 15(6), 756; https://doi.org/10.3390/diagnostics15060756
Submission received: 4 February 2025 / Revised: 1 March 2025 / Accepted: 12 March 2025 / Published: 18 March 2025

Abstract

Objective: Electronic health records (EHRs) are becoming increasingly important in both academic research and business applications. Recent studies indicate that predictive tasks, such as heart failure detection, perform better when the geometric structure of EHR data, including the relationships between diagnoses and treatments, is considered. However, many EHRs lack essential structural information. This study aims to improve predictive accuracy in healthcare by constructing a Patient Knowledge Graph Ensemble Framework (PKGNN) to analyse ICU patient cohorts and predict mortality and hospital readmission outcomes. Methods: This study utilises a cohort of 42,671 patients from the MIMIC-IV dataset to build the PKGNN framework, which consists of three main components: (1) medical note extraction, (2) patient graph construction, and (3) prediction tasks. Advanced Natural Language Processing (NLP) models, including Clinical BERT, BioBERT, and BlueBERT, extract and integrate semantic representations from discharge summaries into a patient knowledge graph. This structured representation is then used to enhance predictive tasks. Results: Performance evaluations on the MIMIC-IV dataset indicate that the PKGNN framework outperforms state-of-the-art baseline models in predicting mortality and 30-day hospital readmission. A thorough framework analysis reveals that incorporating patient graph structures improves prediction accuracy. Furthermore, an ensemble model enhances risk prediction performance and identifies crucial clinical indicators. Conclusions: This study highlights the importance of leveraging structured knowledge graphs in EHR analysis to improve predictive modelling for critical healthcare outcomes. The PKGNN framework enhances the accuracy of mortality and readmission predictions by integrating advanced NLP techniques with patient graph structures. This work contributes to the literature by advancing knowledge graph-based EHR analysis strategies, ultimately supporting better clinical decision-making and risk assessment.

1. Introduction

Predicting patient outcomes [1,2], especially mortality in critical care settings, has long been a priority in medical research. Numerous clinical parameters have been identified as significant predictors. Length of stay in the intensive care unit (ICU) has been associated with severe conditions such as mechanical ventilation (MV) and psychiatric medication poisoning, in addition to biochemical indicators. Furthermore, early prediction of mechanical ventilation duration for patients suffering from acute respiratory distress syndrome (ARDS) can enhance risk stratification and improve care strategies [3].
Predictive models for assessing risks in critically ill patients, such as hypoglycemia in septic patients or post-surgical outcomes following coronary artery bypass grafting (CABG), are increasingly being explored [4,5]. While CABG remains a critical intervention for patients with coronary atherosclerosis, the long-term prognosis remains uncertain, making the development of predictive models essential for improving patient survival probabilities.
Moreover, the complexity of disorders like pulmonary hypertension (PH) emphasises the need for comprehensive prediction models. Despite advancements in therapy, the pathophysiology [6] of PH involves a mix of musculoskeletal, cardiovascular, and respiratory problems that lead to increasing exercise intolerance and a reduced quality of life. Because these comorbidities are multi-dimensional, predictive models that leverage electronic health record (EHR) data are crucial for accurately predicting patient mortality and guiding clinical decision-making.
These results demonstrate the importance of using EHR data to forecast patient mortality. EHRs capture a wide range of clinical information, including biochemical markers, comorbidities, and intervention outcomes. These are key for developing robust mortality risk prediction models to improve patient care and outcomes.
Hospitals often record patient data as EHRs, which include data on tests, symptoms, diagnoses, and prescriptions. These EHRs contain structured patient information, lab report details, and unstructured data in free text comments, such as medical notes. This rich information within EHRs is crucial to integrating knowledge about illnesses, treatments, and proteomics into clinical knowledge graphs, all within a real-time patient care system. Figure 1 shows some example tasks involved in EHR information extraction using a Clinical Data Warehouse (CDW) approach.
Integrating structured and unstructured data in EHRs allows for a comprehensive view of patient health, vital for personalised medicine. Structured data, such as lab results and medication lists, provide clear, quantitative insights. In contrast, unstructured data, like medical notes, offer contextual information and detailed narratives about the patient’s condition and treatment responses. Despite several deep learning methodologies that provide patient-specific death forecasts from unstructured data in EHRs, these current techniques frequently fail to completely extract the concealed, intricate information essential for thorough analysis. Knowledge graphs offer a robust solution by organising unstructured data into interconnected, semantic relationships. They capture complex associations between medical entities—such as symptoms, treatments, and diagnoses—allowing for a more holistic understanding of patient history. By transforming fragmented narratives into a structured form, knowledge graphs enhance the interpretability of unstructured data and enable more accurate predictions, decision-making, and personalised care. This approach bridges the gap between raw clinical narratives and actionable insights, significantly advancing the precision of healthcare analytics.
Knowledge graphs are essential for describing the complex connections and meanings inherent in the data domain. They encapsulate a wide range of biomedical entities and their interrelations, such as diseases, symptoms, drugs, and genes. Using a patient graph network, the framework enables the extraction of meaningful embeddings from the knowledge graph, facilitating the identification of subtle patterns and associations within biomedical information. For instance, a knowledge graph can reveal how specific genetic markers correlate with disease susceptibility or how drug combinations impact patient outcomes.
One effective strategy is to employ graph convolutional networks (GCNs) on the knowledge graph. These networks leverage data statistics to guide structural learning, presenting a promising approach to unravel the structure inherent in EHR data [7]. GCNs can capture the dependencies and interactions between different features in the data, which traditional flat models might overlook.
Creating a knowledge graph from EHR data is a collaborative, multidisciplinary effort involving experts in healthcare, data engineering, natural language processing, machine learning, and graph databases. The resulting knowledge graph becomes a powerful tool for healthcare professionals to improve patient care, conduct research, and make more informed clinical decisions. The most common technique for applying neural networks to handle EHR data has been to treat each case as an unordered set of characteristics, essentially representing it as a “bag of features”. Unfortunately, this method disregards the vital geometric structure representing the physician’s assessment process. For instance, when we analyse the encounter in Figure 2 as a bag of features, we lose crucial information that the combination of Decadron, Revlimid, and Velcade drugs prescribed to patient ‘111791005’ was the suspected cause of anaemia, resulting in severe medical conditions.
Problem Statement: To predict patient mortality from EHRs, we construct a patient knowledge graph that captures relationships between entities such as diagnoses and treatments extracted from unstructured medical notes, improving interpretability and decision support.
A patient network is effectively modelled as a graph, where each node represents an individual patient's hospital stay, encoded using graph representations derived from their medical notes and medical data. The edges between nodes indicate a connection between two hospital stays based on a similarity measure, such as shared diagnoses, treatment responses, or other medical characteristics. The objective of this model is to mirror the decision-making process of healthcare professionals. In clinical practice, doctors rely on a patient's medical history and draw on their experience with patients who have exhibited similar conditions or treatment responses. Using this graph-based approach, we can simulate this process computationally, allowing the model to inform decisions about medication, treatment plans, or interventions by identifying patterns and outcomes from similar patients. This knowledge graph captures the implicit knowledge gained from prior cases, supporting personalised and evidence-based care recommendations. The knowledge graph is the input to a GCN, which learns patient embeddings from the patient graph while encouraging a regularised latent space for the embeddings.
Thus, to address the problem that EHR data do not always provide complete structural information, we propose PKGNN, an ensemble approach that concurrently learns the hidden structural information for different prediction tasks. We make the following contributions in this paper:
  • The PKGNN (Patient Knowledge Graph Ensemble Framework) model is proposed as a concrete design and convincing implementation for biomedical data. It integrates patient graph topology and health record data to enhance two tasks: mortality prediction and 30-day hospital readmission prediction.
  • An ensemble model is implemented to learn the underlying EHR structure using a graph convolutional network (GCN). The data are pre-trained on three different clinical models: Clinical BERT [9], BioBERT [10], and BlueBERT [11], depicting a promising method for binary classification.
  • We validate the proposed PKGNN on the openly accessible EHR database, MIMIC-IV [12].
This study uses the MIMIC-IV benchmark dataset to compare the performance of the proposed framework with that of state-of-the-art (SOTA) deep learning models and to predict critical patient outcomes. The models' performances have been evaluated for mortality and 30-day hospital readmission predictions.

2. Literature Survey

Large-scale, balanced training data are frequently necessary for deep learning (DL) models in the healthcare industry to be reliable, flexible, and effective. Building DL models is particularly difficult when the data are highly imbalanced. Healthcare providers face difficulties caring for elderly people due to increased complexity and comorbidities [13]. These challenges include balancing beneficial and detrimental therapies, monitoring deteriorating patients, and allocating resources. Mortality prediction accelerates decision-making [14]. This study aims to emphasise the current challenges in risk prediction for 30-day hospital readmission and to analyse its impact on mortality in ICU patients.

2.1. Latent Embeddings for Medical Notes

Medical notes contain detailed patient information, including symptoms, diagnoses, radiological results, daily activities, and illness history. Although medical notes provide vital information, identifying trends in them can be difficult [9]: they use unconventional vocabulary and acronyms, and their styles vary [15]. The high dimensionality and sparseness of healthcare records have motivated considerable research on building prediction models from structured EHR data elements [16,17].
Due to developments in deep learning algorithms, extracting crucial clinical information from medical records using transfer learning and contextual word embedding models has gained momentum. Bidirectional Encoder Representations from Transformers (BERT) [18] is a contextualised text representation model based on a masked language model, pre-trained on a large text corpus using the bidirectional transformer encoder architecture. These embeddings are then used in downstream tasks. Clinical prediction models often integrate the BERT design into downstream tasks and fine-tune it to provide an integrated task-specific architecture [9,10,19].
Clinical BERT [9], introduced by Huang et al., is a specialised adaptation of the BERT model explicitly designed for the medical and clinical domain. Standard NLP models like BERT are pre-trained on general language corpora, which may not capture the fine distinction and specific terminology used in medical texts. Clinical BERT has a vocabulary that is fine-tuned by a large corpus of medical notes from the MIMIC-III dataset to include medical terms, which improves its understanding of clinical texts compared to general-purpose models. Leveraging the BERT architecture, Clinical BERT captures the context of words bidirectionally, meaning it considers the entire sentence before assigning meaning to a word. This is particularly useful in clinical texts, where context is critical.
BioBERT is initialised with the original BERT weights and then pre-trained on full-text PubMed Central articles and PubMed abstracts. While Clinical BERT, BioBERT, and BlueBERT are all based on the BERT architecture, there are significant variations between the models. One crucial distinction lies in their training corpora: Clinical BERT is pre-trained only on MIMIC-III medical notes, BioBERT on biomedical literature, and BlueBERT on both MIMIC-III medical notes and PubMed abstracts. DeepNote-GNN [20], introduced by Golmaei et al., employs a deep learning model that combines a pre-trained BERT with a patient graph to predict hospital readmissions.

2.2. Learning Graph Representation on EHR

Graph convolutional networks have the capacity to learn both node attributes and graph structure. This can be accomplished through semi-supervised learning methods for node classification.
For specific graph learning objectives, graph representation learning aims to learn a feature vector for a subset of a graph or for an entire graph. The graphs generated from EHRs are frequently heterogeneous, since they often comprise many healthcare entities and several relation types. Consequently, heterogeneous graphs may not be directly fitted with a GCN.
GCNs use message-passing techniques to combine information from neighbouring nodes, allowing them to capture both local and global dependencies in graph-structured data, as shown by Wu et al. [21]. This makes GCNs particularly suitable for applications such as gene property prediction, social media analysis, and knowledge graph reasoning, where understanding relationships is critical.
Recent advancements in GNNs have also focused on their ability to learn hierarchical representations through multi-layer architectures, as highlighted by Liu et al. [22]. Their research showed that GNNs are an effective method for tasks like graph and node categorisation because they can accurately simulate intricate structural patterns. These properties, combined with their scalability and adaptability to dynamic graphs, underscore the growing importance of GNNs in analysing complex systems.
For disease prediction, Sun et al. [23] created two dual graph networks (bipartite) with two kinds of nodes each: a patient record and a medical concept graph. Rather than using the node categories in the propagation rule, they used the projection weight to bring all node embeddings onto a shared space. MedGCN [24] employs GCN and trains the model with a cross-regularisation strategy for medication recommendation. It constructs a bipartite subgraph between lab test information and encounters. One task’s loss might be considered a regularisation term for another task in cross-regularisation.
From various EHR data, MGATRx [25] created a graph with six different node categories (medication, sickness, the desired level, the base, adverse effect, and connect). It then extended multi-view max pooling for drug repositioning using an attention mechanism. The MedGCN graph is composed of four types of nodes: patients, medications, vitals, and diagnosis. It is used for drug recommendation and laboratory task imputation.
Instead of using GCN spectrum filters, Graph Attention Network (GAT) [26] compares each node in the network with its closest neighbours to learn regional characteristics. It can learn graph topologies from attention variables and may design various weights to edges, increasing the model’s ability and interpretation.
Each patient's medical encounter includes a hospital stay embedding combined with the medical concept embeddings introduced in the Graph Convolution Transformer (GCT) [7]. GCT resolves the problem of uniformly distributed attention weights among medical concepts by regularising against a pre-established graph. The construction of GCT involves linking several categories of medical concepts (such as treatments, laboratory results, and procedures) in order to replicate doctors' decision-making processes.
The clinical prediction models built with these standard architectures have a few limitations: the fine-tuning strategy adds few task-specific parameters, has limited generalisation capability, and fails to capture the long-range interdependence of words, which is critical in clinical settings. At the same time, specially designed approaches restrict the generalisation of machine learning models. Pre-computing a costly representation of the training data once and then running experiments with cheaper task-specific classification models offers significant computational benefits [18].
In summary, although these studies have achieved satisfactory learning structures, the absence of pre-established graphs is still challenging. Thus, we solve the limitations mentioned above by introducing the fusion of a deep learning model that uses medical notes and a knowledge network for prediction. This study uses feature aggregation to improve the depiction of medical notes. The medical knowledge graph can potentially capture more meaningful and robust representations of medical concepts and their relationships. These representations can be helpful for various tasks, including disease prediction, drug discovery, and patient cohort analysis.

3. Materials and Methods

This section defines the proposed PKGNN, focusing on clinical risk prediction problems with EHR data. The proposed ensemble GCN architecture utilises medical notes [27,28] with feature extraction using pretrained BERT variant models.

3.1. Datasets

We validate the proposed PKGNN on a real-world EHR database, Medical Information Mart for Intensive Care (MIMIC-IV) [29], which is openly accessible. We selected the following two forecasting tasks to evaluate the performance of the proposed models.
The 30-day Hospital Readmission is a binary classification task that aims to predict whether a patient, at time t, will have to be re-admitted to the hospital in the next 30 days. We evaluate the AUROC and AUPRC metrics.
Mortality prediction is a binary classification task that aims to predict whether a patient, at time t, will expire in the upcoming 24 h. We evaluate the AUROC and AUPRC metrics.
The MIMIC-IV database [12] and the MIMIC-IV discharge summary notes [27] undergo a selection process to identify a subset of data records for our patient cohort, omitting irrelevant and redundant features. The cohort comprises individuals aged 18 years or older who have spent a minimum of one day in the ICU, with an average daily duration exceeding six hours. Patients who are not organ donors and have not been transferred from another hospital are included in our cohort. To minimise ambiguity, we exclude individuals with conditions such as neuromuscular diseases, malignant tumours, and severe burns, which typically require extended hospital stays. Every ICU stay record includes both time-series and static characteristics (e.g., age, gender).
Figure 3 summarises the cohort data that were taken from the tables in the MIMIC-IV database. The 35 unique tables that make up the MIMIC-IV relational database are divided into four different modules that correspond to the core, hospital, intensive care unit, and derived tables. We extract information from the admissions, patients, and icu_stay tables according to the cohort requirements. Further, we transfer the ICD diagnostic codes to the cohort selection schema by mapping them from the diagnosis_icd table. Accessing the derived tables, ICUstay_hourly and vitalsign, is necessary to retrieve the hourly details of patients and their routines. Then, the discharge table’s discharge summary text field is concatenated to create the final complete cohort.
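To make the cohort extraction concrete, the following sketch shows how such a selection could be assembled with pandas, assuming the MIMIC-IV tables have been exported to CSV; the file paths and the filtering rules shown here are illustrative simplifications of the cohort criteria above, not the authors' extraction code.

```python
import pandas as pd

# Hypothetical export paths; the real MIMIC-IV layout and column names should be checked.
admissions = pd.read_csv("mimiciv/hosp/admissions.csv")
patients   = pd.read_csv("mimiciv/hosp/patients.csv")
icustays   = pd.read_csv("mimiciv/icu/icustays.csv")
discharge  = pd.read_csv("mimic-iv-note/discharge.csv")

# Adults only (anchor_age is MIMIC-IV's de-identified age field).
adults = patients[patients["anchor_age"] >= 18]

# ICU stays of at least one day (los is the length of stay in days).
long_stays = icustays[icustays["los"] >= 1.0]

# Join patients, admissions, and ICU stays into one cohort frame.
cohort = (long_stays
          .merge(adults, on="subject_id")
          .merge(admissions, on=["subject_id", "hadm_id"]))

# Attach the discharge summary text for each hospital admission.
cohort = cohort.merge(discharge[["hadm_id", "text"]], on="hadm_id", how="left")
print(len(cohort), "hospital stays in the cohort")
```

The remaining criteria (organ donor status, inter-hospital transfers, and the excluded diagnosis groups) would be applied analogously on the admissions and diagnoses_icd tables.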

3.2. Problem Formulation

Consider a set of patients, denoted by $P = \{p_1, p_2, \ldots, p_N\}$, where each patient is denoted as $p_i$ for $i \in \{1, 2, \ldots, N\}$, with N being the total number of patients. Each patient $p_i$ has an associated set of hospital stays, represented by $B_i = \{b_{i1}, b_{i2}, \ldots, b_{iM_i}\}$. Here, $M_i$ denotes the total number of hospital stays for patient $p_i$. Each hospital stay $b_{ij}$, where j indexes the individual hospital stay for patient $p_i$, contains a set of medical notes. A sample patient knowledge graph is shown in Figure 4.
To analyse each patient's medical record comprehensively per hospital stay, we aggregate all the medical notes of each hospital stay for that patient. This aggregated set of medical notes for patient $p_i$ during the j-th stay is denoted by $C_{ij}$, defined in Equation (1) as the union of the medical notes of that hospital stay, where z indexes the individual medical notes and k denotes the total number of medical notes during the j-th hospital stay:

$$C_{ij} = \bigcup_{z=1}^{k} c_{ijz} \qquad (1)$$

Let $S_i$ represent the set of all medical notes for patient $p_i$, combining data from all hospital stays as in Equation (2):

$$S_i = \{C_{i1}, C_{i2}, \ldots, C_{iM_i}\} \qquad (2)$$
The primary purpose is to learn the prediction model $F_\theta : S_i \rightarrow \hat{Y}_{ij}$, where $\hat{Y}_{ij}$ represents the predicted likelihood for the target label $Y_{ij}$. The learnable model parameters are denoted by $\theta$.
Figure 5 depicts the overall architecture of the proposed PKGNN.

3.3. Medical Notes Representation and Knowledge Graph

The observed medical notes $S_i$ for each patient are pre-processed to extract relevant information and create vectors that are used as graph node feature embeddings. Algorithm 1 describes the medical notes' latent representation process based on feature aggregation for each patient's hospital stay data. We utilise a pre-trained BERT variant model to tokenise the medical notes $S_i$ and generate feature embeddings $d_{ij} = \{d_{ij1}, d_{ij2}, \ldots, d_{ijx_j}\}$, which represent the medical note embedding vectors of size 768 for hospital stay j, as explained in Algorithm 1.
Algorithm 1 Medical Notes Latent Representation
1: Input: Medical notes $S_i$ for patient $p_i$.
2: Output: Medical notes latent representation $D = \{ d_{ij} \mid i \in \{1, \ldots, N\},\ j \in \{1, \ldots, M_i\} \}$.
3: for each patient $i \in \{1, \ldots, N\}$ do
4:     for each hospital stay $j \in \{1, \ldots, M_i\}$ do
5:         Concatenate all the medical notes of the j-th hospital stay: $C_{ij} = \bigcup_{z=1}^{k} c_{ijz}$
6:         Divide $C_{ij}$ into $x_j$ chunks of at most 512 tokens, since the BERT model can only process 512 input tokens at once: $C_{ij} = \{c_{ij1}, c_{ij2}, \ldots, c_{ijx_j}\}$
7:         for each chunk $c_{ijx}$, where $x \in \{1, \ldots, x_j\}$, do
8:             $a_{ijx} = \text{BERT-Tokenizer}(c_{ijx})$
9:             $d_{ijx} = \text{BERT variant feature vector}(a_{ijx})$
10:        end for
11:        Use the average feature aggregator to obtain the feature vector: $d_{ij} = \frac{1}{x_j} \sum_{x=1}^{x_j} d_{ijx}$
12:    end for
13: end for
Note: the chunk vectors $d_{ijx}$ form a matrix of dimension $768 \times x_j$ (where 768 is the dimensionality of the BERT embeddings for each chunk), while $d_{ij}$ is of dimension $768 \times 1$ (the averaged feature-aggregated vector).
For each hospital stay, the medical notes are concatenated and divided into chunks of at most 512 tokens, as the BERT model can only process sequences of this length. Each chunk $c_{ijx}$ is tokenised using the BERT tokeniser, and a feature vector $d_{ijx}$ is extracted using a BERT variant.
To obtain a single feature vector $d_{ij}$ for each hospital stay, the algorithm averages the feature vectors of all chunks. The resulting vector $d_{ij}$ has a dimensionality of $768 \times 1$, representing the averaged feature-aggregated vector. This process is repeated for all hospital stays, ensuring that each set of medical notes is transformed into a compact, meaningful representation suitable for further analysis or modelling tasks. This representation serves as a compressed latent encoding of the textual information within the medical notes, facilitating downstream predictive modelling tasks.
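A minimal sketch of this chunk-and-average procedure (Algorithm 1) using the Hugging Face transformers library is given below; the Clinical BERT checkpoint name, the word-based chunking, and the use of the [CLS] vector as the chunk embedding are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative checkpoint; BioBERT or BlueBERT checkpoints can be substituted here.
MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()

def stay_embedding(notes, chunk_words=300):
    """Concatenate the notes of one hospital stay (C_ij), split them into chunks that
    fit BERT's 512-token limit, embed each chunk, and average into one 768-d vector d_ij."""
    text = " ".join(notes)
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]
    vectors = []
    with torch.no_grad():
        for chunk in chunks:
            inputs = tokenizer(chunk, return_tensors="pt", truncation=True, max_length=512)
            out = model(**inputs)
            vectors.append(out.last_hidden_state[:, 0, :])   # [CLS] vector as d_ijx
    return torch.cat(vectors).mean(dim=0)                    # averaged feature vector d_ij, shape (768,)
```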
An undirected, unweighted knowledge graph $G = (V, E)$ is constructed, where V is the set of nodes and E is the set of edges. The set of nodes V is defined as follows:

$$V = \bigcup_{i=1}^{N} \{\, b_{ij} \mid j \in \{1, 2, \ldots, M_i\} \,\} \qquad (3)$$

The hospital stay of patient $p_i$ during their j-th visit is represented by $b_{ij}$, where j ranges from 1 to $M_i$. The numbers of hospital stays across patients are collected in $M = \{M_1, M_2, \ldots, M_N\}$, where $M_i$ is the number of hospital stays for patient $p_i$.
Two nodes $b_{ip}$ and $b_{jq}$, corresponding to two hospital stays p and q, are connected by an edge if their feature similarity is above a threshold $\beta$. The similarity score $l_{pq}$ for this pair of vertices is calculated using Equation (4):

$$l_{pq} = \begin{cases} 1 & \text{if } \mathrm{similarity}(d_{ip}, d_{jq}) = \dfrac{d_{ip} \cdot d_{jq}}{\lVert d_{ip} \rVert \, \lVert d_{jq} \rVert} > \beta \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

where $l_{pq}$ represents an edge between the nodes of hospital stays p and q, while i and j denote the corresponding patients. Following hyper-parameter tuning, we set $\beta = 0.95$, since the average node similarity is high. This links only nodes with substantial similarity and reduces the occurrence of false positive predictions.
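A short sketch of this edge construction (Equation (4)) is shown below, assuming the stay embeddings $d_{ij}$ are stacked row-wise into a matrix X; it thresholds pairwise cosine similarity to produce a symmetric 0/1 adjacency matrix.

```python
import torch

def build_edges(X: torch.Tensor, beta: float = 0.95) -> torch.Tensor:
    """Connect two hospital-stay nodes when the cosine similarity of their note
    embeddings exceeds beta, as in Equation (4). X has shape (num_stays, 768)."""
    Xn = torch.nn.functional.normalize(X, dim=1)   # unit-norm rows
    sim = Xn @ Xn.T                                # pairwise cosine similarity
    L = (sim > beta).float()                       # 0/1 adjacency matrix
    L.fill_diagonal_(0.0)                          # self-loops are added later as L + I
    return L
```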
Here, the constructed knowledge graph is trained using a two-layer GCN model. Let the symmetric adjacency matrix of the graph be $L = [l_{ij}] \in \mathbb{R}^{M \times M}$, where M is the size of the node set V. The corresponding degree matrix is T, with $T_{ii} = \sum_j L_{ij}$. The adjacency matrix L is augmented with self-loops to form $\tilde{L} = L + I_M$, where $I_M$ is the identity matrix.
The normalised adjacency matrix $\hat{L}$ is computed as follows:

$$\hat{L} = \tilde{T}^{-1/2} \, \tilde{L} \, \tilde{T}^{-1/2} \qquad (5)$$

where $\tilde{L} = L + I_M$ is the adjacency matrix augmented with self-loops, and $\tilde{T}$ is the degree matrix of $\tilde{L}$, with $\tilde{T}_{ii} = \sum_j \tilde{L}_{ij}$.
In Equation (5), the matrix $\hat{L}$ is used in the GCN to aggregate information from node i and its neighbouring nodes, with normalisation based on the node degrees. Specifically, the feature representation of each node is updated by combining its own features with those of its neighbours, weighted by the normalised adjacency matrix. For an undirected and unweighted graph, this weighting is based on the degrees of the nodes, ensuring that the contributions of neighbours are balanced. This process is formalised in the update rule for the g-th layer:

$$H^{(g+1)} = \sigma\!\left( \hat{L} \, H^{(g)} \, w^{(g)} \right) \qquad (6)$$

where $w^{(g)}$ is the trainable weight matrix, $\sigma$ denotes the ReLU activation function applied element-wise to induce non-linearity, and $H^{(g+1)}$ is the matrix of activations in the (g+1)-th layer.
To perform classification, the softmax function is applied to the two-layer forward model to obtain class probabilities (Equation (7)):

$$Y = \mathrm{softmax}\!\left( \hat{L} \, \mathrm{ReLU}\!\left( \hat{L} X w^{(0)} \right) w^{(1)} \right) \qquad (7)$$

where X is the matrix of node feature embeddings, and $w^{(0)}$ and $w^{(1)}$ are the input-to-hidden and hidden-to-output weight matrices of the two-layer GCN, respectively. A GCN model for classification is illustrated in Figure 6.
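The following PyTorch sketch illustrates the two-layer GCN of Equations (5)-(7) with the layer sizes reported in Section 4 (768-16-2); it is a plain dense implementation for illustration and omits the sparse-matrix optimisations a graph of this size would require in practice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalize_adjacency(L: torch.Tensor) -> torch.Tensor:
    """L_hat = T~^{-1/2} (L + I) T~^{-1/2}, as in Equation (5)."""
    L_tilde = L + torch.eye(L.size(0))
    deg = L_tilde.sum(dim=1)
    d_inv_sqrt = torch.diag(deg.pow(-0.5))
    return d_inv_sqrt @ L_tilde @ d_inv_sqrt

class TwoLayerGCN(nn.Module):
    def __init__(self, in_dim=768, hidden_dim=16, out_dim=2, dropout=0.5):
        super().__init__()
        self.w0 = nn.Linear(in_dim, hidden_dim, bias=False)   # input-to-hidden weights w(0)
        self.w1 = nn.Linear(hidden_dim, out_dim, bias=False)  # hidden-to-output weights w(1)
        self.dropout = nn.Dropout(dropout)

    def forward(self, X, L_hat):
        h = F.relu(L_hat @ self.w0(X))     # first propagation layer, Equation (6)
        h = self.dropout(h)
        return L_hat @ self.w1(h)          # class logits; softmax is applied inside the loss
```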

3.4. Loss Function

Cross-entropy loss is frequently utilised when outcomes are categorical, for instance in clinical risk classification. The cross-entropy loss function $\mathcal{L}_{CE}$ over all labelled examples is expressed in Equation (8):

$$\mathcal{L}_{CE} = -\sum_{i \in M} \sum_{j=1}^{Q} Y_{ij} \ln \hat{Y}_{ij} \qquad (8)$$

where M is the set of indices of labelled vertices in the graph, Q is the output feature dimension (equal to the number of classes), and $Y \in \mathbb{R}^{|M| \times Q}$ is the label indicator matrix.
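As a brief illustration, the loss over the labelled vertices can be computed as below, assuming `logits` come from the GCN sketch in the previous subsection and `labeled_idx` indexes the set M; PyTorch's cross_entropy combines the softmax of Equation (7) with the negative log-likelihood of Equation (8).

```python
import torch.nn.functional as F

def labeled_cross_entropy(logits, labels, labeled_idx):
    # Restrict the loss to labelled vertices (the set M in Equation (8)).
    return F.cross_entropy(logits[labeled_idx], labels[labeled_idx])
```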

3.5. Ensemble Graph Learning

The proposed ensemble model uses three BERT variants: Clinical BERT [9], BioBERT [10], and BlueBERT [11]. These models individually extract feature representations from the medical notes and create patients' hospital stay feature vectors. The proposed ensemble model uses an aggregator on top of the BERT variants to generate a fixed-size feature vector for the medical notes. This technique captures extended word interdependence, which is essential in clinical settings.
Algorithm 2 describes the whole working procedure of the proposed PKGNN framework.
Algorithm 2 Proposed PKGNN framework
1: Initialisation: learning_rate, batch_size, seed, max_grad_norm; GCN: input_size, hidden_size, out_size, num_layers, threshold
2: Obtain the feature-aggregated embedding $d_{ij}$ from Algorithm 1.
3: for each classifier $F_j$, $j \in \{1, 2, 3\}$ do
4:     for each epoch do
5:         Build a graph G with node features, connecting edges based on the cosine similarity of node embeddings using Equation (4).
6:         Train the GCN model with the node features.
7:         Calculate the binary cross-entropy loss using Equation (8).
8:         Update the parameters using the Adam optimiser.
9:     end for
10: end for
11: Use the ensemble approach to obtain the predictions $F_{ev}(S)$ using majority voting (Equation (9)).
12: Test and validate the trained model, predicting the probability scores.
Here, we consider three classifiers based on Clinical BERT, BioBERT, and BlueBERT, respectively, denoted $F_1$, $F_2$, and $F_3$. The ensemble voting classifier $F_{ev}$ predicts the class from the predictions of the individual classifiers. The trained GCN models are integrated into the ensemble model. The majority voting classifier determines the final output of the ensemble method by aggregating the predictions of the three classifiers, as described in Equation (9):

$$F_{ev}(S) = \arg\max_{\hat{Y}} \sum_{e=1}^{3} I\!\left( C_e(S) = \hat{Y} \right) \qquad (9)$$

In this equation, $C_e(S)$ represents the prediction of the e-th classifier for input S, and $I(\cdot)$ is the indicator function, which returns 1 if its argument is true and 0 otherwise. The term $\sum_{e=1}^{3} I(C_e(S) = \hat{Y})$ counts the number of classifiers that predicted class $\hat{Y}$, and the class with the most votes is returned.
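A small sketch of this majority vote is given below, assuming `preds` holds the hard class predictions of the three GCN classifiers; how ties are broken (argmax returns the lower class index here) is an implementation choice not specified in Equation (9).

```python
import torch

def majority_vote(preds: torch.Tensor, num_classes: int = 2) -> torch.Tensor:
    """preds: tensor of shape (3, num_nodes) with the hard class predictions of the
    Clinical BERT-, BioBERT-, and BlueBERT-based GCN classifiers.
    Returns the per-node class receiving the most votes, as in Equation (9)."""
    votes = torch.stack([(preds == c).sum(dim=0) for c in range(num_classes)])  # (num_classes, num_nodes)
    return votes.argmax(dim=0)
```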
This study demonstrates that the proposed ensemble graph-based learning approach (PKGNN) is a valuable technique for enhancing the performance of clinical prediction models, in contrast to most previous attempts to construct F θ .

4. Results

We implemented the code with Python 3.12.4 and PyTorch with CUDA 11.7, and trained all the models on a workstation with an Intel® Xeon™ processor (Intel, Santa Clara, CA, USA), an NVIDIA Quadro P5000 graphics card (NVIDIA, Santa Clara, CA, USA), and 64 GB of RAM.

4.1. Evaluation Metrics

The outcome of this classification must be assessed and quantified to determine whether or not the samples are correctly categorised. Accuracy, precision, recall, AUROC, AUPRC, and R@P80 are used as evaluation metrics.
True Positive (TP): Instances of deceased patients that were correctly identified as deceased.
False Positive (FP): Instances of survived patients that were misclassified as deceased.
True Negative (TN): Survived patients’ instances that were correctly identified as survived.
False Negative (FN): Deceased patients’ instances that were misclassified as survived.
Precision: Precision measures how many positive predictions are correct. The precision of a model is 1.0 if it generates no false positives. The formula is as follows:

$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall: Recall is the ability to identify all relevant instances in the dataset.

$$\text{Recall} = \frac{TP}{TP + FN}$$
Accuracy: Accuracy is the proportion of correct predictions among all predictions.

$$\text{Accuracy} = \frac{TP + TN}{\text{Number of test samples}}$$
AUROC: The area under the ROC curve (AUROC) summarises a model's discrimination ability. The best AUROC is 1, while a value of 0.5 corresponds to random guessing. The ROC curve displays the trade-off between the true positive rate and the false positive rate at decision thresholds varied between 0 and 1. For imbalanced data, this metric provides additional information.
AUPRC: The Area Under the Precision-Recall Curve (AUPRC) is a metric used particularly in scenarios with imbalanced datasets. It summarises the trade-off between precision and recall across different classification thresholds. A higher AUPRC indicates better model performance, with a maximum value of 1 representing perfect precision-recall balance.
R@P80: Recall at 80% precision, i.e., the recall obtained when precision is fixed at 80%. The formula can be expressed as follows:

$$R@P80 = \max\{\, \text{Recall} \mid \text{Precision} \geq 0.80 \,\}$$
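These metrics can be computed with scikit-learn as sketched below, assuming `y_true` holds the binary labels and `y_score` the predicted probabilities of the positive class; R@P80 is read off the precision-recall curve as defined above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, precision_recall_curve

def evaluate(y_true, y_score):
    auroc = roc_auc_score(y_true, y_score)
    auprc = average_precision_score(y_true, y_score)            # area under the precision-recall curve
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    mask = precision >= 0.80
    r_at_p80 = recall[mask].max() if np.any(mask) else 0.0      # recall at 80% precision
    return {"AUROC": auroc, "AUPRC": auprc, "R@P80": r_at_p80}
```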

4.2. Patient Knowledge Graph Framework

We comprehensively evaluate and compare the proposed method against six state-of-the-art (SOTA) methods. Table 1 shows that the PKGNN model achieves better performance than state-of-the-art results, where ensemble learning for a global patient graph with a feature aggregation method improves performance for 30-day hospital readmission prediction and mortality prediction. For the hospital readmission task, the proposed model achieves an AUROC of 0.951 and an AUPRC of 0.754, surpassing all competing models. Likewise, it attains an AUROC of 0.934 and an AUPRC of 0.652 for mortality prediction, consistently outperforming all competing models.
The model has been trained to minimise loss using the ensemble learning method with a binary cross-entropy loss function. The training configuration is set with a random seed of 42 to ensure the reproducibility of results. The model has been trained for 100 epochs, with logging occurring every 1000 iterations, validation after each epoch, and model checkpoints saved every 10 epochs. To prevent exploding gradients, gradient clipping is applied with a maximum gradient norm of 100. The batch size for training is set to 32. The optimiser is Adam, with a learning rate of 0.01, weight decay of 0.0005, and beta values of 0.9 and 0.999 for the first- and second-moment estimates, respectively. A step learning rate scheduler is employed, which reduces the learning rate by gamma (1.0) every 100 steps.
The dataset configuration specifies a graph-based dataset stored at the root path and uses a threshold of 0.99 for data processing. The GCN model with ensemble learning has an input feature size of 768, a hidden layer size of 16, and an output size of 2, indicating a binary classification task. The GCN has two layers and includes dropout with a probability of 0.5 to prevent over-fitting.
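A sketch of this training configuration is shown below, reusing the TwoLayerGCN module from the Section 3.3 sketch; the toy tensors standing in for the node features, normalised adjacency, and labels are placeholders for the graph built in Section 3.3, not real data.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(42)  # reproducibility, as in the training configuration above

# Toy placeholders for the real graph inputs (built as in Section 3.3):
num_nodes = 100
X = torch.randn(num_nodes, 768)              # node feature embeddings (the d_ij vectors)
L_hat = torch.eye(num_nodes)                 # normalised adjacency from Equation (5)
labels = torch.randint(0, 2, (num_nodes,))   # binary task labels
train_idx = torch.arange(80)                 # labelled vertices used for training

model = TwoLayerGCN(in_dim=768, hidden_dim=16, out_dim=2, dropout=0.5)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01,
                             weight_decay=0.0005, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=1.0)

for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    logits = model(X, L_hat)
    loss = F.cross_entropy(logits[train_idx], labels[train_idx])      # Equation (8) over labelled nodes
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=100)  # gradient clipping
    optimizer.step()
    scheduler.step()
```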
Figure 7 and Figure 8 show the comparative AUROC plot for mortality prediction and 30-day hospital readmission.

4.3. Ablation Study

In Table 2 and Table 3, we report ablation tests that analyse the contributions of the ensemble module and the global patient graph module to the prediction tasks. These experiments are compared on the MIMIC-IV dataset's prediction tasks.
For the hospital readmission task, Set 1 includes 42,671 nodes and 472,435,459 edges, achieving the highest performance with an AUROC of 0.951, an AUPRC of 0.754, and a recall at 80% precision (R@P80) of 0.641. Set 2, with 6162 nodes and 7,456,808 edges, shows a slight decrease in performance with an AUROC of 0.934, an AUPRC of 0.652, and an R@P80 of 0.575. Set 3, the smallest set, comprises 3700 nodes and 3,591,982 edges, resulting in an AUROC of 0.903, an AUPRC of 0.604, and an R@P80 of 0.455.
For the mortality prediction task, we again evaluated three sets. Set 1, the largest, includes 42,671 nodes and 472,435,459 edges, achieving an AUROC of 0.934, an AUPRC of 0.652, and an R@P80 of 0.575. Set 2, with 15,292 nodes and 59,643,760 edges, shows an AUROC of 0.917, an AUPRC of 0.544, and an R@P80 of 0.515. Set 3, which contains 6162 nodes and 7,456,808 edges, has the lowest performance, with an AUROC of 0.899, an AUPRC of 0.541, and an R@P80 of 0.415.
These results indicate that more extensive sets of nodes and edges generally improve the predictive performance of the proposed models for both hospital readmission and mortality prediction tasks. The significant performance drop in smaller sets highlights the importance of comprehensive data inclusion in constructing patient graphs.

5. Discussion

Deep learning algorithms for analysing raw health data in ICUs have tremendous potential for improving patient outcomes. These advanced methods enable real-time analysis of complex and unstructured data, facilitating the rapid identification of essential patterns, predicting patient deterioration, and supporting clinicians in their decision-making. Thus, we propose an ensemble patient graph framework that leverages three cutting-edge natural language processing models pre-trained on healthcare-specific corpora: Clinical BERT, BioBERT, and BlueBERT. These models provide context-specific word representations from medical notes, enhancing generalisation capability and capturing the extended dependencies between words, which are crucial in clinical settings.
We successfully developed the PKGNN framework, a promising ensemble GCN-based approach to address the complexities of clinical and biomedical information. The framework provides a structured and meaningful representation of clinical and biomedical data by constructing knowledge graphs and applying an ensemble approach. The ensemble model leverages the strengths of the individual models to improve overall performance in predicting patient mortality. The ensemble approach employed in the framework excels at uncovering latent patterns and associations within the data. This capability can reveal critical insights that may otherwise remain hidden. The performance evaluation on the MIMIC-IV dataset demonstrates that PKGNN outperforms the state-of-the-art baselines across two different tasks: mortality prediction and 30-day hospital readmission prediction.
The current study focuses on mortality prediction and 30-day hospital readmission, but there are other critical clinical outcomes that could benefit from similar predictive modelling. Future work could expand the framework to predict disease progression, treatment response, and other relevant clinical outcomes, providing a more comprehensive approach to patient risk assessment.
As technology and data collection methods in healthcare continue to evolve, ongoing research should also investigate the integration of additional data sources, such as genomic data or real-time sensor data from wearable devices, to further enhance the model’s predictive capabilities.

6. Conclusions

This study highlights the importance of integrating the graphical structure of EHR data to enhance predictive performance in critical healthcare tasks such as mortality and 30-day hospital readmission prediction. By leveraging advanced NLP models like Clinical BERT, BioBERT, and BlueBERT for medical note extraction and incorporating them into a patient knowledge graph, the proposed PKGNN framework effectively captures meaningful clinical relationships. The model’s performance metrics on the MIMIC-IV dataset demonstrate that utilising patient graph structures significantly improves prediction accuracy. This work contributes to the literature by advancing strategies for knowledge graph-based EHR analysis, offering a robust approach to risk prediction and supporting more informed clinical decision-making.

Author Contributions

Conceptualization, S.D. (S. Daphne) and V.M.A.R.; Methodology, S.D. (S. Daphne) and V.M.A.R.; Software, S.D. (S. Daphne), P.H. and S.D. (Sundarrajan Dinesh); Validation, S.D. (S. Daphne), P.H., S.D. (Sundarrajan Dinesh) and V.M.A.R.; Writing—original draft, S.D. (S. Daphne); Writing—review & editing, V.M.A.R.; Supervision, V.M.A.R. All authors have read and agreed to the published version of the manuscript.

Funding

The Anna Centenary Research Fellowship [CFR/ACRF/21244191591/AR1, Dated: 19-02-2021] supported this work.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data underlying the results presented in the work are available from MIMIC-IV, an extensive, single-centre database comprising information about patients admitted to critical care units at a large tertiary care hospital. More details about MIMIC-IV can be found on the MIMIC website (https://mimic.mit.edu/about/mimic/, accessed on 3 February 2025). To access these data, interested researchers must complete the CITI 'Data or Specimens Only Research' course (https://www.citiprogram.org/index.cfm?pageID=154&icat=0&ac=0, accessed on 3 February 2025) and then apply for credentialed access through PhysioNet (https://physionet.org/content/mimiciv/, accessed on 3 February 2025).

Conflicts of Interest

The authors have no relevant financial or non-financial interests to disclose.

References

  1. Ahuja, K.R.; Nazir, S.; Ariss, R.W.; Bansal, P.; Garg, R.; Ahuja, S.K.; Minhas, A.M.K.; Harb, S.; Krishnaswamy, A.; Unai, S.; et al. Derivation and Validation of Risk Prediction Model for 30-Day Readmissions Following Transcatheter Mitral Valve Repair. Curr. Probl. Cardiol. 2023, 48, 101033. [Google Scholar] [CrossRef] [PubMed]
  2. Evbayekha, E.; Antia, A.; Dixon, B.; Reiss, C.; LaRue, S. Predictors of mortality and burden of arrhythmias in endstage heart failure. Curr. Probl. Cardiol. 2024, 49, 102541. [Google Scholar] [CrossRef]
  3. Wang, Z.; Zhang, L.; Huang, T.; Yang, R.; Cheng, H.; Wang, H.; Yin, H.; Lyu, J. Developing an explainable machine learning model to predict the mechanical ventilation duration of patients with ARDS in intensive care units. Heart Lung 2023, 58, 74–81. [Google Scholar] [CrossRef] [PubMed]
  4. Gao, H.; Zhao, Y. A prediction model for assessing hypoglycemia risk in critically ill patients with sepsis. Heart Lung 2023, 62, 43–49. [Google Scholar] [CrossRef]
  5. Li, C.; Xu, F.; Han, D.; Zheng, S.; Ma, W.; Yang, R.; Wang, Z.; Liu, Y.; Lyu, J. Developing and verifying a multivariate model to predict the survival probability after coronary artery bypass grafting in patients with coronary atherosclerosis based on the MIMIC-III database. Heart Lung 2022, 52, 61–70. [Google Scholar] [CrossRef]
  6. Freire, T.C.; Ferreira, M.S.; De Angelis, K.; Paula-Ribeiro, M. Respiratory, cardiovascular and musculoskeletal mechanisms involved in the pathophysiology of pulmonary hypertension: An updated systematic review of preclinical and clinical studies. Heart Lung 2024, 68, 81–91. [Google Scholar] [CrossRef] [PubMed]
  7. Choi, E.; Xu, Z.; Li, Y.; Dusenberry, M.W.; Flores, G.; Xue, E.; Dai, A.M. Learning the Graphical Structure of Electronic Health Records with Graph Convolutional Transformer. Proc. AAAI Conf. Artif. Intell. 2019, 34, 606–613. [Google Scholar] [CrossRef]
  8. Lal, M. Neo4j Graph Data Modeling; Packt Publishing: Mumbai, India, 2015. [Google Scholar]
  9. Huang, K.; Altosaar, J.; Ranganath, R. ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission. arXiv 2019, arXiv:1904.05342. [Google Scholar] [CrossRef]
  10. Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2019, 36, 1234–1240. [Google Scholar] [CrossRef]
  11. Peng, Y.; Yan, S.; Lu, Z. Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets. arXiv 2019, arXiv:1906.05474. [Google Scholar] [CrossRef]
  12. Johnson, A.E.W.; Bulgarelli, L.; Shen, L.; Gayles, A.; Shammout, A.; Horng, S.; Pollard, T.J.; Hao, S.; Moody, B.; Gow, B.; et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci. Data 2023, 10, 1. [Google Scholar] [CrossRef] [PubMed]
  13. Exmann, C.J.; Kooijmans, E.C.; Joling, K.J.; Burchell, G.L.; Hoogendijk, E.O.; van Hout, H.P. Mortality prediction models for community-dwelling older adults: A systematic review. Ageing Res. Rev. 2024, 101, 102525. [Google Scholar] [CrossRef]
  14. Danay, L.; Ramon-Gonen, R.; Gorodetski, M.; Schwartz, D.G. Evaluating the effectiveness of a sliding window technique in machine learning models for mortality prediction in ICU cardiac arrest patients. Int. J. Med. Inform. 2024, 191, 105565. [Google Scholar] [CrossRef]
  15. Kemp, J.; Rajkomar, A.; Dai, A.M. Improved Patient Classification with Language Model Pretraining Over Clinical Notes. arXiv 2019, arXiv:1909.03039. [Google Scholar] [CrossRef]
  16. Shickel, B.; Tighe, P.J.; Bihorac, A.; Rashidi, P. Deep EHR: A Survey of Recent Advances on Deep Learning Techniques for Electronic Health Record (EHR) Analysis. arXiv 2017, arXiv:1706.03446. [Google Scholar] [CrossRef]
  17. Xiao, C.; Choi, E.; Sun, J. Opportunities and challenges in developing deep learning models using electronic health records data: A systematic review. J. Am. Med. Inf. Assoc. 2018, 25, 1419–1428. [Google Scholar] [CrossRef] [PubMed]
  18. Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar] [CrossRef]
  19. Alsentzer, E.; Murphy, J.R.; Boag, W.; Weng, W.; Jin, D.; Naumann, T.; McDermott, M.B.A. Publicly Available Clinical BERT Embeddings. arXiv 2019, arXiv:1904.03323. [Google Scholar] [CrossRef]
  20. Golmaei, S.N.; Luo, X. DeepNote-GNN: Predicting hospital readmission using clinical notes and patient network. In Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2021, Gainesville, FL, USA, 1–4 August 2021. [Google Scholar] [CrossRef]
  21. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4–24. [Google Scholar] [CrossRef]
  22. Liu, Y.; Wang, X.; Zhang, M. Hierarchical Graph Neural Networks for Complex Structure Analysis. Pattern Recognit. 2023, 135, 109–120. [Google Scholar]
  23. Sun, J.; Joshi, D.; Betancourt, F.; Solodinin, A.; Woodland, B.; Yan, H. Anion exchange chromatography of oligonucleotides under denaturing conditions. Nucleosides Nucleotides Nucleic Acids 2020, 39, 818–828. [Google Scholar] [CrossRef] [PubMed]
  24. Mao, C.; Yao, L.; Luo, Y. MedGCN: Medication recommendation and lab test imputation via graph convolutional networks. J. Biomed. Inform. 2022, 127, 104000. [Google Scholar] [CrossRef] [PubMed]
  25. Yella, J.K.; Jegga, A.G. MGATRx: Discovering Drug Repositioning Candidates Using Multi-View Graph Attention. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 19, 2596–2604. [Google Scholar] [CrossRef]
  26. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. arXiv 2018, arXiv:1710.10903. [Google Scholar] [CrossRef]
  27. Johnson, A.; Pollard, T.; Horng, S.; Celi, L.A.; Mark, R. MIMIC-IV-Note: Deidentified Free-Text Clinical Notes; PhysioNet: Enfield, UK, 2023. [Google Scholar]
  28. Goldberger, A.; Amaral, L.; Glass, L.; Hausdorff, J.; Ivanov, P.C.; Mark, R.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 2000, 101, e215–e220. [Google Scholar] [CrossRef]
  29. Johnson, A.E.W.; Stone, D.J.; Celi, L.A.; Pollard, T.J. The MIMIC Code Repository: Enabling reproducibility in critical care research. J. Am. Med. Inform. Assoc. 2018, 25, 32–39. [Google Scholar] [CrossRef]
  30. Zhu, W.; Razavian, N. Variationally regularized graph-based representation learning for electronic health records. In Proceedings of the CHIL ’21—Conference on Health, Inference, and Learning, Virtual, 8–10 April 2021; Volume 21, pp. 1–13. [Google Scholar] [CrossRef]
Figure 1. Electronic health records information extraction.
Figure 2. Graphical structure of EHRs using Neo4j [8].
Figure 3. Data extraction from MIMIC-IV tables.
Figure 4. An example of a patient knowledge graph, where the node $b_{23}$ illustrates a node without any edges; that is, hospital stay $b_{23}$ has no hospital stay with similar features in the network. All other nodes are connected by edges, meaning their feature similarities are above the threshold $\beta$.
Figure 5. Overall architecture of the proposed PKGNN.
Figure 6. The framework of a classification model using a graph convolutional network. The model's input comprises node features and a patient knowledge graph structure. The node features may first be pre-processed and are then passed to a block of graph convolutional layers, which learns node embeddings. A node pooling module then learns the graph embeddings, which are finally used to predict the results.
Figure 7. AUROC for in-hospital mortality prediction.
Figure 8. AUROC for 30-day hospital readmission prediction.
Table 1. AUROC and AUPRC for 30-day hospital readmission and mortality prediction.

Model              | Hospital Readmission      | Mortality Prediction
                   | AUROC  | AUPRC            | AUROC  | AUPRC
BlueBERT [11]      | 0.790  | 0.512            | 0.787  | 0.461
GCT [7]            | 0.834  | 0.581            | 0.793  | 0.582
BioBERT [10]       | 0.866  | 0.570            | 0.811  | 0.574
Clinical BERT [9]  | 0.869  | 0.612            | 0.837  | 0.602
DeepNote-GNN [20]  | 0.871  | 0.653            | 0.868  | 0.613
VGNN [30]          | 0.913  | 0.624            | 0.894  | 0.634
PKGNN (proposed)   | 0.951  | 0.754            | 0.934  | 0.652
Table 2. The 30-day hospital readmission prediction.

      | Nodes  | Edges       | AUROC | AUPRC | R@P80
Set 1 | 42,671 | 472,435,459 | 0.951 | 0.754 | 0.641
Set 2 | 6162   | 7,456,808   | 0.934 | 0.652 | 0.575
Set 3 | 3700   | 3,591,982   | 0.903 | 0.604 | 0.455
Table 3. In-hospital mortality prediction.

      | Nodes  | Edges       | AUROC | AUPRC | R@P80
Set 1 | 42,671 | 472,435,459 | 0.934 | 0.652 | 0.575
Set 2 | 15,292 | 59,643,760  | 0.917 | 0.544 | 0.515
Set 3 | 6162   | 7,456,808   | 0.899 | 0.541 | 0.415
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

