Article

Machine Learning in Medical Triage: A Predictive Model for Emergency Department Disposition

by Georgios Feretzakis 1,*, Aikaterini Sakagianni 2, Athanasios Anastasiou 3, Ioanna Kapogianni 1, Rozita Tsoni 1, Christina Koufopoulou 4, Dimitrios Karapiperis 5, Vasileios Kaldis 6, Dimitris Kalles 1 and Vassilios S. Verykios 1

1 School of Science and Technology, Hellenic Open University, 26335 Patras, Greece
2 Intensive Care Unit, Sismanogleio General Hospital, 15126 Marousi, Greece
3 Biomedical Engineering Laboratory, National Technical University of Athens, 15772 Athens, Greece
4 Anaesthesiology Department, Aretaieio Hospital, National and Kapodistrian University of Athens, 11528 Athens, Greece
5 School of Science and Technology, International Hellenic University, 57001 Thessaloniki, Greece
6 Emergency Department, Sismanogleio General Hospital, 15126 Marousi, Greece
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(15), 6623; https://doi.org/10.3390/app14156623
Submission received: 16 April 2024 / Revised: 24 July 2024 / Accepted: 27 July 2024 / Published: 29 July 2024
(This article belongs to the Section Biomedical Engineering)

Featured Application

Machine-Learning Support for Hospital Admission Decisions.

Abstract

The study explores the application of automated machine learning (AutoML) using the MIMIC-IV-ED database to enhance decision-making in emergency department (ED) triage. We developed a predictive model that utilizes triage data to forecast hospital admissions, aiming to support medical staff by providing an advanced decision-support system. The model, powered by H2O.ai’s AutoML platform, was trained on approximately 280,000 preprocessed records from the Beth Israel Deaconess Medical Center collected between 2011 and 2019. The selected Gradient Boosting Machine (GBM) model demonstrated an AUC ROC of 0.8256, indicating its efficacy in predicting patient dispositions. Key variables such as acuity and waiting hours were identified as significant predictors, emphasizing the model’s capability to integrate critical triage metrics into its predictions. However, challenges related to the complexity and heterogeneity of medical data, privacy concerns, and the need for model interpretability were addressed through the incorporation of Explainable AI (XAI) techniques. These techniques ensure the transparency of the predictive processes, fostering trust and facilitating ethical AI use in clinical settings. Future work will focus on external validation and expanding the model to include a broader array of variables from diverse healthcare environments, enhancing the model’s utility and applicability in global emergency care contexts.

1. Introduction

The demand for emergency department (ED) services is escalating globally, leading to overcrowding and a subsequent decline in the quality of care [1,2]. Delays in care, exacerbated by increased demand, have been linked to heightened morbidity and mortality rates [3]. The COVID-19 pandemic has further highlighted the critical nature of effective patient triage in EDs, as seen during the healthcare crisis in Italy’s Lombardy region [4]. Efficient triage is crucial as it directly impacts patient outcomes and can be life-altering.
The decision to admit patients is typically made after a clinical evaluation, involving various tests and procedures. In busy or understaffed environments, a reliable decision-support system, informed by machine-learning algorithms trained on historical ED data, could significantly aid medical staff by predicting hospital admissions.
Dubey et al. [5] emphasize the importance of resource availability in enhancing the performance of predictive models. The theory of the five Vs—velocity, volume, value, variety, and veracity—highlights essential data characteristics that must meet specific standards for different applications [6,7]. Medical data, laden with sensitive information, poses high disclosure risks, driving the need for open data solutions. Open data offers immense potential by illuminating disease causes, effective treatments, and hospital admission management, which has garnered attention from various sectors over the past two decades [8]. Notably, large datasets such as Healthdata.gov and the MIMIC Critical Care Database are crucial for developing robust predictive models [8].
This paper introduces an automated machine-learning (AutoML) approach using the MIMIC-IV-ED database to predict patient dispositions at triage, marking a significant contribution to medical informatics and emergency care. Our study extends beyond traditional AutoML applications by integrating a context-aware preprocessing methodology tailored for the idiosyncrasies of emergency medical data, which often includes irregular, incomplete, and anomalous data patterns.

Related Work

Previous studies have set the groundwork for the application of machine learning in healthcare. Haas et al. [9] utilized associative classification and odds ratios to predict in-hospital patient mortality using the MIMIC-IV dataset. Zhao et al. [10] developed a predictive model to assess mortality risks in patients with sepsis-related acute respiratory failure. Building upon these foundational works, our approach seeks to extend these methodologies by integrating more granular data handling and advanced machine-learning techniques. Additionally, Xie et al. [11] established a benchmark database using the MIMIC-IV ED database. Our work enhances this model by employing a comprehensive AutoML platform that simplifies model selection and tuning, potentially improving accuracy and usability for end-users in clinical settings.
The field of medical triage has seen varied approaches to predictive modeling. Silahtaroğlu and Yılmaztürk [12] reported on a machine-learning pre-diagnosis model for EDs, achieving a minimum accuracy of 75.5%, while Parker et al. [13] utilized demographic and temporal variables in a logistic regression model to predict hospital admissions effectively. Our study advances these models by implementing a Gradient Boosting Machine (GBM) model that not only considers a broader range of variables but also integrates interactions between them, providing a more nuanced understanding of patient outcomes.
Moreover, Araz et al. [14] explored multiple algorithms, including decision trees and neural networks, to enhance predictive capabilities. Contrasting with these, our model uniquely applies AutoML to streamline the selection and optimization process, identifying the most effective algorithms based on performance metrics. This approach differs markedly from the more manual, algorithm-specific enhancements used by Bekker et al. [15] and Goto et al. [16], who employed linear programming and conventional statistical methods, respectively.
Recent studies have demonstrated the efficacy of machine-learning techniques in predicting hospital admissions from emergency department data. Feretzakis et al. explored the application of machine-learning algorithms to forecast hospital admissions, highlighting their potential to improve decision-making in emergency care settings [17]. In subsequent studies, the authors focused on different aspects of this approach. They used machine learning for predicting the hospitalization of emergency department patients, providing insights into how these models can be tailored to specific clinical needs [18]. Another study by the same group employed exploratory clustering techniques to better understand patient data patterns and improve prediction accuracy [19]. Additionally, they investigated the prediction of hospitalization using a broader range of variables, emphasizing the model’s robustness and adaptability in various emergency department scenarios [20]. These efforts underscore the growing importance of integrating machine learning into clinical workflows to support healthcare professionals in making more informed and timely decisions.
Our research contributes significantly to the field by advancing the integration of machine learning into the triage process. We aim not only to predict outcomes but also to understand the underlying factors that contribute to these outcomes, which is crucial in scenarios involving readmissions and early bounce-backs previously explored through deep-learning models [21,22]. By offering a model that is both transparent and adaptable, our approach sets a new standard for predictive accuracy and ethical AI usage in medical triage.
Our study introduces an innovative approach to emergency department (ED) triage using automated machine-learning (AutoML) technologies applied to the MIMIC-IV-ED database. This research is pivotal as it enhances decision-making processes at a critical first point of contact in healthcare systems—emergency triage. We leverage a comprehensive Open Source AutoML Python Module, H2O.ai, to develop a predictive model that not only forecasts hospital admissions but also integrates context-aware preprocessing methodologies specifically tailored to address the unique and often irregular data patterns found in emergency medical settings.
Key contributions of this research include:
  • Advanced Data Handling and Machine Learning Techniques: Extending beyond traditional applications of AutoML in healthcare, our study employs advanced data preprocessing techniques to manage the irregular, incomplete, and anomalous data patterns typical in emergency department settings. This allows for more accurate and reliable predictions, improving upon the granularity of data handling noted in foundational studies by Haas et al. [9] and Zhao et al. [10].
  • Application of Context-Aware AutoML: Unlike previous studies that have applied AutoML generically across healthcare data, our approach is meticulously designed for the idiosyncrasies of ED data, enhancing both the model’s accuracy and its practical usability for end-users in clinical environments.
  • Integration of Explainable AI (XAI) Techniques: In response to the critical need for transparency and accountability in AI applications within healthcare, our model incorporates XAI principles. This ensures that the decision-making processes of our model are transparent, fostering trust among medical practitioners and supporting ethical AI usage in clinical settings.
  • Focus on Real-World Clinical Settings: The predictive model is intended as a decision-support tool to aid medical staff under high-pressure conditions, particularly in overcrowded and understaffed environments, where rapid and accurate decision-making is crucial.
  • Empirical Evaluation and Ethical Consideration: We conduct a thorough empirical evaluation of the model’s performance and discuss its ethical implications, ensuring that our contributions are not only technically sound but also ethically responsible.
By addressing these aspects, our study not only contributes to the academic field of medical informatics and emergency care but also sets a new benchmark for the integration of AutoML in enhancing emergency department operations. This research underscores the potential of machine learning to transform emergency medical services by providing timely and accurate decision support, thereby potentially improving patient outcomes and operational efficiency in EDs.

2. Materials and Methods

For this research, data were sourced from the MIMIC-IV-ED database, extracted by a certified team member using specialized queries [23]. Our main interest lies in the emergency department triage data, specifically the initial vital signs and basic metrics recorded upon arrival. Although incorporating a comprehensive diagnostic evaluation could enhance our model’s accuracy, we have chosen to limit our analysis to these preliminary parameters.

2.1. The MIMIC-IV-ED Database

We utilized the MIMIC-IV-ED database, which includes emergency department records from Beth Israel Deaconess Medical Center (Boston, MA, USA) between 2011 and 2019 [24]. Data extraction was conducted with SQLite software version 3.40.0, focusing on variables from the ‘edstays’, ‘triage’, and ‘medrecon’ (medicine reconciliation) tables to form our dataset. Patient age was determined by merging the ‘edstays’ table from the MIMIC-IV-ED database with the ‘patients’ table from the ‘hosp’ (hospital) module of the MIMIC-IV database, using stay_id and subject_id identifiers. For our predictive model, we concentrated initially on a specific set of features, particularly initial vital signs and basic metrics. This choice was guided by the availability of reliable data within the MIMIC-IV database and the need to establish a baseline model that could be efficiently validated. These features, while foundational, represent only a subset of the potential predictors that could influence hospital admission decisions. We recognize that the dynamics of hospital admissions are influenced by a myriad of factors, including more complex patient histories, laboratory results, and prior health conditions, which were not included in the current model scope due to data constraints and initial scope definition.
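To make the extraction step concrete, the following is a hedged sketch of the kind of SQLite join described above. Table and column names follow the published MIMIC-IV-ED schema, but the database file name, the per-stay medication count, and the length-of-stay derivation for the 'hours' attribute are this sketch's assumptions rather than the authors' exact query.

```python
import sqlite3
import pandas as pd

# Connect to a local SQLite file into which the MIMIC-IV-ED tables have been
# loaded (the file name is illustrative).
conn = sqlite3.connect("mimic_iv_ed.db")

# Join triage vitals to ED stays, count reconciled medications per stay, and
# attach patient age from the 'patients' table of the MIMIC-IV 'hosp' module.
query = """
SELECT e.stay_id,
       p.anchor_age AS age,
       e.gender, e.race, e.arrival_transport, e.disposition,
       t.temperature, t.heartrate, t.resprate, t.o2sat,
       t.sbp, t.dbp, t.pain, t.acuity,
       (SELECT COUNT(*) FROM medrecon m
         WHERE m.stay_id = e.stay_id) AS drugs,
       (JULIANDAY(e.outtime) - JULIANDAY(e.intime)) * 24.0 AS hours
FROM edstays e
JOIN triage   t ON t.stay_id    = e.stay_id
JOIN patients p ON p.subject_id = e.subject_id;
"""
df = pd.read_sql_query(query, conn)
```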
Each entry in the final dataset corresponds to a distinct emergency department visit capturing variables such as age, gender, race, mode of arrival, vital signs, pain intensity, acuity related to condition severity, current medications, final disposition (admission or discharge), and total waiting time in hours.
As in most applied machine-learning pipelines, some preprocessing was necessary as an initial stage—for example, removing records with null values or values outside the corresponding domain. Specifically, we excluded records containing values outside the legal boundaries for the scale attributes O2 saturation, pain, and acuity. When analyzing medical data from emergency department patients, it is common to encounter values that seem incompatible with life. Such outliers can occur for a variety of reasons, including measurement errors, data entry mistakes, or the severity of the patients’ conditions. This data-cleaning process is semi-automated, and human judgment has to be exercised because of the sensitivity involved in deciding how to handle outlier values. The ranges of the retained values are presented in Table 1.
In our study, the preprocessing of data, particularly the exclusion of extreme cases, is a critical point that warrants further clarification. Extreme values in medical datasets, especially in a triage context, can often represent anomalies that are either errors in data entry or true clinical outliers. While these cases can provide valuable insights into unusual but clinically relevant scenarios, they can also introduce significant noise and potential bias into the predictive model. Our approach to data preprocessing was guided by the dual objectives of maintaining the integrity of the model and ensuring its general applicability to a wide range of emergency department scenarios. The decision to exclude certain extreme cases was not made lightly but was based on a thorough analysis of their impact on model performance.
Records containing null values or values beyond the set limits were excluded from the table. The processed data, consisting of approximately 280,000 records, were utilized for model training and prediction. Post-preprocessing, the final data table comprises the variables detailed in Table 1.
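A minimal pandas sketch of this cleaning stage follows, assuming the extracted dataframe `df` from the previous sketch. The legal domains for the three scale attributes are shown; the clinical boundaries applied to the remaining vitals are omitted because, as noted above, they involve case-by-case human judgment.

```python
import pandas as pd

def clean_triage(df: pd.DataFrame) -> pd.DataFrame:
    """Drop nulls and out-of-domain scale values, as described in Section 2.1."""
    df = df.dropna()                          # remove records with null values
    df = df[df["o2sat"].between(0, 100)]      # O2 saturation is a percentage
    df = df[df["pain"].between(0, 10)]        # numeric pain scale
    df = df[df["acuity"].between(0, 5)]       # acuity levels as reported in Table 1
    return df

df_clean = clean_triage(df)   # ~280,000 records remained after cleaning in the study
```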

2.2. The Modeling

AutoML, or automated machine learning, applies machine-learning models to real-world problems through automated selection, composition, and parameterization. This automation can make ML processes more accessible and often faster and more accurate than manual methods. Institutions without specialized data scientists or machine-learning experts can use these platforms, available from third-party vendors or open-source repositories.
In the medical field, AutoML is used for tasks such as disease prediction, medical imaging, genomics, and EHR analysis. These tools simplify the application of ML models, reducing the need for extensive ML expertise and allowing clinicians to focus on their primary tasks. Automation can increase efficiency, reproducibility, and potentially improve diagnosis and prognosis accuracy [25]. AutoML also supports personalized medicine by using patient-specific data for individualized treatment plans [26,27].
To ensure transparency in healthcare applications, our study incorporates Explainable AI (XAI) principles, allowing clinicians to understand and trust the model’s predictions. XAI techniques, such as feature importance scores and model-agnostic explanations, clarify how and why predictions are made. Visualizations of feature impacts and decision logic help healthcare providers assess the model’s recommendations.
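As a concrete illustration, the following hedged sketch shows the kind of H2O explainability calls this entails (available in recent h2o-3 releases; exact output depends on the installed version and model type). Here `gbm` and `test` stand for the trained model and held-out frame produced later, in Sections 2.4 and 2.5.

```python
import h2o  # assumes an initialized h2o cluster and a trained tree-based model

gbm.varimp_plot()                                      # global feature-importance scores
gbm.partial_plot(data=test, cols=["acuity", "pain"])   # marginal effects of key features
h2o.explain(gbm, test)                                 # model-agnostic explanation suite
```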
We also address the ethical implications of using automated systems in medical triage (Section 4), emphasizing our commitment to responsible AI use. Safeguards are in place to prevent misuse, and we continuously monitor the model’s performance to align with ethical standards and clinical expectations.
AutoML automatically identifies and applies the best machine-learning algorithm for a specific task, with “optimal” defined as achieving the highest performance based on a set of predefined criteria within a specified time frame [28]. Below are the steps in the machine-learning process that AutoML can automate, listed in the order they occur:
  • Raw data processing;
  • Feature engineering and feature selection;
  • Model selection;
  • Hyperparameter and parameter optimization;
  • Deployment, considering business and technology constraints;
  • Evaluation metric selection;
  • Monitoring and issue detection;
  • Analysis of results.

2.3. The Tools

AutoML trains several grids of models (fixed or randomly chosen), such as XGBoost, Logistic Regression, Random Forest, or Deep Neural Nets. Several AutoML platforms are available, e.g., Microsoft Azure AutoML, Google Cloud AutoML, Scikit-learn, and Amazon SageMaker. In this study, we use H2O.ai in Python.
This platform provides functionalities such as data pre-processing, feature engineering, model selection, hyperparameter tuning, and prediction interpretation [29]. It has been effectively used in a range of areas such as finance, healthcare, marketing, and more, to address complex problems [30]. By handling much of the model-building process, the platform allows users with little to no data science experience and background to build models with competitive performance, thus democratizing the usage of machine learning [31].
One of the critical aspects of the platform we used is its flexibility and extensibility, which make it suitable for different types of data and prediction problems, including regression, classification, and time-series forecasting [32,33,34,35].

2.4. Train and Test Data

To evaluate the classification model’s performance, the dataset was divided into train and test sets using the split_frame() function, which allocated 80% of the data to the training set and 20% to the test set. The split was stratified to maintain the class proportions in both sets, ensuring representative samples for training and evaluation; this is crucial for obtaining reliable performance metrics, especially with imbalanced data. This train–test split strategy enables the model to be trained on a sufficient amount of data while allowing an unbiased assessment of its performance on unseen data. After defining the target variable and the predictors, the H2OAutoML model is trained using the aml.train() function, specifying the maximum number of models to be trained. A minimal sketch of this step is given below.
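The following sketch uses the h2o Python package; the file name and seed are illustrative assumptions, and max_models=15 matches the number of candidate models reported in Section 2.5.

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()

# Illustrative file name for the preprocessed MIMIC-IV-ED extract.
data = h2o.import_file("triage_preprocessed.csv")
data["disposition"] = data["disposition"].asfactor()   # binary target: ADMITTED vs. HOME

# 80/20 split as described above (a seed makes the split reproducible).
train, test = data.split_frame(ratios=[0.8], seed=42)

y = "disposition"
x = [c for c in data.columns if c != y]

aml = H2OAutoML(max_models=15, seed=42)   # the study reports 15 candidate models
aml.train(x=x, y=y, training_frame=train)
```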

2.5. Model Selection

Once the H2OAutoML model is trained, we obtain the performance metrics on test data for all the models built by H2OAutoML—in our case, 15 models—sorted by their performance in terms of AUC. The Gradient Boosting Machine (GBM) classification model achieved an AUC of 0.8256, marginally lower than the Generalized Linear Model (GLM)’s AUC of 0.826. However, the decision to use the GBM model was driven by the need for more nuanced insights from variable importance analysis, which the GBM facilitates through its ensemble of decision trees. While GLMs are advantageous due to their simplicity and interpretability, they can fall short in capturing complex nonlinear relationships in the data that GBM models can uncover. The GBM model’s ability to identify these intricate patterns was deemed valuable for our predictive analysis, potentially providing a richer understanding of the factors influencing the outcomes. The GBM model generated by H2O for this study consists of 50 trees with a maximum depth of 5 and has a relatively small footprint, occupying approximately 22.7 KB in memory. The GBM algorithm leverages an ensemble of decision trees and boosting to iteratively improve the model’s predictions by decreasing the loss function [36]. Each tree in the ensemble is built to correct the errors made by the preceding trees, with greater emphasis placed on the samples with higher residuals. This iterative process allows the model to capture intricate interactions and nonlinear relationships in the data, leading to enhanced predictive accuracy.
The GBM model’s summary provides valuable insights into its structure and characteristics. The depth of the trees indicates the complexity of the learned relationships, with a maximum depth of 5 indicating that the model can capture moderately complex patterns [37]. The number of leaves in each tree indicates the model’s granularity, with an average of 28.5 leaves per tree. This level of granularity allows the model to make fine-grained predictions by considering a variety of conditions within the input features. The summary also reports the number of internal trees, which refers to the number of trees excluding the initial tree in the boosting process. This information helps assess the computational complexity of the model and the number of iterations required for convergence. The GBM model, with its ensemble of decision trees and boosting techniques, offers a powerful tool for predictive modeling in various domains. It has been commonly utilized in applications like fraud detection, customer churn prediction, and disease diagnosis [38]. The model’s ability to capture complex relationships and handle large-scale datasets makes it suitable for a wide range of research and practical applications. Researchers and practitioners can leverage the GBM model’s capabilities to uncover important insights from their data and make accurate predictions.
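Continuing the earlier sketch, the leaderboard inspection and model retrieval described above might look as follows; leaderboard column names follow h2o's output, and looking the GBM up by model-id prefix is an assumption about how the ids are named.

```python
# Inspect the leaderboard and fetch the selected GBM.
lb = aml.leaderboard.as_data_frame()
print(lb[["model_id", "auc"]])                     # 15 models sorted by AUC

gbm_id = next(m for m in lb["model_id"] if m.startswith("GBM"))
gbm = h2o.get_model(gbm_id)

perf = gbm.model_performance(test)                 # metrics on held-out test data
print(perf.auc(), perf.aucpr(), perf.rmse())
```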

3. Results

The performance metrics of the selected GBM model on test data are presented in Table 2. The Receiver Operating Characteristic (ROC) curve for the GBM model is depicted in Figure 1. The area under the ROC curve (AUC ROC) is a broadly recognized metric in machine learning and statistics: it indicates the model’s capacity to differentiate between positive and negative classes, representing the probability that the model will rank a randomly selected positive instance higher than a randomly selected negative one [38,39,40]. For this dataset, the GBM model achieved an AUC of 0.8256, suggesting an 82.56% likelihood that the classifier will prioritize a randomly chosen positive instance over a randomly chosen negative one.
We now briefly review some key metrics for reporting classifier quality before actually presenting the results of our investigation.
The Area Under the Curve (AUC) quantifies a binary classifier’s ability to discriminate between classes and serves as a condensed summary of the ROC curve, with greater AUC values corresponding to better model performance.
Additionally, a precision–recall curve is employed to assess the effectiveness of binary classification algorithms, particularly in scenarios characterized by a class imbalance. Similar to ROC curves, precision–recall curves offer a visual depiction of a classifier’s performance across various thresholds, as opposed to a single metric. To create a precision–recall curve, the precision and recall are computed and graphed for a single classifier using a range of thresholds. A model attains a perfect Area Under the Precision–Recall Curve (AUPRC) when it successfully identifies all positive examples (achieving perfect recall) without mistakenly classifying any negative instances as positive (achieving perfect precision).
The Mean Squared Error (MSE) quantifies the proximity of a regression line to a given set of data points. It represents a risk function that corresponds to the expected value of the squared error loss. The MSE is calculated by averaging the squared errors between the data points and the function’s predictions. Ideally, the MSE values should be close to zero.
Another primary performance metric for a regression model is the Root Mean Squared Error (RMSE) which evaluates the average disparity between the model-predicted values and actual values. The RMSE evaluates the model’s predictive accuracy, with lower values indicating a more effective model. Ideally, if a model consistently predicts the exact expected value, the RMSE would be zero. A notable advantage of the RMSE is that it expresses error in the same units as the predicted variable, making interpretation straightforward.
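In standard notation, with y_i the observed value, ŷ_i the model’s prediction, and n the number of instances, the two error metrics just described are:

```latex
\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2},
\qquad
\mathrm{RMSE}=\sqrt{\mathrm{MSE}}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}
```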
Using the appropriate function, H2O calculates the confusion matrix based on the model’s predictions and the actual target values from the dataset (Table 3). The confusion matrix summarizes the model’s performance on the test data by comparing the predicted classes (ADMITTED and HOME) with the actual classes. Metrics such as accuracy, precision, recall, specificity, and F1-score are derived from the confusion matrix and offer insights into the model’s classification capabilities. Evaluating these metrics helps identify the model’s strengths and weaknesses, guiding decision-making in practical applications. In the given confusion matrix:
True Positives (TP): The model correctly predicted 12,241 instances as “ADMITTED”.
False Negatives (FN): The model falsely predicted 10,833 instances as “HOME” when they were actually “ADMITTED”.
True Negatives (TN): The model correctly predicted 28,295 instances as “HOME”.
False Positives (FP): The model falsely predicted 3797 instances as “ADMITTED” when they were actually “HOME”.
Accuracy: Accuracy assesses the overall correctness of the model’s predictions by calculating the ratio of correctly predicted instances to the total number of instances. In this case, the accuracy is (12,241 + 28,295)/(23,074 + 32,092) = 0.7348, or 73.48%.
Precision: Precision measures the proportion of correctly predicted positive instances (ADMITTED) among all instances that were predicted as positive. In this case, it is calculated as 12,241/(12,241 + 3797) = 0.7632 or 76.32%.
Recall (Sensitivity/True Positive Rate): Recall assesses the proportion of actual positive instances (ADMITTED) that were correctly predicted by the model. It is calculated as 12,241/(12,241 + 10,833) = 0.5305 or 53.05%.
Specificity (True Negative Rate): Specificity evaluates the proportion of correctly predicted negative instances (HOME) among all actual negative instances. It is calculated as 28,295/(28,295 + 3797) = 0.8817 or 88.17%.
F1 Score: The F1 score merges precision and recall into a single metric, offering a balance between the two. It is computed as 2 × (precision × recall)/(precision + recall), which here yields approximately 0.626.
Error Rate for the “ADMITTED” class: The number of false predictions (10,833) divided by the total number of instances in the “ADMITTED” class (23,074). The result is approximately 0.4695, which represents a 46.95% error rate for predicting “ADMITTED” cases.
Error Rate for the “HOME” class: The number of false predictions (3797) divided by the total number of instances in the “HOME” class (32,092). The result is approximately 0.1183, which represents an 11.83% error rate for predicting “HOME” cases.
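These quantities can be recomputed directly from Table 3 as a self-contained check; the counts below are taken verbatim from the table.

```python
# Confusion-matrix counts from Table 3.
TP, FN = 12_241, 10_833     # actual ADMITTED
TN, FP = 28_295, 3_797      # actual HOME

accuracy    = (TP + TN) / (TP + TN + FP + FN)                 # 0.7348
precision   = TP / (TP + FP)                                  # 0.7632
recall      = TP / (TP + FN)                                  # 0.5305 (sensitivity)
specificity = TN / (TN + FP)                                  # 0.8817 (true negative rate)
f1          = 2 * precision * recall / (precision + recall)   # 0.626
error_admitted = FN / (TP + FN)                               # 0.4695
error_home     = FP / (TN + FP)                               # 0.1183
```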
Based on the given confusion matrix and related evaluation metrics, we can assess the performance of the GBM model by classifying instances into two classes: “ADMITTED” and “HOME”. The evaluation metrics obtained from the model’s performance on the test data are as follows:
Mean Square Error (MSE): The MSE is 0.1677, indicating the average squared difference between the predicted probabilities and the true class labels. A lower MSE value suggests higher model accuracy.
Root Mean Square Error (RMSE): The RMSE is 0.4095, representing the square root of the MSE. It provides a measure of the average deviation between the predicted probabilities and the actual class labels. A lower RMSE indicates better model performance.
Log Loss: The log loss is 0.5005, which evaluates the model’s performance by measuring the logarithmic difference between the predicted probabilities and the actual class labels. A lower log loss indicates greater model accuracy.
Mean Per-Class Error: The mean per-class error is 0.2939, representing the average error rate across both classes. It provides insight into the model’s performance in correctly classifying instances of each class. A lower mean per-class error suggests better class separation.
Area Under the Curve (AUC): The AUC is 0.8256, indicating the model’s ability to discriminate between positive and negative classes. A higher AUC value signifies better overall model performance (Figure 1).
Area Under the Precision–Recall Curve (AUCPR): The AUCPR is 0.8722, which measures the trade-off between precision and recall. A higher AUCPR value suggests a better model.
To address concerns regarding the performance of our predictive model as illustrated in Table 3, particularly the issue of false negatives, it is imperative to clarify the intended use of our model within a clinical setting. Our model is designed as a decision-support tool, intended to augment but not replace the expert judgment of medical professionals. The primary role of this model is to assist emergency department staff by providing a statistical assessment that must be considered alongside a comprehensive clinical evaluation. Decisions on patient disposition (admission or discharge) ultimately reside with the attending healthcare providers, who can evaluate the broader context of a patient’s medical condition.
To visualize the distribution of predictions versus actual values, we can use a bar plot to compare the count of each class in the actual and predicted results (Figure 2). This plot provides a visual comparison between the distribution of the actual and predicted classes, enabling us to evaluate the model’s performance in correctly identifying different class labels.
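A small matplotlib sketch of this comparison follows, assuming the `gbm` model and `test` frame from the earlier sketches and the 'disposition' target column.

```python
import matplotlib.pyplot as plt
import pandas as pd

actual = test["disposition"].as_data_frame()["disposition"]
predicted = gbm.predict(test).as_data_frame()["predict"]

counts = pd.DataFrame({"Actual": actual.value_counts(),
                       "Predicted": predicted.value_counts()})
counts.plot(kind="bar", rot=0)                    # side-by-side class counts
plt.ylabel("Number of ED visits")
plt.title("Counts of actual vs. predicted classes")
plt.show()
```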
Using the varimp() function in H2O, we can perform a variable importance analysis for each classification model. For the GBM model, the six highest-ranked variables are acuity, waiting hours, concomitant drugs, age, means of arrival/transport, and race (Figure 3). Based on the H2O GBM predictions, a correlation heatmap was generated, enabling the visualization of the correlations between the different features in the dataset (Figure 4).
Understanding these correlations helps in identifying relationships between different physiological measures. For example, the positive correlation between the acuity and heart rate can be critical in triaging patients based on their vitals.
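A hedged sketch of the two diagnostics behind Figures 3 and 4: the variable-importance calls are standard h2o API, while the heatmap here is computed with pandas and seaborn rather than reproducing the authors' exact plotting code.

```python
import seaborn as sns
import matplotlib.pyplot as plt

print(gbm.varimp(use_pandas=True))   # ranking behind Figure 3
gbm.varimp_plot()

numeric = test.as_data_frame().select_dtypes("number")
sns.heatmap(numeric.corr(), cmap="coolwarm")      # pairwise feature correlations
plt.title("Feature correlation heatmap")
plt.show()
```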

4. Discussion

Based on the evaluation metrics, the GBM model shows promising performance in classifying instances into the “ADMITTED” and “HOME” classes. The low MSE and RMSE values indicate accurate predictions, supporting the model’s effectiveness in minimizing prediction errors. The low log loss confirms the reliability of the model’s predicted probabilities. The mean per-class error of 0.2939 demonstrates a reasonable balance in class separation, essential for applications where both classes are equally important. Moreover, the AUC and AUCPR values of 0.8256 and 0.8722, respectively, indicate the model’s ability to discriminate between classes and achieve a high precision and recall. The GBM model demonstrated promising results, achieving an accuracy of 73.48% on the test data. This accuracy shows that the model correctly predicts the class for the majority of instances, though there is room for improvement.
The model’s performance was further evaluated using various metrics derived from the confusion matrix. The model achieved a precision of 76.32%, indicating its ability to accurately predict positive instances (ADMITTED), while a recall of 53.05% suggests the model may miss many “ADMITTED” cases, highlighting an area for improvement. The specificity of 88.17% indicates the model’s competence in identifying true negative cases (HOME). The F1 score, which provides a balanced measure considering both precision and recall, was approximately 0.626. A balanced accuracy of 70.61% (the complement of the 0.2939 mean per-class error) indicates a moderate balance across the two classes, given the dataset’s class imbalance.
These results suggest that the GBM model might be an effective tool for predicting patient disposition, either being admitted or sent home, based on the available features. However, our detailed discussion of each evaluation metric validates the robustness of our methodology. While the model performs well in many areas, specific aspects like recall could be enhanced. Future work might involve fine-tuning the model, exploring alternative algorithms, or incorporating additional features to improve these areas. This comprehensive analysis ensures our results are transparent, interpretable, and actionable. Further analysis and validation on different datasets are recommended to ensure the model’s robustness and generalizability.
We emphasize that such models should be equipped with multiple safeguards. This includes the integration of model suggestions with traditional diagnostic tools and patient data, and mandatory oversight by healthcare practitioners to validate and approve model outputs before any clinical application. Additionally, regular updates and recalibrations of the model based on ongoing feedback and emerging clinical data are necessary in order to ensure its accuracy and relevance. The developers of such models should be responsible for maintaining transparency in the model’s function and actively working on enhancing its predictive accuracy.
Special attention should be given to improving the sensitivity of the model to reduce the risk of false negatives significantly. Additionally, explainability features should be incorporated to ensure that healthcare providers understand the rationale behind model predictions, fostering trust and facilitating more informed decision-making. These steps are crucial for embedding such models into the clinical workflow responsibly and ethically, ensuring that they serve as a reliable aid in the high-stakes environment of emergency medicine.
One particularly interesting direction for further analysis is to evaluate the quality of the medical decision itself, taking into account, for example, re-admission occurrences. The temporal dimension of patient admittance, considered alongside some retrospective analysis of previous diagnoses and treatments, may yield a wealth of new information. However, this can be a delicate exercise in terms of hospital procedures and lies well beyond the scope of ML automation, where one expects to quickly automate aspects of the work so as to motivate interdisciplinary teams (medical experts, data scientists, and hospital management) to investigate the findings further rather than simply adopt them as such. For example, one might examine whether the final disposition variable actually holds a valid value, and possibly infer something about the validity of the medical diagnosis, by checking how many cases involve a patient returning to the ED within 30 days with the same symptoms as before. This analysis is not possible without resorting to some Natural Language Processing technology, since the chiefcomplaint variable from the ‘triage’ table is a free-text description of the symptoms, and this hardly amounts to AutoML at the present state of the practice.
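As an illustration of the temporal part of such a check (leaving the free-text chiefcomplaint matching aside), a pandas sketch over a hypothetical `stays` dataframe with 'subject_id' and 'intime' columns might look like this:

```python
import pandas as pd

# Hypothetical dataframe of ED stays; 'intime' is parsed to a timestamp first.
stays["intime"] = pd.to_datetime(stays["intime"])
stays = stays.sort_values(["subject_id", "intime"])

gap = stays.groupby("subject_id")["intime"].diff()   # time since the previous ED visit
stays["return_within_30d"] = gap.notna() & (gap <= pd.Timedelta(days=30))
print(f"Share of visits that are 30-day returns: {stays['return_within_30d'].mean():.1%}")
```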
These findings demonstrate the potential of using triage data and automated ML techniques, such as H2OAutoML, to aid in hospital admission prediction. However, significant challenges exist. The complexity and heterogeneity of medical data, as well as the critical need for interpretability and reliability in this domain, can make the application of AutoML challenging. Additionally, privacy and security concerns around the use of sensitive medical data in machine-learning models are another significant issue that must be addressed [26].
The choice between admitting an emergency department patient to the hospital and releasing him/her with medical advice is a non-trivial, multifactorial challenge. In this work, several models have been trained to predict that decision based on limited attributes. Real-life data embed human biases and hidden secondary factors that affect doctors’ decisions. For example, under defined-contribution health insurance options, a patient’s insurance coverage could plausibly affect, at a certain point, the institution’s capacity to provide services. Due to privacy issues, this sensitive information is naturally not available for training a decision-support system such as the proposed one. However, this factor, along with several other hidden parameters (the medical staff’s fatigue, the seriousness of the patient’s condition relative to other patients, the level of the doctor’s experience, the doctor’s ability to stay focused in high-pressure circumstances, etc.), is implicitly embedded in the black box of machine learning. Therefore, the proposed analysis does not provide an optimal decision for doctors to rely on, but rather a second opinion based on the previous decisions of an average colleague.
In discussing the ethical use of AI in healthcare, we stress the importance of transparency and accountability. The integration of XAI techniques serves not only to enhance trust among end-users but also to uphold our ethical obligation to provide clear, understandable, and justifiable predictions. By ensuring that our AutoML-based predictions are explainable, we aim to contribute positively to clinical outcomes while respecting the crucial human elements of medical practice.
Despite its achievements, our study acknowledges several limitations that future research should address:
  • Data Preprocessing and Model Applicability: The exclusion of extremely atypical cases during data preprocessing may restrict the model’s applicability across diverse clinical scenarios. These cases, though rare, represent complex challenges that could benefit from predictive modeling. Future iterations will aim to expand the model’s scope to incorporate a wider spectrum of cases and explore advanced modeling techniques to handle the greater variability in patient presentations without sacrificing accuracy.
  • Dependence on Data Quality: The model’s performance and generalizability depend heavily on the quality and granularity of the input data, which can vary significantly across different healthcare settings. To address this, the further validation of the model on diverse datasets from varied healthcare environments is essential.
  • Ethical Consideration: As machine-learning systems become more prevalent in healthcare, ensuring adherence to the highest standards of data protection and ethical use is crucial. Its integration into clinical practice must consider privacy and data security implications carefully.
  • Feature Selection: To address the limitations in initial feature selection, future work will incorporate a broader and more complex array of variables. This includes integrating clinical notes via natural language processing (NLP) techniques to extract predictors such as patient symptoms, previous medical interventions, and social determinants of health.
  • External Validation: The absence of external validation is a significant limitation. Our future research agenda includes conducting external validations and cross-institutional studies to enhance the model’s credibility and applicability across diverse healthcare systems. Collaborations with multiple institutions to access varied datasets will allow the evaluation of the model across different populations and clinical conditions.
  • Our model is designed as a decision-support tool to assist healthcare professionals, not to replace them. The final decision regarding patient disposition should always be made by a qualified medical practitioner who can take into account the complete clinical picture. Before full-scale deployment, our model will undergo extensive pilot testing in a controlled clinical environment. This will help identify any potential issues with false negatives and allow for adjustments based on real-world feedback.
Besides these general problems, the most challenging and concrete step is to engage healthcare professionals by presenting these (or similar) results to them and eliciting positive comments or counterarguments. This could be the basis of a long and fruitful applied research agenda, as well as a means to train medical experts in dealing with the application of ML techniques in several aspects of their work; after all, reviewing one’s own work vis-à-vis the recommendations of an ML-based system could be fruitful even on a personal and professional basis. Despite these challenges, the potential of AutoML in medicine is immense, and researchers are actively working on these issues.

5. Conclusions

This study successfully developed a predictive model for hospital admissions based on triage data, designed to integrate into a decision-support system to aid medical staff in emergency departments. The model provides early indications of hospital admissions, which is critical in high-workload environments or when medical staff availability is limited, helping to mitigate the risk of human errors in urgent decision-making processes.
Our work contributes to a refined, clinically integrated AutoML application that addresses specific challenges in emergency department settings, pushing the boundary of how automated systems can support urgent medical decision-making. By expanding the feature set and employing more sophisticated modeling approaches, we aim to enhance the model’s utility and provide deeper insights into factors driving hospital admissions. This expansion will not only improve accuracy and applicability but also ensure the model serves as an effective tool in diverse clinical scenarios, ultimately aiding in better resource allocation and patient care strategies. The adaptation of AutoML to the specific idiosyncrasies of our target population is indeed a key strength of our approach. This customization allows the model to effectively address the unique characteristics and requirements of the population we are studying. However, we acknowledge that this specificity could restrict the generalizability and applicability of the model to different populations or contexts.
Looking forward, we plan to expand our model’s capabilities and test its application in other high-stakes medical areas, potentially setting a new standard for the deployment of AutoML in healthcare. Additionally, exploring partnerships with international healthcare providers will enhance the diversity of our validation efforts and ensure our model’s global applicability. This step is crucial for developing a truly universal tool that can perform reliably in a broad range of emergency department settings worldwide.
In conclusion, while our current study lays a solid foundation for the use of AutoML in predicting hospital admissions from triage data, the full potential of this work will be realized through rigorous external validation. These enhancements and considerations could significantly improve the utility and reliability of such predictive models, making them more robust and broadly applicable to a variety of clinical conditions and settings. Further research and rigorous validation are recommended to enhance the generalizability and robustness of the predictive model, ensuring its effective implementation in improving patient outcomes and the overall efficiency of healthcare systems.

Author Contributions

Conceptualization, G.F., A.S., A.A., V.K. and V.S.V.; methodology, G.F.; software, G.F., I.K., A.A. and D.K. (Dimitrios Karapiperis); validation, C.K., V.K. and V.K.; formal analysis, G.F., A.S., A.A., R.T., I.K., C.K. and D.K. (Dimitris Kalles); investigation, I.K.; resources, G.F. and I.K.; data curation, I.K.; writing—original draft preparation, G.F., A.S., A.A., R.T., I.K., C.K. and D.K. (Dimitrios Karapiperis); writing—review and editing, D.K. (Dimitris Kalles), V.K. and V.S.V.; visualization, G.F.; supervision, D.K. (Dimitris Kalles), V.K. and V.S.V.; project administration, G.F. and V.S.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are derived from public domain resources.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Atkinson, P.; McGeorge, K.; Innes, G. Saving emergency medicine: Is less more? Can. J. Emerg. Med. 2022, 24, 9–11. [Google Scholar] [CrossRef] [PubMed]
  2. Morley, C.; Unwin, M.; Peterson, G.M.; Stankovich, J.; Kinsman, L. Emergency department crowding: A systematic review of causes, consequences and solutions. PLoS ONE 2018, 13, e0203316. [Google Scholar] [CrossRef] [PubMed]
  3. Sun, B.C.; Hsia, R.Y.; Weiss, R.E.; Zingmond, D.; Liang, L.J.; Han, W.; McCreath, H.; Asch, S.M. Effect of emergency department crowding on outcomes of admitted patients. Ann. Emerg. Med. 2013, 61, 605–611.e6. [Google Scholar] [CrossRef]
  4. Rosenbaum, L. Facing COVID-19 in Italy—Ethics, Logistics, and Therapeutics on the Epidemic’s Front Line. N. Engl. J. Med. 2020, 382, 1873–1875. [Google Scholar] [CrossRef] [PubMed]
  5. Dubey, R.; Gunasekaran, A.; Childe, S.J.; Blome, C.; Papadopoulos, T. Big Data and Predictive Analytics and Manufacturing Performance: Integrating Institutional Theory, Resource-Based View and Big Data Culture. Brit. J. Manag. 2019, 30, 341–361. [Google Scholar] [CrossRef]
  6. Abdesslem, F.B.; Parris, I.; Henderson, T. Reliable online social network data collection. In Computational Social Networks—Mining and Visualization; Abraham, A., Ed.; Springer: London, UK, 2012; Chapter 8; pp. 183–210. [Google Scholar]
  7. Khan, N.; Naim, A.; Hussain, M.R.; Naveed, Q.N.; Ahmad, N.; Qamar, S. The 51 V’s of big data: Survey, technologies, characteristics, opportunities, issues and challenges. In Proceedings of the International Conference on Omni-layer Intelligent Systems, Crete, Greece, 5–7 May 2019; pp. 19–24. [Google Scholar]
  8. Dash, S.; Shakyawar, S.K.; Sharma, M.; Kaushik, S. Big data in healthcare: Management, analysis and future prospects. J. Big Data 2019, 6, 54. [Google Scholar] [CrossRef]
  9. Haas, O.; Maier, A.; Rothgang, E. Rule-Based Models for Risk Estimation and Analysis of In-hospital Mortality in Emergency and Critical Care. Front. Med. 2021, 8, 785711. [Google Scholar] [CrossRef] [PubMed]
  10. Zhao, L.; Yang, J.; Zhou, C.; Wang, Y.; Liu, T. A novel prognostic model for predicting the mortality risk of patients with sepsis-related acute respiratory failure: A cohort study using the MIMIC-IV database. Curr. Med. Res. Opin. 2022, 38, 629–636. [Google Scholar] [CrossRef] [PubMed]
  11. Xie, F.; Ong, M.E.H.; Liew, J.N.M.H.; Tan, K.B.K.; Ho, A.F.W.; Nadarajan, G.D.; Low, L.L.; Kwan, Y.H.; Goldstein, B.A.; Matchar, D.B.; et al. Development and Assessment of an Interpretable Machine Learning Triage Tool for Estimating Mortality after Emergency Admissions. JAMA Netw. Open 2021, 4, e2118467. [Google Scholar] [CrossRef]
  12. Silahtaroğlu, G.; Yılmaztürk, N. Data analysis in health and big data: A machine learning medical diagnosis model based on patients’ complaints. Commun. Stat.—Theory Methods 2021, 50, 1547–1556. [Google Scholar] [CrossRef]
  13. Parker, C.A.; Liu, N.; Wu, S.X.; Shen, Y.; Lam, S.S.W.; Ong, M.E.H. Predicting hospital admission at the emergency department triage: A novel prediction model. Am. J. Emerg. Med. 2019, 37, 1498–1504. [Google Scholar] [CrossRef]
  14. Araz, O.M.; Olson, D.; Ramirez-Nafarrate, A. Predictive analytics for hospital admissions from the emergency department using triage information. Int. J. Prod. Econ. 2019, 208, 199–207. [Google Scholar] [CrossRef]
  15. Bekker, R.; Uit Het Broek, M.; Koole, G. Modeling COVID-19 hospital admissions and occupancy in the Netherlands. Eur. J. Oper. Res. 2023, 304, 207–218. [Google Scholar] [CrossRef]
  16. Goto, T.; Camargo, C.A., Jr.; Faridi, M.K.; Freishtat, R.J.; Hasegawa, K. Machine Learning-Based Prediction of Clinical Outcomes for Children During Emergency Department Triage. JAMA Netw. Open 2019, 2, e186937. [Google Scholar] [CrossRef]
  17. Feretzakis, G.; Karlis, G.; Loupelis, E.; Kalles, D.; Chatzikyriakou, R.; Trakas, N.; Karakou, E.; Sakagianni, A.; Tzelves, L.; Petropoulou, S.; et al. Using Machine Learning Techniques to Predict Hospital Admission at the Emergency Department. J. Crit. Care Med. (Targu Mures) 2022, 8, 107–116. [Google Scholar] [CrossRef]
  18. Feretzakis, G.; Sakagianni, A.; Kalles, D.; Loupelis, E.; Panteris, V.; Tzelves, L.; Chatzikyriakou, R.; Trakas, N.; Kolokytha, S.; Batiani, P.; et al. Using Machine Learning for Predicting the Hospitalization of Emergency Department Patients. Stud. Health Technol. Inf. 2022, 295, 405–408. [Google Scholar] [CrossRef]
  19. Feretzakis, G.; Sakagianni, A.; Kalles, D.; Loupelis, E.; Tzelves, L.; Panteris, V.; Chatzikyriakou, R.; Trakas, N.; Kolokytha, S.; Batiani, P.; et al. Exploratory Clustering for Emergency Department Patients. Stud. Health Technol. Inf. 2022, 295, 503–506. [Google Scholar] [CrossRef]
  20. Feretzakis, G.; Sakagianni, A.; Loupelis, E.; Kalles, D.; Panteris, V.; Tzelves, L.; Chatzikyriakou, R.; Trakas, N.; Kolokytha, S.; Batiani, P.; et al. Prediction of Hospitalization Using Machine Learning for Emergency Department Patients. Stud. Health Technol. Inf. 2022, 294, 145–146. [Google Scholar] [CrossRef]
  21. Mahmoudi, E.; Kamdar, N.; Kim, N.; Gonzales, G.; Singh, K.; Waljee, A.K. Use of electronic medical records in development and validation of risk prediction models of hospital readmission: Systematic review. BMJ 2020, 369, m958. [Google Scholar] [CrossRef]
  22. Davazdahemami, B.; Peng, P.; Delen, D. A deep learning approach for predicting early bounce-backs to the emergency departments. Healthc. Anal. 2022, 2, 100018. [Google Scholar] [CrossRef]
  23. Tsoni, R.; Kaldis, V.; Kapogianni, I.; Sakagianni, A.; Feretzakis, G.; Verykios, V.S. A Machine Learning Pipeline Using KNIME to Predict Hospital Admission in the MIMIC-IV Database. In Proceedings of the 14th International Conference on Information, Intelligence, Systems & Applications (IISA), Volos, Greece, 10–12 July 2023. [Google Scholar] [CrossRef]
  24. Johnson, A.; Bulgarelli, L.; Pollard, T.; Celi, L.A.; Mark, R.; Horng, S. MIMIC-IV-ED (Version 2.2). PhysioNet. 2023. Available online: https://physionet.org/content/mimic-iv-ed/2.2/ (accessed on 29 April 2023). [CrossRef]
  25. Godinez, W.; Hossain, I.; Lazarescu, M.; Bennett, K. AutoML in the Medical Domain: A Complex and Challenging Environment. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020. [Google Scholar]
  26. Trister, A.D.; Buist, D.S.M.; Lee, C.I. Will Machine Learning Tip the Balance in Breast Cancer Screening? JAMA Oncol. 2017, 3, 1463–1464. [Google Scholar] [CrossRef]
  27. Vayena, E.; Blasimme, A.; Cohen, I.G. Machine learning in medicine: Addressing ethical challenges. PLoS Med. 2018, 15, e1002689. [Google Scholar] [CrossRef]
  28. TechTarget. Automated Machine Learning (AutoML). SearchEnterpriseAI, TechTarget. Available online: https://www.techtarget.com/searchenterpriseai/definition/automated-machine-learning-AutoML (accessed on 28 June 2024).
  29. H2O.ai. H2O’s AutoML: Automatic Machine Learning in Python [Software]. 2023. Available online: https://www.h2o.ai/ (accessed on 30 May 2024).
  30. Kelleher, J.D.; Mac Namee, B.; D’Arcy, A. Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies; The MIT Press: Cambridge, MA, USA, 2015. [Google Scholar]
  31. Davenport, T.; Kalakota, R. The potential for artificial intelligence in healthcare. Future Healthc. J. 2019, 6, 94–98. [Google Scholar] [CrossRef]
  32. LeDell, E.; Poirier, S. H2o automl: Scalable automatic machine learning. In Proceedings of the AutoML Workshop at ICML, Vienna, Austria, 17–18 July 2020; Volume 2020. [Google Scholar]
  33. Holzinger, A.; Langs, G.; Denk, H.; Zatloukal, K.; Müller, H. Causability and explainability of artificial intelligence in medicine. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1312. [Google Scholar] [CrossRef]
  34. R Core Team. h2o: R Interface for the ‘H2O’ Scalable Machine Learning Platform. R Package Version 3.32.0.1. 28 April 2023. Available online: https://github.com/h2oai/h2o-3 (accessed on 30 May 2024).
  35. Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259. [Google Scholar] [CrossRef]
  36. Chen, T.Q.; Guestrin, C. Xgboost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef]
  37. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the 32nd Conference on Neural Information Processing Systems NeurIPS 2018, Montréal, QC, Canada, 3–8 December 2018; Available online: https://github.com/catboost/catboost (accessed on 30 May 2024).
  38. Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997, 30, 1145–1159. [Google Scholar] [CrossRef]
  39. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [Google Scholar] [CrossRef]
  40. Hanley, J.A.; McNeil, B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143, 29–36. [Google Scholar] [CrossRef]
Figure 1. Receiver Operating Characteristic (ROC) Curve for the Gradient Boosting Machine (GBM) model. The curve plots the true positive rate (sensitivity) against the false positive rate (1-specificity) at various classification thresholds. The closer the ROC curve is to the top-left corner, the better the model’s discriminatory power. The area under the ROC curve (AUC) quantifies the model’s overall performance, with a higher AUC indicating better predictive accuracy. The ROC curve for the GBM model visually represents its trade-off between sensitivity and specificity, helping to determine the optimal classification threshold for decision-making.
Figure 2. Counts of actual vs. predicted classes. The bar plot illustrates the counts of actual and predicted classes in the classification model.
Figure 3. Variable importance analysis of the GBM model.
Figure 4. Correlation heatmap of the features included in the model, demonstrating the correlation amongst the features.
Table 1. The predictive attributes *.

| Attribute | Type | Values/Range | Mean (SD) |
|---|---|---|---|
| Age (years) | Number (integer) | 18–103 | 56.53 (19.47) |
| Gender | String | Female, Male | N/A |
| Race | String | American Indian/Alaska Native, Asian ¹, Black ¹, Hispanic ¹, Multiple race/ethnicities, Native Hawaiian or other Pacific Islander, Other, Patient declined to answer, Portuguese, South American, Unable to obtain, Unknown, White ¹ | N/A |
| Arrival_transport | String | Walk-in, ambulance, helicopter, unknown, other | N/A |
| Temperature (°F) | Number (double) | 56–111.4 | 98.1 (1.00) |
| Heart rate (beats per minute) | Number (double) | 13–250 | 84.75 (17.47) |
| Resp. rate (breaths per minute) | Number (integer) | 1–97 | 17.6 (2.22) |
| O2 saturation (%) | Number (double) | 8–100 | 98.36 (2.00) |
| Systolic Blood Pressure (SBP) | Number (integer) | 5–299 | 136.28 (22.84) |
| Diastolic Blood Pressure (DBP) | Number (integer) | 0–297 | 77.33 (15.02) |
| Pain (Numeric Pain Scale) | Number (double) | 0–10 | 4.17 (3.77) |
| Acuity | Number (integer) | 0–5 | 2.63 (0.64) |
| Drugs | Number (integer) | 1–50 | 9.56 (7.69) |
| Disposition | String | Home, Admitted | N/A |
| Hours | Number (double) | 0.33–72.9 | 7.44 (6.00) |

* Data adapted with permission from Tsoni R. et al. (2023), IISA [23]. ¹ Category contains sub-categories.
Table 2. Model Metrics Binomial: GBM **.

| AUC | AUCPR | MSE | RMSE | LogLoss | Mean Per-Class Error |
|---|---|---|---|---|---|
| 0.8256 | 0.8722 | 0.1677 | 0.4095 | 0.5005 | 0.2939 |

** Reported on test data. Mean Square Error (MSE), Root Mean Square Error (RMSE), Area Under the Curve (AUC), Area Under the Precision–Recall Curve (AUCPR).
Table 3. Confusion matrix (actual/predicted).

| Actual Values | Predicted: Admitted | Predicted: Home | Error | Rate * |
|---|---|---|---|---|
| Admitted | 12,241 (TP) | 10,833 (FN) | 0.4695 | 10,833/23,074 |
| Home | 3797 (FP) | 28,295 (TN) | 0.1183 | 3797/32,092 |
| Total | 16,038 | 39,128 | 0.2652 | 14,630/55,166 |

True Positives (TP), False Negatives (FN), True Negatives (TN), False Positives (FP). * The “Rate” refers to the error rate for each class: the proportion of incorrect predictions relative to the total number of instances in that class.
