Article

Approaching Personalized Medicine: The Use of Machine Learning to Determine Predictors of Mortality in a Population with SARS-CoV-2 Infection

1 Autoimmunity and Inflammation Research Group, Río Hortega University Hospital, 47012 Valladolid, Spain
2 Cooperative Research Network Focused on Health Results—Advanced Therapies (RICORS TERAV), 28220 Madrid, Spain
3 Internal Medicine, Río Hortega University Hospital, 47012 Valladolid, Spain
4 Medical Analysis Expert Group, Institute of Technology, University of Castilla-La Mancha, 16071 Cuenca, Spain
5 Medical Analysis Expert Group, Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
* Author to whom correspondence should be addressed.
Biomedicines 2024, 12(2), 409; https://doi.org/10.3390/biomedicines12020409
Submission received: 28 December 2023 / Revised: 5 February 2024 / Accepted: 7 February 2024 / Published: 9 February 2024

Abstract

The COVID-19 pandemic demonstrated the need to develop strategies to control a new viral infection. However, because the health system and population of each country and hospital differ, each setting would need its own systems adapted to its characteristics. The objective of this work was to determine predictors that could identify the most severe patients with COVID-19 infection. Given the strained situation of hospitals during the first wave, analyzing data from that period with an accurate and fast technique can be an important contribution. In this regard, machine learning can analyze data objectively within hours and is used in many fields. This study included 291 patients admitted to a hospital in Spain during the first three months of the pandemic. After screening seventy-one features with machine learning methods, the variables with the greatest influence on predicting mortality in this population were lymphocyte count, urea, FiO2, potassium, and serum pH. The XGB method achieved the highest accuracy, with a precision of >95%. Our study shows that a machine learning-based system can identify patterns and, thus, provide a tool to help hospitals classify patients according to the severity of their illness in order to optimize admission.

1. Introduction

Since the first cases were reported on 31 December 2019, the COVID-19 pandemic has accumulated a total of 770,875,433 confirmed cases and 6,959,316 deaths [1]. The uncontrolled spread of the virus, driven by overpopulation, globalization, hyperconnectivity, and the centralization of supply chains [2], triggered a collapse of health services and resources that forced countries to take severe social measures such as isolation or lockdowns, with serious social and economic consequences [3,4]. In Spain, health centers in the most affected areas faced problems such as inadequate intensive care capacity, insufficient equipment (both for patients and health workers), lack of medical staff, and the delay or collapse of COVID-19 helplines, which led to the cancellation of non-urgent surgeries and the need to use private health services and military facilities for public purposes [5]. In the region of Castilla y León (Spain), 585 deaths due to COVID-19 were recorded in March 2020, the first month of the pandemic in Spain, although this figure probably underestimates the actual number of deaths due to the disease during this period. In addition, total mortality in the region increased by 775 deaths in that month compared with the previous month [6].
Several studies have shown that the disease involves severe inflammation, a cytokine storm, and dysregulation in the levels of immune cells, especially lymphocytes. It has also been observed that some biochemical parameters involved in the disease, such as serum urea or bilirubin levels, are related to kidney and/or liver damage [7]. It is also known that pre-existing comorbidities generally worsen a patient’s prognosis. Despite all this, the pathophysiology of the disease is not fully understood, as it is a multifactorial pathology [8,9], and it is not yet known which specific processes of the innate and adaptive immune system are decompensated or how they worsen the impact of the virus [10].
Despite the progress of scientific research, the social and hygienic policies adopted, better knowledge of the virus, the use of microbiological and serological tests such as PCR or antigen tests, and the implementation of different treatments, including the development and widespread use of vaccines, health systems still receive a significant number of infected patients with severe clinical manifestations for various reasons, including lack of vaccination, advanced age, immunosuppression, and pre-existing pathologies. Moreover, the World Health Organization (WHO) has warned that another pandemic may occur in the near future and that health systems should be prepared for this event. It is therefore essential to develop techniques that allow for the rapid identification of patients at higher risk, according to the characteristics of a given population, in order to provide a more appropriate service and improve the efficiency of health systems in terms of logistics, materials, and services.
In some healthcare systems, the clinical and sociodemographic data of all patients of a healthcare center are now stored in an electronic medical record system, which makes it easy to collect specific data on a population limited to a hospital or health area. Healthcare facilities can use these datasets to perform complex analyses with high-performance computing technologies such as machine learning (ML) methods, which are applied to an increasing number of research areas, including healthcare [11,12,13]. ML is defined as a branch of artificial intelligence (AI) and computer science that focuses on using data and algorithms to emulate the way in which humans learn, gradually improving its accuracy [14]. One of the advantages of this tool over traditional statistical methods is its ability to provide accurate predictions with a high level of scalability and adaptability, finding relationships between variables in large datasets. These characteristics allow ML models to be applied in areas such as diagnosis [15,16], prognosis prediction [17], drug discovery, and personalized treatment [18].
Several studies have already used this technology to answer numerous questions related to the COVID-19 pandemic, many of them aimed at identifying the factors most influential in predicting a patient’s outcome at the time of admission. However, their conclusions differ. Some agree that advanced age is a high-risk factor for mortality [19,20,21]. Banoei et al. determined that oxygen saturation level and loss of consciousness were also risk factors [22]; Kumaran et al. identified respiratory difficulties [19]; Nieto-Codesido et al. concluded that acute phase reactants were highly important in poor prognoses [20]; and Izquierdo et al. reported that fever was the second most important predictor after age [23]. These studies, like others, may differ in aspects such as the ML model used, the time interval analyzed, and/or the variables studied. What always differs, however, are the characteristics of the population, which vary from study to study simply because the studies are carried out in different places. This suggests that no global conclusions can be drawn to establish protocols in health centers.
In the context of a new and unknown public health problem in its early stages, such as this pandemic, the main objective of this study was to demonstrate the effectiveness of extreme gradient boosting (XGB) as an ML model for studying predictors of mortality in a given population. Once its reliability is confirmed, the method could be established as a system for evaluating risk predictors in any other population (hospital center, health area, city, etc.), and conclusions could be drawn from data on a small number of patients (150–300) in a short time interval. This would allow for the rapid design of specific protocols for each center, based on the needs and characteristics of each population, making it possible to predict the care required (admission to the ward, ICU, mechanical ventilation, etc.), improve personalized patient care, and optimize resources and services.

2. Materials and Methods

2.1. Data Source and Description

The source of the clinical data used in this study is the electronic medical record system of the Río Hortega University Hospital in Valladolid (Spain). Data from 291 patients hospitalized with PCR-confirmed COVID-19 infection between 2 February 2020 and 23 April 2020 were obtained from this platform. All the information used in this study corresponded to the first 24 h of hospitalization. Each patient was assigned an anonymized code to preserve their privacy, and all patients gave their informed consent. This study was conducted according to the principles of the Declaration of Helsinki and was approved by the Ethics Committee of the Río Hortega University Hospital.
Data were collected, retrospectively reviewed, and entered manually into a predesigned database. These data included demographics, comorbidities, chronic treatments, symptoms on admission, laboratory data, need for ICU admission, and date of death. The laboratory tests whose results were used in this study were performed at the same hospital center, and the data were entered into the aforementioned hospital electronic data storage system prior to their use in this study.
The laboratory parameters considered relevant to this study were the following: leukocytes, neutrophils, lymphocytes, monocytes, eosinophils, basophils, erythrocytes, hemoglobin, hematocrit, mean corpuscular volume (MCV), platelets, D-dimer, prothrombin activity (PT), PT ratio, INR, aPTT, aPTT ratio, derived fibrinogen, sodium, potassium, chloride, glucose, urea, creatinine, estimated glomerular filtration rate (CKD-EPI 2009), alanine aminotransferase (ALT/GPT), aspartate aminotransferase (AST/GOT), gamma-glutamyl transferase (GGT), total bilirubin, alkaline phosphatase, lactate dehydrogenase (LDH), phosphate, C-reactive protein, procalcitonin, pH, FiO2, pO2/FiO2, and lactate.

2.2. Machine Learning Methods

In this study, the XGB method served as the reference approach, and it was compared with other ML systems. XGB is a versatile, efficient, and portable supervised learning algorithm. Its primary advantages include high execution speed, scalability, support for parallel computing, and a tendency to outperform other algorithms in accuracy across a wide range of data science problems [21,24,25]. For these reasons, XGB was employed in the present study to classify severe COVID-19 patients and to identify variables associated with increased mortality.
In this research, various other ML algorithms, all widely recognized in the scientific community, were implemented to assess the performance of the proposed method. The five best-performing algorithms were selected for comparison: decision tree (DT) [26], Gaussian Naive Bayes (GNB) [27], Bayesian linear discriminant analysis (BLDA) [28], k-nearest neighbors (KNN) [29], and support vector machine (SVM) [30].
A brief summary of the characteristics of the implemented machine learning methods is shown below.
DTs constitute a predictive model organized in tree structures, incorporating decision rules and outcomes. The tree comprises nodes, encompassing the root, internal nodes, and leaf nodes. The depth of the tree influences the model’s generalization, and pruning techniques are employed to avoid overfitting. The construction process involves iteratively selecting features to partition the data, with the objective of maximizing homogeneity [14,26].
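To make the split-selection step concrete, the following is a minimal sketch, assuming Gini impurity as the homogeneity criterion (the default for classification trees in MATLAB, although the text does not name the criterion used). It scans the candidate thresholds of one numeric feature and keeps the split with the lowest weighted impurity; the function names and data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means a perfectly homogeneous node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) of the best binary split x <= t on one feature."""
    n = len(values)
    best = (None, gini(labels))  # "no split": impurity of the parent node
    # The largest value is skipped because it would leave the right child empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best
```

For example, `best_threshold([0.5, 0.7, 0.9, 1.8, 2.1, 2.5], [1, 1, 1, 0, 0, 0])` returns `(0.9, 0.0)`: splitting at 0.9 separates the two hypothetical outcome groups perfectly. A full tree repeats this search over all features at every node until a stopping rule, such as a maximum number of splits, is reached.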
Gaussian Naive Bayes (GNB) is a variant of the naive Bayes classifier that assumes a Gaussian distribution for the input features. Commonly employed in classification tasks, GNB requires a training dataset with class-labeled examples. The parameters of the Gaussian distribution are computed for each class, and classification is performed using Bayes’ rule, which provides a probabilistic estimate [14,27].
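A minimal sketch of the GNB procedure described above, in plain Python: per-class priors, means, and variances are estimated from labeled rows, and Bayes’ rule turns the product of per-feature Gaussian likelihoods into class posteriors. The function names and the small variance floor are illustrative choices, not part of the study.

```python
import math
from collections import defaultdict


def fit_gnb(X, y):
    """Estimate the class prior and per-feature Gaussian mean/variance for each class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small variance floor avoids division by zero for constant features.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(y), means, variances)
    return model


def predict_gnb(model, x):
    """Return P(class | x) using Bayes' rule and the naive feature-independence assumption."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
                      for xi, m, var in zip(x, means, variances))
        log_scores[label] = math.log(prior) + log_lik
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    unnormalized = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}
```

Working in log space matters in practice: with dozens of features, the raw product of likelihoods can underflow to zero, whereas sums of log-likelihoods remain well behaved.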
BLDA expands upon linear discriminant analysis (LDA) by incorporating additional probabilistic assumptions. It presupposes a multivariate normal distribution within each class and applies Bayesian methodologies. BLDA proves particularly valuable when classes display distinct distributions or varying variances [14,28].
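BLDA builds on ordinary LDA, so the sketch below shows only that underlying, non-Bayesian step, reduced to a single feature for brevity: class means, priors, and one pooled within-class variance define a linear score per class. The Bayesian treatment of the parameters that distinguishes BLDA is deliberately omitted, and the function names and example values are hypothetical.

```python
import math


def fit_lda_1d(x, y):
    """Fit LDA on one feature: class means, class priors, and a pooled within-class variance."""
    classes = sorted(set(y))
    n = len(x)
    means, priors = {}, {}
    for c in classes:
        xs = [xi for xi, yi in zip(x, y) if yi == c]
        means[c] = sum(xs) / len(xs)
        priors[c] = len(xs) / n
    # Pooled (shared) variance across classes: the key LDA assumption.
    pooled = sum((xi - means[yi]) ** 2 for xi, yi in zip(x, y)) / (n - len(classes))
    return means, priors, pooled


def lda_predict(model, value):
    """Assign the class with the largest linear discriminant score."""
    means, priors, var = model

    def score(c):
        return value * means[c] / var - means[c] ** 2 / (2 * var) + math.log(priors[c])

    return max(means, key=score)
```

With hypothetical pH values `[7.30, 7.28, 7.32, 7.40, 7.42, 7.38]` labeled `[1, 1, 1, 0, 0, 0]`, equal priors place the decision boundary midway between the class means, so 7.31 is assigned to class 1 and 7.39 to class 0.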
KNN is a supervised learning algorithm used for classification, relying on the majority of labels from k-nearest neighbors. It depends on a labeled training dataset, employing a selected distance metric and a specified k value. The classification process entails voting among the k neighbors to determine the label for a new point [29].
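The voting rule can be sketched in a few lines, assuming Euclidean distance and k = 20 as listed in the hyperparameter settings of this section; `knn_predict` is a hypothetical helper, and ties are broken by whichever label `Counter.most_common` lists first.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=20):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because distances mix features with very different units (for example, urea and pH), features are usually standardized before KNN is applied; otherwise the variables with the largest numeric range dominate the vote.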
SVM is a supervised learning algorithm specifically crafted for classification purposes. It endeavors to find an optimal hyperplane in a higher-dimensional space, aiming to maximize the margin between different classes. SVM is adept at handling non-linear data through the use of the kernel trick, which involves transforming the data into a more manageable space [14,30].
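The sketch below shows the Gaussian (RBF) kernel and the resulting kernel-SVM decision function. It assumes the common parameterization exp(−‖x − z‖²/(2σ²)) with σ = 0.5, as listed in the hyperparameters; software packages scale this kernel differently, and the support vectors, dual coefficients, and bias would come from training, which is not shown. The function names are hypothetical.

```python
import math


def gaussian_kernel(x, z, sigma=0.5):
    """RBF similarity exp(-||x - z||^2 / (2 sigma^2)): 1 for identical points, near 0 when far apart."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def svm_decision(x, support_vectors, dual_coefs, bias, sigma=0.5):
    """Kernel SVM decision value f(x) = sum_i (alpha_i * y_i) K(s_i, x) + b; its sign gives the class."""
    return sum(c * gaussian_kernel(s, x, sigma)
               for s, c in zip(support_vectors, dual_coefs)) + bias
```

The kernel trick is visible here: the classifier only ever evaluates pairwise similarities, so the high-dimensional feature space implied by the Gaussian kernel is never constructed explicitly.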
The models were designed using the MATLAB Statistics and Machine Learning Toolbox (MATLAB R2023a; The MathWorks, Natick, MA, USA). The database was divided into two segments, 70% for training and the remaining 30% for testing, with no overlap in patient data. To validate the results and prevent overfitting, 5-fold cross-validation was performed. The phases of this study are described in Figure 1: the study subjects were first selected, the database was then built, and finally the ML methods were trained and validated.
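A minimal sketch of this partitioning scheme, assuming that the 70/30 split is made at the patient level and that the 5-fold cross-validation runs within the training portion (the text does not state this explicitly). With 291 patients and rounding, it yields 204 training and 87 test patients. The function names are hypothetical; a stratified split that preserves the proportion of deaths in each part would be a reasonable variant given the class imbalance.

```python
import random


def train_test_split(ids, test_fraction=0.3, seed=0):
    """Shuffle patient ids once and hold out a disjoint test set (no patient appears in both)."""
    rng = random.Random(seed)
    shuffled = ids[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(ids, k=5):
    """Yield (train, validation) id lists for k-fold cross-validation."""
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield train, validation
```

Each training patient is used for validation exactly once across the five folds, while the 87 held-out test patients are never seen during training or tuning.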
Machine learning techniques typically involve one or more hyperparameters that allow the algorithm to be fine-tuned during training. Different values of these hyperparameters, such as the number of splits, learners, or neighbors, the distance metric, the distance weight, the kernel, the box constraint level, and the multiclass method, yield algorithms with different predictive performance, so they must be tuned to achieve optimal results. To optimize these hyperparameters for each machine learning technique employed in this study, a Bayesian optimization approach was applied. Bayesian optimization seeks the hyperparameter configuration that maximizes the algorithm’s performance based on previous attempts, under the assumption that the hyperparameter values and the performance achieved are correlated. The area under the receiver operating characteristic curve (AUC) and balanced accuracy served as the performance metrics to be maximized. Because of the stochastic nature of initialization and training in all simulations, 100 repetitions were conducted to calculate the mean and standard deviation of the different performance metrics [14]. To mitigate the impact of data noise, ensure precise AUC calculations, and obtain statistically meaningful results, the experiments were systematically replicated using uniform random sampling.
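The repetition protocol can be summarized as a small helper that reruns a stochastic evaluation with different seeds and reports the mean and sample standard deviation. `toy_evaluate` is a hypothetical stand-in for the split, train, and score cycle; the Bayesian hyperparameter search itself is not reproduced here.

```python
import random
import statistics


def repeated_evaluation(evaluate, n_repeats=100):
    """Run a stochastic evaluation n_repeats times with distinct seeds; return (mean, sample std)."""
    scores = [evaluate(seed) for seed in range(n_repeats)]
    return statistics.mean(scores), statistics.stdev(scores)


def toy_evaluate(seed):
    """Hypothetical stand-in for 'split, train, score': a noisy AUC around 0.9."""
    rng = random.Random(seed)
    return 0.9 + rng.uniform(-0.02, 0.02)
```

Seeding each repetition explicitly makes the whole experiment reproducible: rerunning `repeated_evaluation(toy_evaluate)` returns exactly the same mean and standard deviation.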
The key hyperparameters for the implemented systems are outlined below. For the SVM method, a Gaussian kernel function is used with the following parameters: C = 1, sigma = 0.5, numerical tolerance = 0.001, and iteration limit = 100. For the DT system, the base estimator is set to tree, with maximum number of splits = 20, learning rate = 0.1, and number of learners = 40. For the GNB algorithm, the settings include usekernel = False, fL = 0, and Adjust = 0. The BLDA algorithm employs a Bayesian kernel. The KNN method uses the Euclidean distance metric and 20 neighbors. Lastly, for the XGB system, the following hyperparameters were tuned: eta = 0.2, minimum child weight = 1, gamma = 0.3, alpha = 0.5, maximum depth = 9, lambda = 0.3, column subsample ratio per tree (colsample_bytree) = 0.5, and maximum delta step = 5.
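To show what several of the XGB hyperparameters control, the sketch below implements the regularized leaf weight and split gain of the XGBoost formulation for the logistic loss, using the values listed above (lambda = 0.3, alpha = 0.5, gamma = 0.3, minimum child weight = 1, maximum delta step = 5). The function names are illustrative, and the interaction between the maximum delta step and the gain computation in the reference library is simplified: here the step only clips the leaf weight.

```python
def soft_threshold(g, alpha):
    """L1 (alpha) shrinkage applied to a gradient sum before computing weights and gains."""
    if g > alpha:
        return g - alpha
    if g < -alpha:
        return g + alpha
    return 0.0


def leaf_weight(G, H, lam=0.3, alpha=0.5, max_delta_step=5.0):
    """Optimal leaf output -T(G) / (H + lambda), clipped to +/- max_delta_step."""
    w = -soft_threshold(G, alpha) / (H + lam)
    if max_delta_step > 0:
        w = max(-max_delta_step, min(max_delta_step, w))
    return w


def split_gain(GL, HL, GR, HR, lam=0.3, alpha=0.5, gamma=0.3, min_child_weight=1.0):
    """Loss reduction of a candidate split; only splits with positive gain are kept."""
    if HL < min_child_weight or HR < min_child_weight:
        return float("-inf")  # a child would carry too little hessian mass

    def score(G, H):
        return soft_threshold(G, alpha) ** 2 / (H + lam)

    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


def logistic_grad_hess(p, y):
    """First and second derivatives of the log loss w.r.t. the raw score, at predicted probability p."""
    return p - y, p * (1 - p)
```

During boosting, each new tree’s leaf weights are multiplied by eta = 0.2 before being added to the running log-odds, which is why a small eta requires more trees. As an illustration, eight patients at an initial probability of 0.5 (gradient p − y, hessian 0.25 each), four of whom died, give G = −2, H = 1 on one side of a perfect split and G = 2, H = 1 on the other, for a gain of 2.25/1.3 − 0.3 ≈ 1.43.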
The preference for the proposed XGB over other alternative machine learning algorithms is based on its notable advantages, positioning it as a superior choice in terms of robustness, accuracy, and versatility.
Compared to SVM, XGB showcases a distinctive ability to handle intricate and high-dimensional datasets while maintaining computational efficiency. Its ensemble approach inherently introduces diversity, reducing the risk of overfitting and producing more generalized and predictive models, particularly in situations with heightened problem complexity.
In contrast to GNB, XGB effectively manages irrelevant or noisy features. Because its decision trees are built sequentially and each split is chosen for the loss reduction it provides, less informative variables tend to be ignored, which improves the model’s robustness and predictive performance.
Unlike KNN, which may be sensitive to noisy data, XGB demonstrates inherent resilience to dataset noise and variability. By constructing models based on multiple trees, the impact of outliers or errors is mitigated, ensuring greater reliability in decision making.
To sum up, the preference for XGB is substantiated by its ability to generate robust and accurate predictive models, particularly in complex environments and large datasets. Its resistance to overfitting, capability to handle irrelevant features, and versatility relative to other algorithms make it a favored choice, ensuring more dependable results and enhancing the model’s generalization capabilities.

2.3. Performance Evaluation

In this work, the different methods were compared using the following metrics: specificity, precision (also known as positive predictive value), recall (also known as sensitivity), balanced accuracy, degenerate Youden index (DYI), F1-score, Matthews correlation coefficient (MCC), Cohen’s kappa index (CKI), receiver operating characteristic (ROC) curve, and area under the curve (AUC) [14]. The F1 score is defined as follows:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
MCC was also used to test the performance of the ML methods, defined as follows:
$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
where FP represents the number of false positives, TP shows the number of true positives, TN is the number of true negatives, and FN corresponds to the number of false negatives. DYI was used to estimate the overall performance of the system [14].
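The definitions above can be checked with a short function that derives the compared metrics from a binary confusion matrix, taking death as the positive class. The function name and the counts in the usage note are hypothetical, not the study’s results, and the DYI is omitted because its formula is not given here.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Metrics used in the comparison, computed from a binary confusion matrix (positive = death)."""
    recall = tp / (tp + fn)               # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)            # positive predictive value
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Cohen's kappa: observed agreement corrected for the agreement expected by chance.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }
```

For instance, with hypothetical counts TP = 16, FP = 1, TN = 68, FN = 2, the F1 score is 32/35 ≈ 0.914 and the MCC is 1086/√(17·18·69·70) ≈ 0.893.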

3. Results

A cohort of 291 patients was studied: 156 men and 135 women, with a median age of 67 years. Sixty patients (35 men and 25 women) died during hospitalization and are counted as deceased. A more detailed description is given in Table 1.
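Because only 60 of the 291 patients (about 21%) died, the two outcome classes are imbalanced. The arithmetic below, using the cohort counts just given, illustrates why balanced accuracy is more informative than plain accuracy in this setting: a trivial model that labels every patient as a survivor would look accurate while detecting no deaths.

```python
deaths, survivors = 60, 291 - 60  # cohort counts reported above

# A trivial model that predicts "survived" for every patient:
accuracy = survivors / (deaths + survivors)      # about 0.79, looks respectable
recall = 0 / deaths                              # no death is ever detected
specificity = survivors / survivors              # every survivor is classified correctly
balanced_accuracy = (recall + specificity) / 2   # 0.5, i.e. chance level
print(f"accuracy={accuracy:.3f}, balanced accuracy={balanced_accuracy:.3f}")
# prints: accuracy=0.794, balanced accuracy=0.500
```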
At the time of data collection, the patients had the following chronic comorbidities: diabetes mellitus (56 patients, 19%); hypertension (128 patients, 44%); dyslipidemia (104 patients, 36%); asthma (26 patients, 9%); chronic lung disease (25 patients, 9%); congestive heart failure (77 patients, 26%); overweight/obesity (28 patients, 10%); active tumors (18 patients, 6%); and other diseases (8 patients, 3%). As a result of their comorbidities, 10 patients (3%) were receiving chronic immunosuppressive treatment; 1 patient (<1%) was receiving chronic biologic treatment; and 198 patients (68%) were receiving other relevant chronic treatments.
At the time of hospitalization, the following clinical symptoms were found: cough (181 patients, 62%); sputum (35 patients, 12%); dyspnea (157 patients, 54%); anosmia (14 patients, 5%); ageusia (20 patients, 7%); nausea (22 patients, 8%); vomiting (19 patients, 7%); diarrhea (51 patients, 18%); asthenia (104 patients, 36%); dizziness (9 patients, 3%); and myalgia (39 patients, 13%). During hospitalization, 65 patients (22%) required transfer to the intensive care unit (ICU).
Data on age, temperature, and laboratory parameters at the time of admission are shown in Table 1 and Table 2.
Different ML methods were used to identify mortality risk patterns in a population with confirmed COVID-19 infection, with the aim of selecting the algorithm with the best predictive capacity. Table 3 and Table 4 show the performance metrics (balanced accuracy, recall, specificity, precision, MCC, F1 score, kappa, AUC, and DYI) of the ML methods used.
As the data show, the proposed XGB method achieved the highest accuracy and recall. The XGB algorithm also performed consistently, with a positive predictive value greater than 95%. In addition, ROC curves were computed by plotting sensitivity against 1 − specificity for each threshold value in order to compare the classification capacity of the different ML algorithms. Figure 2 shows the results. Again, the proposed XGB-based system obtains a larger area under the curve, indicating better discriminative ability.
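As a sketch of how ROC curves and their AUC are obtained, the functions below sweep the threshold over every predicted score, record the (1 − specificity, sensitivity) pair at each one, and integrate with the trapezoidal rule. The function names, scores, and labels are hypothetical, and tied scores are handled by treating all cases at a threshold together.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping the decision threshold over every observed score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))  # x = 1 - specificity, y = sensitivity
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

For hypothetical scores `[0.9, 0.8, 0.7, 0.6, 0.4, 0.3]` with labels `[1, 1, 0, 1, 0, 0]`, the area is 8/9 ≈ 0.889, which equals the fraction of death/survivor pairs in which the patient who died received the higher score.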
In the present study, the training subsets show high scores for all metrics and, generally, slightly lower scores for the test subset. This small gap indicates that the algorithm reached an adequate level of training without overfitting or underfitting. As shown in the radar plots in Figure 3, the XGB model covers a larger area than the other methods tested and is an example of a well-balanced model with high generalization capability, meaning that the algorithm produces correct outputs for new inputs.
As can be seen in Figure 4, according to the proposed XGB model, the most clinically relevant parameters contributing to the mortality of hospitalized COVID-19 patients, listed in descending order of relevance, are lymphocytes, urea, FiO2, potassium, serum pH, basophils, active tumors, total bilirubin, temperature, estimated glomerular filtration rate (CKD-EPI 2009), alanine aminotransferase (ALT/GPT), dyspnea, and age. All of these variables can be obtained at hospital admission from a routine blood test or the initial clinical assessment.
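The text does not state which importance measure underlies Figure 4. Assuming a gain-based ranking, one of several importance types that XGBoost implementations commonly expose, the ordering can be sketched by summing the gain of every split made on each feature across all trees. The helper name and the split records in the test are hypothetical.

```python
from collections import defaultdict


def rank_features_by_gain(splits):
    """Sum the loss reduction (gain) of every split per feature and sort features, highest first."""
    totals = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

A feature used in many splits with small gains can rank differently from one used once with a large gain, so rankings based on split counts and on gain need not agree.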

4. Discussion

Although the state of health emergency caused by the COVID-19 disease has been declared to be over [31], the virus has not disappeared, and preventive measures, both social and sanitary, are still essential to keep it at bay [32]. This possibility and the idea that another infection in the near future could reproduce a new pandemic scenario have led us to look for tools that allow the implementation of protocols as soon as possible [33] in a specific health institution. In this sense, and considering the high number of deaths caused by the COVID-19 pandemic, this study proposed a new ML model to identify clinically significant risk factors that could be measured on the first day of admission in patients hospitalized for COVID-19 during the first months of the pandemic with the aim of associating these early variables with a severe course of the disease.
The ML model based on extreme gradient boosting (XGB) was selected in our study because of its generalizability, low risk of overfitting, high interpretability [25], and high scalability [34]. XGB has been confirmed to be a reliable method for recognizing patterns in other diseases such as lupus erythematosus [16], traumatic brain injury-induced coagulopathy [35], epilepsy [36], diabetes [37], Alzheimer’s disease [38,39], HIV [40,41], or different types of cancer [42,43,44,45,46]. We, therefore, used the aforementioned ML technique to determine which factors were most predictive of disease severity in a closed group of patients hospitalized for COVID-19 during the first two months of the pandemic, a time when the population did not yet have herd immunity and had not yet been vaccinated.
The XGB model identified the patterns with the greatest weight for mortality risk in this population on the first day of hospitalization. Some of them represented comorbidities, such as active tumors; others, clinical manifestations, such as temperature and dyspnea; and the rest, analytical parameters. Among them, the first five analytical parameters were the most powerful. Lymphocyte count and serum urea level were the strongest predictors of mortality in hospitalized patients with COVID-19 in our study population. Consistent with our findings, a meta-analysis by Tian et al. [47] found that, in addition to urea and lymphocyte count, total bilirubin and alanine aminotransferase levels were closely related to patient mortality, and suggested that they may indicate abnormal kidney and liver function in patients who died. In our study, these data are reinforced by the fact that the estimated glomerular filtration rate is also among the most relevant parameters, as other studies have shown it to be a good predictor of mortality in patients admitted with COVID-19. In addition, our study showed a high relevance of FiO2, which is closely related to acute hypoxic respiratory failure, and of potassium and pH as predictors of a poor outcome and mortality, which is compatible with the results obtained by Satici et al. [48], Noori et al. [49], and Liu et al. [50]. Interestingly, hypokalemia has also been shown to be an indicator of a poorer prognosis and a longer time to negative nucleic acid conversion. This may be because the virus interferes with ACE2, a key factor in the enzymatic cascades that maintain adequate potassium levels in cells, causing the dysregulation of cellular physiological activity [51]. This disruption of cellular physiological systems also appears to cause the dysregulation of electrolyte homeostasis and pH imbalance [52].
Basophil count was another variable with significant weight in predicting mortality in our study. Basophils have been associated with chronic inflammatory diseases through their involvement in Th17 and Th17/Th1 responses. In the study by Murdaca et al., basophil counts decreased significantly during the first three days of hospitalization and returned to normal levels soon after [53].
On the other hand, in our population, active tumors were a relevant factor in relation to patient mortality, which several studies support. Although it has been suggested that this mortality risk could be due to a suboptimal state of the immune system related to anticancer treatments, which would weaken the body’s response to SARS-CoV-2 [54], Lee et al. [55] and Desai et al. [56] agree that this tendency is not due to cancer-related treatments but to age, sex, and associated comorbidities. Among other clinical parameters, dyspnea was a relevant predictor in our study, consistent with some studies but in contrast to others in which it did not appear closely related to mortality [57]. Finally, we found that age was a clear predictor of mortality risk, whereas other authors consider the role of age as a predictor of mortality unclear owing to the variability in the characteristics of each study [58,59].
All of the relevant parameters mentioned above are based on data that can be easily collected at the time of admission by a routine examination. Therefore, a model based on these indicators would allow for both more efficient triage and more personalized care for those patients who exhibit these risk predictors, thus improving both the care processes and the prognosis of patients infected with COVID-19.

5. Conclusions

The COVID-19 pandemic caused a global collapse of healthcare systems, prompting the need for new methods to identify the most at-risk patients in a timely manner, and machine learning models are in the spotlight in this area. Specifically, this study used XGB-based modeling to identify predictors of high mortality risk in a group of patients hospitalized during the first months of the pandemic, when herd immunity had not been established and vaccination had not yet begun. This analysis allowed us to define relevant parameters that are highly useful as predictors of mortality risk at the time of hospital admission, which can help improve patient care and treatment as well as the allocation of resources and the efficiency of health services. The proposed XGB method achieved high accuracy and efficiency, supporting its use as a reliable prognostic tool. This method could be implemented locally, even in individual hospitals, to provide patients with care appropriate to the demographic and environmental characteristics of each area.

Author Contributions

Conceptualization, M.Q. and J.B.; Methodology, M.Q., J.B., A.M.T. and J.M.; Software, A.M.T. and J.M.; Validation, M.Q., J.B. and J.M.; Formal analysis, M.Q., J.B., A.M.T. and J.M.; Investigation, M.Q., J.B., A.M.T. and J.M.; Writing—original draft, M.Q., J.B., A.M.T. and J.M.; Writing—review & editing, J.B., A.M.T. and J.M.; Visualization, M.Q., A.M.T. and J.M.; Supervision, J.M.; Project administration, J.M. All authors have read and agreed to the published version of the manuscript.

Funding

Research funded by the Institute of Technology (University of Castilla-La Mancha, Spain), the Río Hortega University Hospital (Valladolid, Spain), and the University of Valladolid [060/195041] (Spain).

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Río Hortega University Hospital.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets employed and analyzed in the current study are accessible upon reasonable request from the corresponding author. We do not have the patients’ permission to publish the data collected in this study in open access.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Situation by Region, Country, Territory & Area. WHO Coronavirus COVID-19 Dashboard. Available online: https://covid19.who.int/table (accessed on 3 October 2023).
  2. Cheong, K.H.; Jones, M.C. Introducing the 21st Century’s New Four Horsemen of the Coronapocalypse. BioEssays 2020, 42, 2000063.
  3. Both, L.M.; Zoratto, G.; Calegaro, V.C.; Ramos-Lima, L.F.; Negretto, B.L.; Hauck, S.; Freitas, L.H.M. COVID-19 Pandemic and Social Distancing: Economic, Psychological, Family, and Technological Effects. Trends Psychiatry Psychotherapy 2021. Available online: https://www.scielo.br/j/trends/a/cZNsN9kYFmd5ZNsgtk4dnYm/?lang=en (accessed on 3 October 2023).
  4. Nicola, M.; Alsafi, Z.; Sohrabi, C.; Kerwan, A.; Al-Jabir, A.; Iosifidis, C.; Agha, M.; Agha, R. The socio-economic implications of the coronavirus pandemic (COVID-19): A review. Int. J. Surg. 2020, 78, 185–193.
  5. Legido-Quigley, H.; Mateos-García, J.T.; Campos, V.R.; Gea-Sánchez, M.; Muntaner, C.; McKee, M. The resilience of the Spanish health system against the COVID-19 pandemic. Lancet Public Health 2020, 5, e251–e252.
  6. Ochoa Sangrador, C.; Garmendia Leiza, J.R.; Pérez Boillos, M.J.; Pastrana Ara, F.; Lorenzo Lobato, M.D.P.; Andrés de Llano, J.M. Impacto de la COVID-19 en la mortalidad de la comunidad autónoma de Castilla y León. Gac. Sanit. 2021, 35, 459–464.
  7. Asha, K.S.; Singh, V.; Singi, Y.; Ranjan, R. The Association of Hematological and Biochemical Parameters with Mortality among COVID-19 Patients: A Retrospective Study from North India. Cureus 2022. Available online: https://www.cureus.com/articles/98268-the-association-of-hematological-and-biochemical-parameters-with-mortality-among-covid-19-patients-a-retrospective-study-from-north-india (accessed on 13 December 2023).
  8. Bukreieva, T.; Svitina, H.; Nikulina, V.; Vega, A.; Chybisov, O.; Shablii, I.; Ustymenko, A.; Nemtinov, P.; Lobyntseva, G.; Skrypkina, I.; et al. Treatment of Acute Respiratory Distress Syndrome Caused by COVID-19 with Human Umbilical Cord Mesenchymal Stem Cells. Int. J. Mol. Sci. 2023, 24, 4435.
  9. Pius-Sadowska, E.; Kulig, P.; Niedźwiedź, A.; Baumert, B.; Łuczkowska, K.; Rogińska, D.; Sobuś, A.; Ulańczyk, Z.; Kawa, M.; Paczkowska, E.; et al. VEGFR and DPP-IV as Markers of Severe COVID-19 and Predictors of ICU Admission. Int. J. Mol. Sci. 2023, 24, 17003.
  10. Lee, J.H.; Kanwar, B.; Khattak, A.; Balentine, J.; Nguyen, N.H.; Kast, R.E.; Lee, C.J.; Bourbeau, J.; Altschuler, E.L.; Sergi, C.M.; et al. COVID-19 Molecular Pathophysiology: Acetylation of Repurposing Drugs. Int. J. Mol. Sci. 2022, 23, 13260.
  11. Martins, T.G.D.S.; Schor, P. Machine learning in image analysis in ophthalmology. Einstein São Paulo 2021, 19, eED6860.
  12. Dos Santos, W.P.; Conti, V.; Gambino, O.; Naik, G.R. Editorial: Machine learning and applied neuroscience. Front. Neurorobot. 2023, 17, 1191045.
  13. Sakamoto, T.; Goto, T.; Fujiogi, M.; Lefor, A.K. Machine learning in gastrointestinal surgery. Surg. Today 2022, 52, 995–1007.
  14. Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques, 3rd ed.; Elsevier: Burlington, MA, USA, 2012.
  15. Suárez, M.; Martínez, R.; Torres, A.M.; Ramón, A.; Blasco, P.; Mateo, J. Personalized Risk Assessment of Hepatic Fibrosis after Cholecystectomy in Metabolic-Associated Steatotic Liver Disease: A Machine Learning Approach. J. Clin. Med. 2023, 12, 6489.
  16. Usategui, I.; Barbado, J.; Torres, A.M.; Cascón, J.; Mateo, J. Machine learning, a new tool for the detection of immunodeficiency patterns in systemic lupus erythematosus. J. Investig. Med. 2023, 71, 742–752.
  17. Yu, Y.; Liu, T.; Shao, L.; Li, X.; He, C.K.; Jamal, M.; Luo, Y.; Wang, Y.; Liu, Y.; Shang, Y.; et al. Novel biomarkers for the prediction of COVID-19 progression a retrospective, multi-center cohort study. Virulence 2020, 11, 1569–1581.
  18. Rajula, H.S.R.; Verlato, G.; Manchia, M.; Antonucci, N.; Fanos, V. Comparison of Conventional Statistical Methods with Machine Learning in Medicine: Diagnosis, Drug Development, and Treatment. Medicina 2020, 56, 455.
  19. Kumaran, M.; Pham, T.-M.; Wang, K.; Usman, H.; Norris, C.M.; MacDonald, J.; Oudit, G.Y.; Saini, V.; Sikdar, K.C. Predicting the Risk Factors Associated with Severe Outcomes among COVID-19 Patients–Decision Tree Modeling Approach. Front. Public Health 2022, 10, 838514.
  20. Nieto-Codesido, I.; Calvo-Alvarez, U.; Diego, C.; Hammouri, Z.; Mallah, N.; Ginzo-Villamayor, M.J.; Salgado, F.J.; Carreira, J.M.; Rábade, C.; Barbeito, G.; et al. Risk Factors of Mortality in Hospitalized Patients with COVID-19 Applying a Machine Learning Algorithm. Open Respir. Arch. 2022, 4, 100162.
  21. Styrzynski, F.; Zhakparov, D.; Schmid, M.; Roqueiro, D.; Lukasik, Z.; Solek, J.; Nowicki, J.; Dobrogowski, M.; Makowska, J.; Sokolowska, M.; et al. Machine Learning Successfully Detects Patients with COVID-19 Prior to PCR Results and Predicts Their Survival Based on Standard Laboratory Parameters in an Observational Study. Infect. Dis. Ther. 2023, 12, 111–129.
  22. Banoei, M.M.; Rafiepoor, H.; Zendehdel, K.; Seyyedsalehi, M.S.; Nahvijou, A.; Allameh, F.; Amanpour, S. Unraveling complex relationships between COVID-19 risk factors using machine learning based models for predicting mortality of hospitalized patients and identification of high-risk group: A large retrospective study. Front. Med. 2023, 10, 1170331.
  23. Izquierdo, J.L.; Ancochea, J.; Savana COVID-19 Research Group; Soriano, J.B. Clinical Characteristics and Prognostic Factors for Intensive Care Unit Admission of Patients with COVID-19: Retrospective Study Using Machine Learning and Natural Language Processing. J. Med. Internet Res. 2020, 22, e21801.
  24. Pal, M.; Parija, S.; Mohapatra, R.K.; Mishra, S.; Rabaan, A.A.; Al Mutair, A.; Alhumaid, S.; Al-Tawfiq, J.A.; Dhama, K. Symptom-Based COVID-19 Prognosis through AI-Based IoT: A Bioinformatics Approach. BioMed Res. Int. 2022, 2022, 3113119.
  25. Montomoli, J.; Romeo, L.; Moccia, S.; Bernardini, M.; Migliorelli, L.; Berardini, D.; Donati, A.; Carsetti, A.; Bocci, M.G.; Garcia, P.D.W.; et al. Machine learning using the extreme gradient boosting (XGBoost) algorithm predicts 5-day delta of SOFA score at ICU admission in COVID-19 patients. J. Intensive Med. 2021, 1, 110–116.
  26. Charbuty, B.; Abdulazeez, A. Classification Based on Decision Tree Algorithm for Machine Learning. J. Appl. Sci. Technol. Trends 2021, 2, 20–28.
  27. Ontivero-Ortega, M.; Lage-Castellanos, A.; Valente, G.; Goebel, R.; Valdes-Sosa, M. Fast Gaussian Naïve Bayes for searchlight classification analysis. NeuroImage 2017, 163, 471–479.
  28. Mu, J.; Dai, L.; Liu, J.-X.; Shang, J.; Xu, F.; Liu, X.; Yuan, S. Automatic detection for epileptic seizure using graph-regularized nonnegative matrix factorization and Bayesian linear discriminate analysis. Biocybern. Biomed. Eng. 2021, 41, 1258–1271.
  29. Uddin, S.; Haque, I.; Lu, H.; Moni, M.A.; Gide, E. Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction. Sci. Rep. 2022, 12, 6256.
  30. Cervantes, J.; Garcia-Lamont, F.; Rodríguez-Mazahua, L.; Lopez, A. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing 2020, 408, 189–215.
  31. Torner, N. The end of COVID-19 public health emergency of international concern (PHEIC): And now what? Vacunas Engl. Ed. 2023, 24, 164–165.
  32. Ali, I.; Alharbi, O.M.L. COVID-19: Disease, management, treatment, and social impact. Sci. Total Environ. 2020, 728, 138861.
  33. Lai, J.W.; Cheong, K.H. Superposition of COVID-19 waves, anticipating a sustained wave, and lessons for the future. BioEssays 2020, 42, 2000178.
  34. Torlay, L.; Perrone-Bertolotti, M.; Thomas, E.; Baciu, M. Machine learning–XGBoost analysis of language networks to classify patients with epilepsy. Brain Inform. 2017, 4, 159–169.
  35. Yang, F.; Peng, C.; Peng, L.; Wang, J.; Li, Y.; Li, W. A Machine Learning Approach for the Prediction of Traumatic Brain Injury Induced Coagulopathy. Front. Med. 2021, 8, 792689.
  36. Escobar-Ipuz, F.A.; Torres, A.M.; García-Jiménez, M.A.; Basar, C.; Cascón, J.; Mateo, J. Prediction of patients with idiopathic generalized epilepsy from healthy controls using machine learning from scalp EEG recordings. Brain Res. 2023, 1798, 148131.
  37. Kushwaha, S.; Srivastava, R.; Jain, R.; Sagar, V.; Aggarwal, A.K.; Bhadada, S.K.; Khanna, P. Harnessing machine learning models for non-invasive pre-diabetes screening in children and adolescents. Comput. Methods Programs Biomed. 2022, 226, 107180.
  38. Wei, H.; Wu, C.; Yuan, Y.; Lai, L. Uncovering the Achilles heel of genetic heterogeneity: Machine learning-based classification and immunological properties of necroptosis clusters in Alzheimer’s disease. Front. Aging Neurosci. 2023, 15, 1249682.
  39. Khan, Y.F.; Kaushik, B.; Chowdhary, C.L.; Srivastava, G. Ensemble Model for Diagnostic Classification of Alzheimer’s Disease Based on Brain Anatomical Magnetic Resonance Imaging. Diagnostics 2022, 12, 3193.
  40. Cai, Q.; Yuan, R.; He, J.; Li, M.; Guo, Y. Predicting HIV drug resistance using weighted machine learning method at target protein sequence-level. Mol. Divers. 2021, 25, 1541–1551.
  41. Ridgway, J.P.; Ajith, A.; Friedman, E.E.; Mugavero, M.J.; Kitahata, M.M.; Crane, H.M.; Moore, R.D.; Webel, A.; Cachay, E.R.; Christopoulos, K.A.; et al. Multicenter Development and Validation of a Model for Predicting Retention in Care among People with HIV. AIDS Behav. 2022, 26, 3279–3288.
  42. González-Castro, L.; Chávez, M.; Duflot, P.; Bleret, V.; Martin, A.G.; Zobel, M.; Nateqi, J.; Lin, S.; Pazos-Arias, J.J.; Del Fiol, G.; et al. Machine Learning Algorithms to Predict Breast Cancer Recurrence Using Structured and Unstructured Sources from Electronic Health Records. Cancers 2023, 15, 2741.
  43. Rafid, A.K.M.R.H.; Azam, S.; Montaha, S.; Karim, A.; Fahim, K.U.; Hasan, Z. An Effective Ensemble Machine Learning Approach to Classify Breast Cancer Based on Feature Selection and Lesion Segmentation Using Preprocessed Mammograms. Biology 2022, 11, 1654.
  44. Chen, C.; Chen, K.; Huang, Z.; Huang, X.; Wang, Z.; He, F.; Qin, M.; Long, C.; Tang, B.; Mo, X.; et al. Identification of intestinal microbiome associated with lymph-vascular invasion in colorectal cancer patients and predictive label construction. Front. Cell Infect. Microbiol. 2023, 13, 1098310.
  45. Sun, J.; Wu, S.; Mou, Z.; Wen, J.Y.; Wei, H.; Zou, J.; Li, Q.; Liu, Z.; Xu, S.; Kang, M.; et al. Prediction model of ocular metastasis from primary liver cancer: Machine learning-based development and interpretation study. Cancer Med. 2023, 12, cam4.6540.
  46. Zheng, Q.; Jiang, Z.; Ni, X.; Yang, S.; Jiao, P.; Wu, J.; Xiong, L.; Yuan, J.; Wang, J.; Jian, J.; et al. Machine Learning Quantified Tumor-Stroma Ratio Is an Independent Prognosticator in Muscle-Invasive Bladder Cancer. Int. J. Mol. Sci. 2023, 24, 2746.
  47. Tian, W.; Jiang, W.; Yao, J.; Nicholson, C.J.; Li, R.H.; Sigurslid, H.H.; Wooster, L.; Rotter, J.I.; Guo, X.; Malhotra, R. Predictors of mortality in hospitalized COVID-19 patients: A systematic review and meta-analysis. J. Med. Virol. 2020, 92, 1875–1883.
  48. Satici, M.O.; Islam, M.M.; Satici, C.; Uygun, C.N.; Ademoglu, E.; Altunok, I.; Aksel, G.; Eroglu, S.E. The role of a noninvasive index ‘Spo2/Fio2’ in predicting mortality among patients with COVID-19 pneumonia. Am. J. Emerg. Med. 2022, 57, 54–59.
  49. Noori, M.; Nejadghaderi, S.A.; Sullman, M.J.M.; Carson-Chahhoud, K.; Kolahi, A.A.; Safiri, S. Epidemiology, prognosis and management of potassium disorders in COVID-19. Rev. Med. Virol. 2022, 32, e2262.
  50. Liu, Q.; Ruan, H.; Sheng, Z.; Sun, X.; Li, S.; Cui, W.; Li, C. Nanoantidote for repression of acidosis pH promoting COVID-19 infection. View 2022, 3, 20220004.
  51. Yin, J.; Yuan, N.; Huang, Z.; Hu, Z.; Bao, Q.; Shao, Z.; Mei, Q.; Xu, Y.; Wang, W.; Liu, D.; et al. Assessment of hypokalemia and clinical prognosis in Patients with COVID-19 in Yangzhou, China. PLoS ONE 2022, 17, e0271132.
  52. Nahkuri, S.; Becker, T.; Schueller, V.; Massberg, S.; Bauer-Mehren, A. Prior fluid and electrolyte imbalance is associated with COVID-19 mortality. Commun. Med. 2021, 1, 51.
  53. Murdaca, G.; Di Gioacchino, M.; Greco, M.; Borro, M.; Paladin, F.; Petrarca, C.; Gangemi, S. Basophils and Mast Cells in COVID-19 Pathogenesis. Cells 2021, 10, 2754.
  54. Wolff, D.; Nee, S.; Hickey, N.S.; Marschollek, M. Risk factors for COVID-19 severity and fatality: A structured literature review. Infection 2021, 49, 15–28.
  55. Lee, L.Y.; Cazier, J.-B.; Angelis, V.; Arnold, R.; Bisht, V.; Campton, N.A.; Chackathayil, J.; Cheng, V.W.T.; Curley, H.M.; Fittall, M.W.T.; et al. COVID-19 mortality in patients with cancer on chemotherapy or other anticancer treatments: A prospective cohort study. Lancet 2020, 395, 1919–1926.
  56. Desai, A.; Gupta, R.; Advani, S.; Ouellette, L.; Kuderer, N.M.; Lyman, G.H.; Li, A. Mortality in hospitalized patients with cancer and coronavirus disease 2019: A systematic review and meta-analysis of cohort studies. Cancer 2021, 127, 1459–1468.
  57. Kaliszewski, K.; Diakowska, D.; Nowak, Ł.; Tokarczyk, U.; Sroczyński, M.; Sępek, M.; Dudek, A.; Sutkowska-Stępień, K.; Kilis-Pstrusinska, K.; Matera-Witkiewicz, A.; et al. Assessment of Gastrointestinal Symptoms and Dyspnea in Patients Hospitalized due to COVID-19: Contribution to Clinical Course and Mortality. J. Clin. Med. 2022, 11, 1821.
  58. Caruso, C.; Marcon, G.; Accardi, G.; Aiello, A.; Calabrò, A.; Ligotti, M.E.; Tettamanti, M.; Franceschi, C.; Candore, G. Role of Sex and Age in Fatal Outcomes of COVID-19: Women and Older Centenarians Are More Resilient. Int. J. Mol. Sci. 2023, 24, 2638.
  59. Ho, F.K.; Petermann-Rocha, F.; Gray, S.R.; Jani, B.D.; Katikireddi, S.V.; Niedzwiedz, C.L.; Foster, H.; Hastie, C.E.; Mackay, D.F.; Gill, J.M.R.; et al. Is older age associated with COVID-19 mortality in the absence of other risk factors? General population cohort study of 470,034 participants. PLoS ONE 2020, 15, e0241824.
Figure 1. Workflow followed to develop the machine learning method used in this study.
Figure 2. ROC curves for the six assessed machine learning predictors.
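Figure 2 and the AUC column of Table 3 rest on the same construction. As a minimal, library-free sketch (not the authors' implementation, which this section does not describe), a ROC curve can be traced by sweeping a decision threshold over a model's scores, and the area under it equals the probability that a randomly chosen deceased patient receives a higher score than a randomly chosen survivor. The encoding 1 = deceased, 0 = alive and the toy scores are assumptions for illustration only.

```python
from itertools import product


def roc_points(y_true, scores):
    """ROC curve as (FPR, TPR) pairs, sweeping the threshold down over every distinct score.

    y_true uses 1 for the positive class (assumed here: deceased) and 0 otherwise;
    a case is predicted positive when its score is >= the threshold.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) pairs sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical outcomes and scores, not study data).
y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(roc_points(y, s))  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(roc_auc(y, s), trapezoid_area(roc_points(y, s)))  # 0.75 0.75
```

Both routes give the same value, which is why the AUC is often read as a ranking probability. Table 3 reports AUC on a 0 to 100 scale, so, for example, 96.03 corresponds to 0.9603.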
Figure 3. Radar plots of the variables analyzed: the upper plot corresponds to the training phase and the lower plot to the test phase.
Figure 4. Histogram of the 10 parameters most relevant for predicting mortality in hospitalized COVID-19 patients. Units for these parameters are given in Table 1 and Table 2.
Table 1. Categorical baseline clinical characteristics of patients.
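| Characteristic | Global (291 patients), n | % Total | Alive (231 patients), n | % Total | Deceased (60 patients), n | % Total |
|---|---|---|---|---|---|---|
| Sex: men | 156 | 54 | 121 | 42 | 35 | 11 |
| Sex: women | 135 | 46 | 110 | 38 | 25 | 9 |
| Smoking | 8 | 3 | 7 | 2 | 1 | 0 |
| Drinking | 2 | 1 | 1 | 0 | 1 | 0 |
| Diabetes mellitus | 56 | 19 | 45 | 15 | 11 | 4 |
| Hypertension | 128 | 44 | 97 | 33 | 31 | 11 |
| Dyslipidemia | 104 | 36 | 88 | 30 | 16 | 5 |
| Asthma | 26 | 9 | 23 | 8 | 3 | 1 |
| Other chronic lung diseases | 25 | 9 | 18 | 6 | 7 | 2 |
| Congestive heart failure | 77 | 26 | 53 | 18 | 24 | 8 |
| Overweight/obesity | 28 | 10 | 24 | 8 | 4 | 1 |
| Active tumors | 18 | 6 | 13 | 4 | 5 | 2 |
| Other relevant pathologies | 8 | 3 | 8 | 3 | 0 | 0 |
| Immunosuppressive chronic treatment | 10 | 3 | 8 | 3 | 2 | 1 |
| Biological chronic treatment | 1 | 0 | 1 | 0 | 0 | 0 |
| Other relevant chronic treatments | 198 | 68 | 156 | 54 | 42 | 14 |
| Need for ICU admission | 65 | 22 | 59 | 20 | 6 | 2 |
| Cough | 181 | 62 | 141 | 48 | 40 | 14 |
| Sputum | 35 | 12 | 27 | 9 | 8 | 3 |
| Dyspnea | 157 | 54 | 129 | 44 | 28 | 10 |
| Loss of smell (anosmia) | 14 | 5 | 11 | 4 | 3 | 1 |
| Loss of taste (ageusia) | 20 | 7 | 17 | 6 | 3 | 1 |
| Nausea | 22 | 8 | 17 | 6 | 5 | 2 |
| Vomiting | 19 | 7 | 15 | 5 | 4 | 1 |
| Diarrhea | 51 | 18 | 41 | 14 | 10 | 3 |
| Asthenia | 104 | 36 | 87 | 30 | 17 | 6 |
| Dizziness | 9 | 3 | 7 | 2 | 2 | 1 |
| Myalgia | 39 | 13 | 29 | 10 | 10 | 3 |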
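In Table 1, the "% Total" columns appear to be computed against the whole cohort of 291 patients rather than against each group's own size: for example, the 25 deceased women are 9% of 291 patients but would be about 42% of the 60 deceased patients. A short sketch reproducing the Women row under that reading; the helper names are illustrative, not taken from the study.

```python
COHORT_SIZE = 291  # all patients in Table 1 (231 alive + 60 deceased)


def pct_of_total(n, total=COHORT_SIZE):
    """Share of the whole cohort, rounded to a whole percentage as in Table 1."""
    return round(100 * n / total)


def pct_within_group(n, group_size):
    """Share within a single group, shown only for contrast."""
    return 100 * n / group_size


# Women row of Table 1: counts per column.
women = {"Global": 135, "Alive": 110, "Deceased": 25}
print({group: pct_of_total(n) for group, n in women.items()})
# {'Global': 46, 'Alive': 38, 'Deceased': 9}
print(round(pct_within_group(25, 60), 1))  # 41.7 (% of the 60 deceased patients)
```

Reading the table this way, the Alive and Deceased percentages in a row add up (to within rounding) to the Global percentage, which is a quick consistency check.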
Table 2. Numerical baseline clinical characteristics of patients.
| Parameter (mean ± SEM) | Global (291 patients) | Alive (231 patients) | Deceased (60 patients) |
|---|---|---|---|
| Age | 67.1 ± 1 | 64.5 ± 1.1 | 77.13 ± 1.5 |
| Temperature (°C) | 36.9 ± 0.1 | 36.9 ± 0.1 | 36.18 ± 0.7 |
| Leucocytes (×10³/µL) | 6.2 ± 0.1 | 6.7 ± 0.2 | 9.5 ± 2 |
| Neutrophils (×10³/µL) | 4.6 ± 0.1 | 5 ± 0.2 | 6.5 ± 0.6 |
| Lymphocytes (×10³/µL) | 1.02 ± 0.09 | 1.10 ± 0.11 | 0.73 ± 0.05 |
| Monocytes (×10³/µL) | 0.51 ± 0.02 | 0.53 ± 0.02 | 0.45 ± 0.04 |
| Eosinophils (×10³/µL) | 0.009 ± 0.002 | 0.009 ± 0.002 | 0.007 ± 0.004 |
| Basophils (×10³/µL) | 0.014 ± 0.002 | 0.013 ± 0.002 | 0.016 ± 0.005 |
| Erythrocytes (×10⁶/µL) | 4.7 ± 0 | 5 ± 0.2 | 4.6 ± 0.1 |
| Hemoglobin (g/dL) | 13.8 ± 0.1 | 13.8 ± 0.1 | 13.3 ± 0.2 |
| Hematocrit (%) | 41.5 ± 0.3 | 41.8 ± 0.3 | 40.5 ± 0.7 |
| Mean corpuscular volume (MCV) (fL) | 88.3 ± 0.3 | 88.1 ± 0.3 | 88.8 ± 0.9 |
| Platelets (×10³/µL) | 183 ± 4.1 | 191.8 ± 5.5 | 181.9 ± 10 |
| D-dimer (ng/mL) | 459 ± 16.9 | 647.4 ± 69.1 | 1403 ± 610.6 |
| Prothrombin activity (PT) (%) | 84.7 ± 0.9 | 83 ± 1.2 | 79.3 ± 3.4 |
| PT ratio | 1.1 ± 0 | 1.2 ± 0 | 1.6 ± 0.2 |
| INR | 1.1 ± 0 | 1.2 ± 0 | 1.5 ± 0.2 |
| Activated partial thromboplastin time (aPTT), patient (s) | 31.7 ± 0.2 | 32.8 ± 0.5 | 33.9 ± 1.2 |
| aPTT ratio | 1.05 ± 0.01 | 1.04 ± 0.01 | 1.09 ± 0.01 |
| Fibrinogen, derived (mg/dL) | 699 ± 10.2 | 698.1 ± 11.5 | 703.7 ± 22.1 |
| Sodium (mmol/L) | 135 ± 0.2 | 134.6 ± 0.7 | 134.5 ± 0.6 |
| Potassium (mmol/L) | 3.9 ± 0 | 5.7 ± 1.8 | 4 ± 0.1 |
| Chloride (mmol/L) | 99.9 ± 0.3 | 108.5 ± 6.5 | 100.4 ± 1.1 |
| Glucose (mg/dL) | 117 ± 1.4 | 126.8 ± 3.6 | 144.9 ± 6.5 |
| Urea (mg/dL) | 40.5 ± 1.1 | 44.3 ± 2.4 | 57.9 ± 3.9 |
| Creatinine (mg/dL) | 0.9 ± 0 | 1.7 ± 0.5 | 1.2 ± 0.1 |
| Estimated glomerular filtration rate (CKD-EPI 2009) (mL/min/1.73 m²) | 66.6 ± 1.3 | 69.4 ± 1.5 | 57.7 ± 2.8 |
| Alanine aminotransferase (ALT/GPT) (U/L) | 29.7 ± 1 | 39.9 ± 2.2 | 34.9 ± 3.8 |
| Aspartate aminotransferase (AST/GOT) (U/L) | 40.3 ± 1.1 | 48.4 ± 2.1 | 60.3 ± 7.2 |
| Gamma-glutamyl transferase (GGT) (U/L) | 101 ± 43.5 | 107.5 ± 66.3 | 88 ± 42 |
| Total bilirubin (mg/dL) | 0.6 ± 0 | 0.9 ± 0.3 | 0.7 ± 0 |
| Alkaline phosphatase (U/L) | 70.6 ± 15.1 | 66.8 ± 18.8 | 86 ± 0 |
| Lactate dehydrogenase (LDH) (U/L) | 348 ± 7.7 | 347 ± 9.4 | 427.4 ± 29.1 |
| Phosphate (mg/dL) | 3.2 ± 0.2 | 3.1 ± 0.3 | 3.4 ± 0.4 |
| C-reactive protein (mg/dL) | 93 ± 4.5 | 95.9 ± 5.7 | 136.7 ± 14.5 |
| Procalcitonin (ng/mL) | 0.1 ± 0 | 0.4 ± 0.1 | 15.3 ± 13.1 |
| pH | 7.442 ± 0.003 | 7.446 ± 0.004 | 7.426 ± 0.008 |
| FiO2 (%) | 21 ± 0 | 24 ± 1 | 33.8 ± 4.1 |
| pO2/FiO2 | 262 ± 7.5 | 270.8 ± 7.9 | 227.4 ± 19.1 |
| Lactate (mmol/L) | 1.5 ± 0 | 1.4 ± 0 | 2 ± 0.1 |
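Table 2 reports mean ± SEM. Assuming the usual definition, SEM = SD/√n with the sample standard deviation (this section does not state which SD estimator was used), a minimal sketch of the computation and of the reverse conversion follows. Because SEM shrinks as the group grows, the deceased group (n = 60) tends to show wider ± values than the alive group (n = 231) for a similar spread; multiplying a reported SEM by √n recovers an approximate SD.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of measurements.

    SEM = sample standard deviation / sqrt(n), using the n - 1 denominator.
    """
    n = len(values)
    if n < 2:
        raise ValueError("SEM needs at least two observations")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return mean, sd / math.sqrt(n)


def sd_from_sem(sem, n):
    """Recover the standard deviation from a reported SEM and group size."""
    return sem * math.sqrt(n)


# Toy measurements (hypothetical, not study data).
m, sem = mean_sem([1, 2, 3, 4, 5])
print(m, round(sem, 4))  # 3.0 0.7071

# Age in the deceased group is reported as 77.13 ± 1.5 with n = 60.
print(round(sd_from_sem(1.5, 60), 1))  # 11.6
```

The back-converted SD is only as precise as the rounded SEM in the table, so it should be read as an approximation.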
Table 3. Balanced accuracy, precision, MCC, F1 score, and AUC obtained by each machine learning method tested.
| Method | Balanced accuracy | Precision | MCC | F1 score | AUC |
|---|---|---|---|---|---|
| SVM | 87.48 ± 0.65 | 86.85 ± 0.73 | 77.62 ± 0.54 | 87.22 ± 0.65 | 87.34 ± 0.53 |
| DT | 86.02 ± 0.54 | 85.40 ± 0.62 | 76.32 ± 0.43 | 85.76 ± 0.55 | 86.25 ± 0.47 |
| BLDA | 81.91 ± 0.79 | 81.33 ± 0.82 | 72.68 ± 0.76 | 81.67 ± 0.73 | 81.43 ± 0.76 |
| GNB | 78.75 ± 0.64 | 78.19 ± 0.71 | 69.88 ± 0.63 | 78.52 ± 0.66 | 78.35 ± 0.67 |
| KNN | 89.44 ± 0.46 | 88.80 ± 0.47 | 79.36 ± 0.43 | 89.17 ± 0.48 | 89.32 ± 0.57 |
| XGB | 96.02 ± 0.24 | 95.33 ± 0.26 | 85.20 ± 0.21 | 95.73 ± 0.23 | 96.03 ± 0.25 |
Table 4. Recall, specificity, kappa, and DYI values obtained by each machine learning method tested.
| Method | Recall | Specificity | Kappa | DYI |
|---|---|---|---|---|
| SVM | 87.58 ± 0.67 | 87.38 ± 0.74 | 77.88 ± 0.56 | 87.48 ± 0.65 |
| DT | 86.12 ± 0.56 | 85.91 ± 0.53 | 76.58 ± 0.46 | 86.02 ± 0.54 |
| BLDA | 82.01 ± 0.73 | 81.82 ± 0.75 | 72.92 ± 0.67 | 81.91 ± 0.73 |
| GNB | 78.84 ± 0.67 | 78.66 ± 0.69 | 70.11 ± 0.65 | 78.75 ± 0.66 |
| KNN | 89.55 ± 0.44 | 89.34 ± 0.45 | 79.63 ± 0.42 | 89.44 ± 0.45 |
| XGB | 96.13 ± 0.25 | 95.91 ± 0.23 | 85.48 ± 0.22 | 96.02 ± 0.24 |
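The quantities in Tables 3 and 4 are standard functions of a binary confusion matrix. A minimal sketch follows, assuming "deceased" is the positive class; the confusion-matrix counts are hypothetical, and the tables appear to express each value multiplied by 100. The function name and dictionary keys are illustrative choices, not the study's code.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Binary classification metrics from confusion-matrix counts.

    Assumes the positive class is 'deceased' and that no row or column of the
    matrix is empty. Values are fractions; multiply by 100 to compare with the tables.
    """
    n = tp + fp + tn + fn
    recall = tp / (tp + fn)                      # sensitivity, true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    precision = tp / (tp + fp)                   # positive predictive value
    f1 = 2 * precision * recall / (precision + recall)
    balanced_accuracy = (recall + specificity) / 2
    youden_j = recall + specificity - 1          # classical Youden index, in [-1, 1]
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    observed = (tp + tn) / n                     # raw agreement (accuracy)
    # Chance agreement from predicted and actual class totals (Cohen's kappa).
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
        "youden_j": youden_j,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only (not taken from the study).
m = classification_metrics(tp=8, fp=4, tn=16, fn=2)
print(round(m["balanced_accuracy"], 4), round(m["mcc"], 4), round(m["kappa"], 4))
# 0.8 0.5774 0.5714

# SVM recall and specificity copied from Table 4.
print(round((87.58 + 87.38) / 2, 2))  # 87.48, the SVM balanced accuracy in Table 3
```

Checked against the tables themselves, the DYI column matches (recall + specificity)/2 to within rounding for every method, which is the balanced accuracy reported in Table 3. That quantity is the classical Youden index J = recall + specificity − 1 rescaled as (J + 1)/2.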
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Queipo, M.; Barbado, J.; Torres, A.M.; Mateo, J. Approaching Personalized Medicine: The Use of Machine Learning to Determine Predictors of Mortality in a Population with SARS-CoV-2 Infection. Biomedicines 2024, 12, 409. https://doi.org/10.3390/biomedicines12020409

AMA Style

Queipo M, Barbado J, Torres AM, Mateo J. Approaching Personalized Medicine: The Use of Machine Learning to Determine Predictors of Mortality in a Population with SARS-CoV-2 Infection. Biomedicines. 2024; 12(2):409. https://doi.org/10.3390/biomedicines12020409

Chicago/Turabian Style

Queipo, Mónica, Julia Barbado, Ana María Torres, and Jorge Mateo. 2024. "Approaching Personalized Medicine: The Use of Machine Learning to Determine Predictors of Mortality in a Population with SARS-CoV-2 Infection" Biomedicines 12, no. 2: 409. https://doi.org/10.3390/biomedicines12020409

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
