Article

Detection of Risk Predictors of COVID-19 Mortality with Classifier Machine Learning Models Operated with Routine Laboratory Biomarkers

1
Department of Biostatistics and Medical Informatics, Faculty of Medicine, Erzincan Binali Yıldırım University, Erzincan 24000, Turkey
2
Institute of Physics and Technology, Petrozavodsk State University, 33 Lenin Ave., 185910 Petrozavodsk, Russia
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(23), 12180; https://doi.org/10.3390/app122312180
Submission received: 22 October 2022 / Revised: 22 November 2022 / Accepted: 24 November 2022 / Published: 28 November 2022
(This article belongs to the Special Issue Decision Support Systems for Disease Detection and Diagnosis)

Abstract

Early evaluation of COVID-19 patients who require special care and have a high expected mortality, and the effective determination of relevant biomarkers in large sample groups, are important for reducing mortality. This study aimed to reveal the routine blood-value predictors of COVID-19 mortality and to determine the lethal-risk levels of these predictors during the disease process. The dataset of the study consists of 38 routine blood-values of 2597 patients who died (n = 233) or recovered (n = 2364) from COVID-19 in August–December, 2021. In this study, the histogram-based gradient-boosting (HGB) model was the most successful machine-learning classifier in detecting living and deceased COVID-19 patients (with squared F1 metrics F12 = 1). The most efficient binary combinations with procalcitonin were obtained with D-dimer, ESR, D-Bil and ferritin. The HGB model operated with these feature pairs correctly detected almost all of the patients who survived and those who died (precision > 0.98, recall > 0.98, F12 > 0.98). Furthermore, in the HGB model operated with a single feature, the most efficient features were procalcitonin (F12 = 0.96) and ferritin (F12 = 0.91). In addition, according to the two-threshold approach, ferritin values between 376.2 μg/L and 396.0 μg/L (F12 = 0.91) and procalcitonin values between 0.2 μg/L and 5.2 μg/L (F12 = 0.95) were found to be fatal risk levels for COVID-19. Considering all the results, we suggest that many features, when combined with procalcitonin and ferritin in particular and operated with the HGB model, can be used to achieve very successful results in classifying those who survive and those who die from COVID-19. Moreover, we strongly recommend that clinicians consider the critical levels we have found for procalcitonin and ferritin, to reduce the lethality of COVID-19.

1. Introduction

The complexity of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2/COVID-19) and the rapid spread of the disease, which causes serious and fatal complications, have focused researchers’ attention on the clinical course of the disease [1,2]. The disease has placed an unprecedented strain on healthcare systems around the world, and has involved medical professionals in an unfamiliar and challenging effort to treat populations of patients with a new and deadly disease [3,4,5].
Although important information about the genetic structure of this new virus has been obtained [6] and data on the symptoms of the disease are shared in the medical community [7], there are still many severe cases. Mortality rates for this disease differ among countries [8,9,10], and workload in hospitals affects mortality [5,10].
While mild symptoms (fever, dry cough, shortness of breath, myalgia, fatigue, etc.) are seen in most patients with SARS-CoV-2 infection, acute respiratory distress syndrome, septic shock, bleeding, coagulation disorder, and metabolic acidosis can occur and result in death in severe cases; the disease can also be accompanied by multi-organ dysfunction and cause a variety of symptoms [11,12,13,14,15]. Onur et al. [16] and Huyut et al. [12] reported that COVID-19 disease may be asymptomatic or associated with severe ARDS, thought to be due to an inflammatory cytokine-storm. Furthermore, Chalmers et al. [17] noted that excessive and uncontrolled release of proinflammatory cytokines may be considered the most important primary cause of death from coronavirus, as has been reported in other infections caused by pathogenic coronaviruses. However, attempts to identify and treat hyperinflammation associated with COVID-19 infection continue [18].
Many studies have indicated that male gender, advanced age, comorbidities such as chronic obstructive pulmonary disease (COPD) and diabetes mellitus, and some routine laboratory tests such as D-dimer, procalcitonin, and CRP are associated with worse outcomes of the disease [12,15,19,20,21,22]. These studies also stated that COVID-19 patients with severe pneumonia have decreased serum-albumin and prealbumin levels, and signs of deterioration in liver and kidney functions.
Most of the previous routine blood-studies were aimed at identifying features that affect the diagnosis and prognosis of COVID-19 [3,22,23]. However, as the number of infected and fatal cases increases worldwide, there remains a need for a detailed investigation of clinical, radiological, and laboratory features, and, more importantly, mortality risk-factors in severe COVID-19 patients [11,24]. Zhang et al. [24] noted that there may be changes in the previously detected predictive-values of mortality in severe and critical COVID-19 patients. Therefore, Ponti et al. [25] and Huyut [3] stated that the determination of effective routine laboratory-biomarkers that can classify COVID-19 patients according to their fatal risk, supported by studies with large samples, is essential in order to guarantee rapid and effective treatment. Indeed, many studies have emphasized that the usefulness and effective breakpoints of many laboratory markers such as ferritin and D-dimer in predicting COVID-19 mortality have not been fully determined [11,17,26,27,28]. Therefore, Chalmers et al. [17] and Cheng et al. [29] stated that the predictive role of routine laboratory-features in identifying risk factors affecting COVID-19 mortality needs further confirmation.
However, it is known that even the most knowledgeable and experienced physicians can interpret little of the information contained in routine blood laboratory-results, and it is extremely difficult to determine the severity of COVID-19 patients based on laboratory findings alone [30]. In contrast, machine learning (ML) models have been successfully used to recognize subtle patterns in data and to distinguish latent-association patterns between routine blood-parameters and disease [3,13,22,31,32]. Indeed, several studies have shown that ML models can predict COVID-19 patient groups with high accuracy, using patient demographics, physiological characteristics, and routine blood-value (RBV) data [33,34,35,36].
In our previous studies, we determined routine blood-values that predict the diagnosis and prognosis of COVID-19 with various supervised ML models and LogNNet [3,4,13,22,32]. Huyut and Velichko [22] used only three RBV features with the LogNNet model to predict the diagnosis and prognosis of COVID-19 disease. They achieved an accuracy of 99.17% in the diagnosis of the disease and an accuracy of 82.7% in determining the prognosis of the disease. Velichko et al. [32] achieved 100% accuracy in the diagnosis of COVID-19, using 11 RBV features with a histogram-based gradient-boosting model in their study. Huyut [3] classified severe and mild COVID patients from a large patient population, using 28 RBV features, and the models with the highest classification accuracy were locally weighted learning (97.86%) and k-nearest neighbor (94.05%).
Huyut and İlkbahar [4] used various biomarkers with the CHAID decision tree to determine the diagnosis and prognosis of COVID-19. The model showed 81.6% accuracy in recognizing the disease and 93.5% accuracy in determining the prognosis of the disease. Formica et al. [37] developed an ML model for early diagnosis of disease, using eight RBV features, and reported an 82% specificity with 83% sensitivity; however, the analysis was based on a small sample (171 patients). Banerjee et al. [38] classified a patient cohort of 598 cases, 39 of whom were COVID-19 positive, by various ML methods, using 12 RBV features, and reported good specificity (91%) but very low sensitivity (43%). Avila et al. [39] developed a Bayesian model using a dataset of 12 RBV features, and reported a sensitivity and specificity of 76.7% in the diagnosis of the disease. Joshi et al. [40] trained a logistic-regression model using only hemogram data on a dataset of 380 cases, and reported 93% sensitivity but a low specificity of 43% in the diagnosis of COVID-19 disease. Zhu et al. [41] used 78 features, consisting of demographic, clinical, and RBV values, with a deep neural network model to predict the mortality of COVID-19. They found the success of the method in diagnosis to be 95.4% AUC. Soltan et al. [42] ran multivariate logistic regression, random forest, and extreme gradient-boosted tree (XGBoost) models on more than 50 RBV features, to identify COVID-19. The most successful model in the diagnosis of the disease was the XGBoost method, with 85% sensitivity and 90% accuracy. Soares [43] developed an ML model using 15 RBV parameters to diagnose COVID-19 on a sample of 599 people, 81 of whom were COVID-19-positive. Combining the SVM, ensemble, and SMOTE Boost models, this model had 86% specificity and 70% success in diagnosing the disease.
In this study, 34 routine blood-values were selected with a statistical approach and used to find the most successful ML classifier for detecting COVID-19 patients who survived and those who died. To this end, 16 ML models were run with these features, and the most successful model (histogram-based gradient-boosting, HGB) was determined. The predictors of mortality of COVID-19 disease were revealed with the HGB model. The performance of individual and binary combinations of these predictors in detecting patient groups was obtained with the HGB model. Correlations between patient groups and binary combinations of features were also interpreted in detail. Finally, cut-off values for these features (mortality risk-levels) were calculated using one- and two-threshold approaches, which classify patient groups directly from the feature values. We think that the findings of this study will be an important motivational tool for clinicians in estimating the mortality of COVID-19 and detecting severe patients.
The paper has the following structure. Section 2 describes the data-collection procedure, metrics, characteristics of the participants, the feature-selection procedure, the one- and two-threshold approaches for classification based on the values of direct features and a new F12 criterion for the classification metric. Section 3 describes the correlations of features with patient groups, statistical differences of features between groups, classification results according to ML models, classification results of individual and pairwise combinations of features operated with the HGB model, and classification results according to one- and two-threshold values of the features. Section 4 discusses the results and compares them with known developments. Section 5 presents the limitations of the study. Finally, in Section 6, a general description of the study and its scientific significance is given.

2. Materials and Methods

In this retrospective cohort study, data suitable for our criteria were collected from the Erzincan Binali Yıldırım University Mengücek Gazi Training and Research Hospital information system between August and November 2021, and included in the study. The laboratory data of the patients were the routine blood-values measured at the time of admission to the hospital. The information about the patients was followed up until exit, and exit information was recorded. In our hospital, a diagnosis of SARS-CoV-2 was made using real-time reverse transcription polymerase chain reaction (RT-PCR) only, on nasopharyngeal or oropharyngeal swabs.

2.1. Measurements

A Sysmex XN-1000 Hematology System (Sysmex Corporation, Kobe, Japan) was used to carry out the complete blood count. Biochemical tests were analyzed from serum by the spectrophotometric method, using a Beckman Coulter Olympus AU2700 Plus Chemistry Analyzer (Beckman Coulter, Tokyo, Japan). Prothrombin time (PT), activated partial thromboplastin time (aPTT), and fibrinogen were determined with a Ceveron-Alpha digital coagulation device (Diapharma Group Inc., West Chester, Canada). The erythrocyte sedimentation rate (ESR) was measured using the TEST 1 BCL instrument (Alifax, Polverara, Italy), based on the principle of photometric capillary-flow kinetic analysis. Ferritin was evaluated with a chemiluminescence immunoassay (Centaur XP, Siemens Healthcare, Germany). C-reactive protein (CRP) was measured using the nephelometric method in the BN™ II System (Siemens, Munich, Germany). Procalcitonin (PCT), D-dimer, and troponin were analyzed from whole blood on the AQT90 FLEX analyzer (Radiometer®, Bronshoj, Denmark). All patient data were double-checked and analyzed by the research team.

2.2. Characteristics of Participants and Defined Datasets

In this study, only RBV data (features) of 2597 patients who were diagnosed with COVID-19 and treated at the hospital during the specified dates were used. During the treatment period, 233 (9.0%) of these patients died, while 2364 (91.0%) survived. Of the patients who lost their lives, 143 (61.3%) were male, while 90 (38.7%) were female. The mean age of the surviving patients was 55 years, while the mean age of the deceased patients was 76 years.
The routine laboratory-information of these patients was examined. Only the RBVs (features) that were measured in at least 80% of the patients were used. Missing data in this study were completed with the mean of the relevant parameter distribution, and outliers were normalized. A total of 38 routine blood-values calibrated from approximately 70 parameters were used in this study. The data used in this study will be referred to as “SARS-CoV-2-RBV3” (Supplementary Materials). The SARS-CoV-2-RBV3 dataset includes immunological, hematological, and biochemical parameters (Table 1).
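The coverage filter and mean imputation described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (a list of dictionaries with `None` for a missing measurement) and the function name `select_and_impute` are hypothetical and not taken from the study, and the outlier normalization step is not shown because the text does not specify it.

```python
from statistics import mean


def select_and_impute(rows, min_coverage=0.8):
    """Keep features measured in at least `min_coverage` of patients
    and fill their missing values with the feature mean.

    rows: list of dicts mapping feature name -> float, or None if missing
    (hypothetical layout, chosen for illustration only).
    """
    features = sorted({name for row in rows for name in row})
    kept = []
    for name in features:
        observed = [row[name] for row in rows if row.get(name) is not None]
        # Coverage = share of patients with a recorded value for this feature.
        if observed and len(observed) / len(rows) >= min_coverage:
            kept.append((name, mean(observed)))
    # Rebuild every record with only the kept features, imputing the mean.
    return [
        {
            name: (row[name] if row.get(name) is not None else fill)
            for name, fill in kept
        }
        for row in rows
    ]
```

Here the 80% cut-off mirrors the rule stated in the text; any other value of `min_coverage` would be a different design choice.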
These patients were of Turkish and Kurdish ethnicity. Only data from individuals over the age of 18 were recorded. Since it is a retrospective study, comorbidity data of the patients could not be obtained. In the SARS-CoV-2-RBV3 dataset, surviving patients were coded as 0, and patients who died were coded as 1 (survived COVID-19 = 0, non-survived COVID-19 = 1).
The features in this dataset are calibrated, and include almost all of the RBV values that are the subject of studies on COVID-19 mortality. Therefore, we think that the bias of our study using this data set was minimized in comparison with the literature. In addition, the use of our data set, which we can share upon request from researchers, is important in terms of demonstrating the reproducibility and suitability of the results.

2.3. Feature Selection for ML Models with Statistical Approach

In order to evaluate the difference of 38 RBV values between patients who survived COVID-19 and those who died, the assumptions of the parametric tests were checked first. The assumption of normality was analyzed with the Shapiro–Wilk test, and the homogeneity of variances in the groups was analyzed with Levene’s test. Since the assumptions of the parametric tests were not met, the significance of the difference of 38 features between the patient groups (two independent groups) was analyzed using the Mann–Whitney U test [44,45] and p-values were calculated. A total of 34 features were judged to be statistically different between patient groups, and these features were used as inputs to the ML models for classification. Features selected in this way may be determining factors that separate patients who died from those who survived. Restricting the model inputs to these statistically different features was intended to increase the clinical reliability of our results and to reduce bias.
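A minimal sketch of this selection step is given below. It is not the authors' implementation: it uses the two-sided normal approximation of the Mann–Whitney U test with a tie correction and no continuity correction, and the significance level `alpha = 0.05` and the function names are assumptions made for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U statistic of sample a, two-sided p-value).

    The p-value comes from the normal approximation, which is an
    assumption of this sketch and is only adequate for larger samples.
    """
    n1, n2 = len(a), len(b)
    combined = list(a) + list(b)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def select_features(survived, died, alpha=0.05):
    """Keep features whose distributions differ between the two groups.

    survived / died: dicts mapping feature name -> list of values.
    """
    return [
        name for name in survived
        if mann_whitney_u(survived[name], died[name])[1] < alpha
    ]
```

In practice a statistics library would normally be used for the test itself; the hand-written version is shown only to make the ranking and the tie handling explicit.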

2.4. Threshold Approach

The simplest approach to classification by one feature, in the presence of only two classes, is based on determining a threshold value Vth that separates the classes [22].

2.4.1. One-Threshold Approach

For a one-threshold approach for the SARS-CoV-2-RBV3 dataset, we introduce the threshold value Type 1 or Type 2, in accordance with the rule:
$$
\begin{aligned}
\text{Type 1:}\quad & \text{if feature value} \le V_{th} \text{ then survived else non-survived COVID-19} \\
\text{Type 2:}\quad & \text{if feature value} \le V_{th} \text{ then non-survived else survived COVID-19}
\end{aligned}
\tag{1}
$$
The threshold type indicates which side of the threshold the non-survived and survived classes are on.

2.4.2. Two-Threshold Approach

For a two-threshold approach for the SARS-CoV-2-RBV3 dataset, we introduce the threshold value Type 1 or Type 2, in accordance with the rule:
$$
\begin{aligned}
\text{Type 1:}\quad & \text{if feature } (\text{value} \ge V_{th\_1}) \text{ and } (\text{value} \le V_{th\_2}) \text{ then survived else non-survived COVID-19} \\
\text{Type 2:}\quad & \text{if feature } (\text{value} \ge V_{th\_1}) \text{ and } (\text{value} \le V_{th\_2}) \text{ then non-survived else survived COVID-19}
\end{aligned}
\tag{2}
$$
The main metrics were calculated after balancing the dataset. K-fold cross-validation was not used when calculating Ath and F12. The threshold values Vth, Vth_1, and Vth_2 were determined by stepwise enumeration, selecting the values that maximize Ath.
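The threshold rules and the enumeration can be sketched as below. Several points are assumptions rather than details confirmed by the text: Ath is taken to be plain classification accuracy, the comparison operators (`<=` for the single threshold, a closed interval for the two thresholds) are one possible reading of the rules, and the candidate thresholds are simply the observed feature values instead of a fixed step. The function names are hypothetical, and the input is assumed to be already balanced.

```python
def classify(value, rule_type, v_lo, v_hi=None):
    """Return 0 (survived) or 1 (non-survived) for one feature value.

    With v_hi=None this is the one-threshold rule (value <= v_lo);
    otherwise it is the two-threshold rule (v_lo <= value <= v_hi).
    The operators are assumptions of this sketch.
    """
    inside = value <= v_lo if v_hi is None else v_lo <= value <= v_hi
    if rule_type == 1:
        return 0 if inside else 1
    return 1 if inside else 0


def accuracy(values, labels, rule_type, v_lo, v_hi=None):
    """Share of samples whose predicted class matches the label."""
    hits = sum(
        classify(v, rule_type, v_lo, v_hi) == y for v, y in zip(values, labels)
    )
    return hits / len(values)


def best_one_threshold(values, labels):
    """Enumerate rule types and candidate thresholds; keep the best accuracy."""
    best = None
    for rule_type in (1, 2):
        for v_th in sorted(set(values)):
            acc = accuracy(values, labels, rule_type, v_th)
            if best is None or acc > best[0]:
                best = (acc, rule_type, v_th)
    return best


def best_two_thresholds(values, labels):
    """Enumerate rule types and all (lower, upper) pairs of candidates."""
    candidates = sorted(set(values))
    best = None
    for rule_type in (1, 2):
        for i, lo in enumerate(candidates):
            for hi in candidates[i:]:
                acc = accuracy(values, labels, rule_type, lo, hi)
                if best is None or acc > best[0]:
                    best = (acc, rule_type, lo, hi)
    return best
```

The exhaustive pair search is quadratic in the number of distinct values, which is acceptable for a single feature but would call for a coarser step on large datasets.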

2.5. F12 Metric

To select the most significant features, we introduced an additional metric, F12 (that is, $F1^{2}$), equal to the product of the F1 metrics of the two classes:
$$
F1^{2} = F1(\text{non-survived COVID-19}) \cdot F1(\text{survived COVID-19})
\tag{3}
$$
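As a small worked illustration of this metric, the sketch below computes the per-class F1 scores from true and predicted labels and multiplies them. The label coding (0 = survived, 1 = non-survived) follows the dataset description; the function names are hypothetical.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score when `positive` is treated as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def f1_squared(y_true, y_pred):
    """Product of the F1 scores of the non-survived (1) and survived (0) classes."""
    return f1_for_class(y_true, y_pred, 1) * f1_for_class(y_true, y_pred, 0)


# A classifier that always predicts "survived" scores 0 on this metric,
# even though its plain accuracy on an imbalanced sample looks high.
y_true = [0, 0, 0, 0, 1, 1]
always_survived = [0, 0, 0, 0, 0, 0]
print(f1_squared(y_true, always_survived))  # 0.0
```

Because the two F1 scores are multiplied, a model has to perform well on both the majority and the minority class to obtain a high value.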

3. Results

3.1. Correlation Analysis of Dataset SARS-CoV-2-RBV3

Figure 1 shows the results of the correlation analysis between the features and the diagnosis (outcome), using three types of correlation (Pearson, Spearman, and Kendall) over the entire SARS-CoV-2-RBV3 dataset. The Spearman and Kendall correlations have very similar values. The Pearson correlation generally identifies fewer features that correlate with the diagnosis, so we use the Spearman correlation as the main one.
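For reference, the Spearman coefficient between one feature and the binary outcome can be computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below is a plain-Python illustration with hypothetical function names; it assumes neither input is constant, since the coefficient is undefined in that case.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Working on ranks makes the coefficient insensitive to the skewed scales typical of laboratory values, which is consistent with it flagging more features than the Pearson correlation.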
Full Spearman heatmaps across the entire database and by class (survived COVID-19 and non-survived COVID-19) are shown in Figure 2.
Figure 2b,c show that the non-survived COVID-19 class is characterized by an increased correlation between features, compared with the survived COVID-19 class, which indicates poor self-regulation in the body.
The most significant changes in the correlation of features of the non-survived COVID-19 class compared with the survived COVID-19 class are presented in Table 2. Here, the qualitative change is denoted as ‘Down’ and ‘Up’. For some pairs of features, the correlation increased, while for some it fell.

3.2. Comparison of RBV Features of Surviving and Non-Surviving COVID-19 Patients and Comparison of ML Classifiers

The statistical comparison of 38 features between surviving and non-surviving COVID-19 patients is presented in Table 3. Except for albumin, BASO, EOS, and MPV, the other 34 features were judged to be statistically different between the patient groups. These 34 features were used as inputs to identify patient groups with ML models, and the classification performance of the models was obtained (Table 4 and Figure 3). According to the F12 criterion (see Equation (3)), derived from the F1 metrics of the two classes, the most successful model was HGB (F12 = 1). After HGB, the most successful models were Adaboost, Extra Trees, KNN, RF, and SVM-LK (F12 > 0.99 in each of these models). The least successful model was quadratic discriminant analysis (F12 = 0.72).

3.3. Investigation of the Effectiveness of the Models Operating on the One-Feature HGB Model

It is known that the F1 score may be more useful than accuracy when the class distributions are unequal. Figure 4 shows a comparison of F1 metrics for the survived-COVID-19 class, calculated for the original dataset and for the dataset balanced with the synthetic minority over-sampling technique (SMOTE). For survived-COVID-19, the F1 value is high for all features, for both datasets.
Figure 5 shows a comparison of F1 metrics for the non-survived-COVID-19 class calculated for the original and SMOTE-balanced datasets. For non-survived-COVID-19, a high F1 value is observed for most of the features for the SMOTE-balanced dataset, while for the original dataset only some of the features have a high F1. Thus, to select the main features, it is logical to use the metrics calculated for the original dataset: synthetic data, although well approximated by the model, do not allow us to judge the performance of the model on real data. Table A1 in Appendix A presents the classification results on the SARS-CoV-2-RBV3 dataset for the HGB model using a single input-feature on the original dataset, together with the main classification metrics (Precision, Recall, F1).
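To make clear why scores on balanced data describe synthetic rather than real patients, the sketch below shows the core idea of SMOTE-style oversampling: each new minority sample is an interpolation between a real minority sample and one of its nearest minority neighbours. This is a simplified illustration, not the implementation used in the study; the function name, the neighbour count `k`, and the seed are assumptions.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Generate `n_new` synthetic points from minority-class samples.

    minority: list of equal-length tuples of floats (at least two points).
    Each synthetic point lies on the segment between a randomly chosen
    sample and one of its k nearest neighbours within the minority class.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen sample (squared distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic
```

Every generated point is a blend of two recorded patients, so a model can fit such points well without that result carrying over to unseen real cases.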

3.4. F12 Metric in the Detection of Patient Groups with the HGB Model, One-Threshold, and Two-Threshold Approaches

Figure 6 shows the dependence of F12 on the feature. We consider the most significant features to be those with F12 ≥ 0.5; this threshold is shown in the figure by the blue line.
As a result, we obtain a list of the 12 most significant single features for HGB classification, those with F12 ≥ 0.5, shown in Table 5. No high F12 value was found in the classification of patient groups with the one-threshold approach. However, high F12 values were found with the HGB model operated with a single feature and with the two-threshold approach. Accordingly, PCT and ferritin were the most effective features in classification, according to both the single-feature HGB model and the two-threshold approach.

3.4.1. Threshold Approach

For the one-threshold approach, we obtained the distribution of the F12 metric shown in Figure 7. Model types are marked with color (Type 1, Type 2). The complete collection of metrics (Type, Vth, Ath, Precision, Recall, F1, F12) is presented in Table A2 in Appendix A.
For the two-threshold approach, we obtained the distribution of the F12 metric shown in Figure 8. Model types are marked with color (Type 1, Type 2). The complete collection of metrics (Type, Vth_1, Vth_2, Ath, Precision, Recall, F1, F12) is presented in Table A3 in Appendix A.
Figure 9 shows, using procalcitonin, ferritin, and fibrinogen as examples, the histogram distributions of feature values in surviving and non-surviving COVID-19 patients, together with the classification results of the one-threshold approach.
Figure 10 shows the corresponding histogram distribution and classification results for amylase, according to the two-threshold approach.

3.4.2. Comparison of Spearman Correlation and HGB Model and Threshold Approach

The HGB model with a single feature classified the patient groups who died from and survived COVID-19 more successfully, in terms of F12, than the one- and two-threshold approaches (Figure 11). In addition, Spearman’s correlation gave a distribution of feature importance similar to that of the presented models.

3.5. Investigation of the Effectiveness of the HGB Model Working on Two Features for the Detection of Surviving and Non-Surviving COVID-19

For the detection of living and deceased COVID-19 patient-groups, the SMOTE-trained HGB model was run with pairs of features, and classification performances are presented in Table 6. In addition, the F12 values of the feature pairs operated with the HGB model are visualized in two-dimensional space (Figure 12).
As Table 6 and Figure 12 show, the HGB model reaches a classification performance of F12 = 0.98 with only the D-dimer and PCT feature pair in the detection of surviving and non-surviving patients. The feature pairs formed by PCT with ESR, D-Bil, ferritin, and LDH also reached approximately F12 = 0.98. With these feature pairs, both surviving and non-surviving patients were identified with high precision and recall. PCT appears to be the feature that combines best with other features in predicting disease mortality, followed by ferritin. In addition, F12 values ≥ 0.94 were found for various other feature pairs in patient-group classification with the HGB model (Table 6).
In addition, according to the two-threshold approach, it was found that the majority of patients who died had fibrinogen values between 349.98 g/L and 379.05 g/L (F12 = 0.68), D-dimer values between 1009.99 μg/L and 10,742.71 μg/L (F12 = 0.65), ESR values between 36.12 and 56.62 (F12 = 0.70), ferritin values between 376.2 μg/L and 396.0 μg/L (F12 = 0.91) and PCT values between 0.2 μg/L and 5.2 μg/L (F12 = 0.95) (Figure 8 and Table A3). These value ranges can be regarded as the most important lethal-risk levels. It is noteworthy that procalcitonin and ferritin are the most important features in the detection of surviving and non-surviving COVID-19 patients, according to both the HGB classifier and the two-threshold approach.

3.6. Concept of 1D and 2D Masks

In order to understand the working principle of the HGB model in the classification of surviving and non-surviving COVID-19 patients, the cut-off values of one and two features and their sampling distributions were drawn, and the results were visualized with the masking technique.

3.6.1. 1D Mask of the HGB Model

Figure 13a shows the distribution of procalcitonin by patient groups on the original dataset, and shows how the patient groups were classified according to the threshold values of this feature. The procalcitonin value is used as an example for understanding the procedure for identifying patient groups according to a single feature of the HGB model. The classification results of the HGB model, which uses the cut-off values of procalcitonin to determine the patient groups, are visualized in the 1D-mask technique in Figure 13b.
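The idea of a 1D mask can be illustrated by evaluating a trained single-feature classifier on a regular grid of feature values and recording which stretches of the axis it assigns to the non-survived class. The sketch below is hypothetical: `predict` stands in for any fitted single-feature model, the toy rule in the usage lines is not the HGB model, and the grid bounds are arbitrary.

```python
def one_d_mask(predict, lo, hi, steps):
    """Evaluate `predict` on `steps` evenly spaced values between lo and hi."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return [(x, predict(x)) for x in grid]


def mask_intervals(mask, target=1):
    """Collapse a mask into (start, end) grid intervals labelled `target`."""
    intervals, start, prev = [], None, None
    for x, label in mask:
        if label == target and start is None:
            start = x  # a run of the target class begins here
        elif label != target and start is not None:
            intervals.append((start, prev))  # the run ended at the previous point
            start = None
        prev = x
    if start is not None:
        intervals.append((start, prev))
    return intervals


# Toy stand-in for a fitted model: class 1 inside an arbitrary range.
toy_predict = lambda value: 1 if 0.2 <= value <= 5.2 else 0
mask = one_d_mask(toy_predict, 0.0, 10.0, 11)
print(mask_intervals(mask))  # [(1.0, 5.0)]
```

Unlike a single threshold, a tree-based model can produce several disjoint intervals, which is why the mask is returned as a list.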

3.6.2. 2D Mask of HGB Model

Figure 14a,c shows the distribution of the D-dimer-ferritin and MCH-creatine kinase feature pairs in two-dimensional space, by patient group, on the original dataset. These feature pairs were chosen as examples to illustrate the working principle of the HGB model with two features. The results of classifying living and deceased COVID-19 patients with the HGB model using these features are visualized with the 2D-masking technique (Figure 14b,d). D-dimer-ferritin was the feature pair with the highest F12 score in the identification of patient groups with HGB, while MCH-creatine kinase was the feature pair with the lowest F12 score. Here, we show how HGB works on these two contrasting feature pairs.

4. Discussion

COVID-19, caused by the novel coronavirus SARS-CoV-2, is a new disease for humanity and contains many unknowns [16]. During the course of the disease, changes are observed in many biochemical parameters, as well as hematological abnormalities [1,4,15,23].
While most patients have mild symptoms, some patients may develop severe conditions such as severe pneumonia, acute respiratory distress syndrome (ARDS), and multiple organ dysfunction syndrome (MODS) [4,23,46]. Therefore, early evaluation of patients who require special care and have a high expected mortality, and effective identification of relevant biomarkers in large sample groups, are important for reducing mortality [4,13,29,35].
In this study, firstly, increasing and decreasing relationship-levels between living and deceased patient-groups and feature pairs were examined (Table 2). Then, 34 features were determined using a statistical approach to determine the most successful ML classifier-model in detecting living and deceased COVID-19 patients (Table 3). Our dataset was balanced with SMOTE, and our ML models were trained with the balanced dataset, as there was a large sample difference (91% versus 9%) between the groups of patients who lived and those who died from COVID-19, in our dataset. The patient groups were classified using 16 ML models operated with 34 features, and the most successful was the histogram-based gradient boosting (HGB) model (F12 = 1) (Table 4). Then, with the HGB model, the most important predictors (12 features) in estimating the mortality of COVID-19 were revealed, and lethal-risk factors of the disease were determined (Table 5). In addition, pairs of features with the highest classification-rate were determined by using binary combinations of all features to determine patient groups (Table 6). Moreover, classification results were found by calculating the most important cut-off values in the classification of patients who lived and those who died, according to one- and two-threshold values (Table 5, Table A2 and Table A3).
In this study, patients who died and those who survived COVID-19 were highly associated with the feature pairs HGB-HCT, RBC-HCT, RBC-HGB, NEU-WBC, INR-PT (Figure 2b,c). We think that these pairs of features are associated with the prognosis of the disease and have significant negative effects on the immune system during the disease process. Moradi et al. [47] stated that the components of the immune system are the organs most frequently affected by COVID-19, after the lungs, and stated that necrosis and bleeding, as well as spleen atrophy and significant reductions in lymphocyte and neutrophil counts, may occur in these patients. In addition, Guzik et al. [48] noted that these features were highly correlated with the prognosis of the disease. Song et al. [49] determined that increased NEU, WBC, CRP, and D-dimer levels may reflect an imbalance in the inflammatory response, and these features can be considered as a possible indicator of disease severity in infectious diseases such as sepsis and bacteremia. In one study, it was reported that lower levels of RBC, lymphocytes, platelets, HGB, and higher neutrophils were observed in the peripheral blood system of severe COVID-19 patients [50].
In this study, the correlation within the feature pairs albumin-glucose, ESR-D-dimer, creatinine-ALT, HCT-EOS, ESR-fibrinogen, and ferritin-D-dimer was significantly lower in the patients who died than in the patients who survived (shown as “Down” in Table 2). This suggests that the interventions applied to the patients who died had little effect on the values of these features, and that there are hidden relationship-structures between these feature pairs and mortality. We think that the weakening of the relationship within these feature pairs is associated with increased mortality. In addition, the correlation within all feature pairs shown as “Up” in Table 2 was significantly higher in deceased patients than in living patients. In particular, the greatly increased correlation within the NEU-amylase, BASO-CKMB, MPV-AST, D-dimer-creatinine and MPV-UA feature pairs in patients who died suggests that important disorders, such as impaired kidney and liver function, occur in severe COVID-19 patients. Although the increasing correlation within these feature pairs in the patients who died points to poor self-regulation in the body, we think that these feature pairs hide important information about the increasing mortality of the disease. It appears that, in patients who died, the strength of the relationship between these features and their various combinations rose and fell markedly in the period until death. We think that the difficulties in managing this process and the serious changes in the levels of these feature pairs indicate very different complications in severe patients. In this context, an increase or decrease in one of these features appears to have a significant effect on the metabolism of the other feature, depending on the severity of the disease.
Huyut et al. [12] stated that the patients who died had significant changes in liver- and kidney-function tests, cardiac-troponin and hemogram values, and parameters related to inflammation. They also stated that high ESR, PT, CRP, D-dimer, ferritin, and RDW values are the most effective predictors of COVID-19 mortality. Similarly, Chen et al. [14] and Tan et al. [51] determined that disorders resulting from hematological abnormalities were associated with disease severity. Many studies have reported that leukocytosis and lymphopenia are independent predictors of in-hospital mortality [12,46,49]. Huyut et al. [12] did not find EOS and other hematological values to be predictive risk factors for COVID-19 mortality, although they stated that high NEU, WBC and RDW values are important mortality risk-indicators of COVID-19. Similarly, one study noted that neutrophils play an important role in inflammation, and that their increase contributes to the development of ARDS [52]. In other studies, the neutrophil count was noted to be an independent predictor of severe disease, and to be associated with hypersensitivity pneumonia in SARS-CoV-2 [52,53]. Although some studies have indicated that increased amylase or lipase indicates pancreatic injury in COVID-19 patients, this has not been proven in other studies, and it has been stated that the increase in these enzymes can also be seen in other clinical conditions [54].
It is known that the F1 score can be more informative than accuracy when the classes are imbalanced. In the classification of surviving patients, all features were found to have a high F1 score for both the original and SMOTE-balanced datasets (Figure 4). In contrast, in the classification of patients who died, the majority of the features had high F1 values for the SMOTE-balanced dataset, while this score was high for only some features in the original dataset (Figure 5).
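The point about imbalanced classes can be illustrated with a short Python sketch. It uses the class sizes of this dataset (2364 surviving and 233 deceased patients) and a hypothetical baseline that labels every patient as surviving; the function name `precision_recall_f1` and the convention of returning 0 for an undefined ratio are assumptions made for this illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for one class; undefined ratios are set to 0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


n_survived, n_died = 2364, 233  # class sizes reported for the dataset

# Hypothetical baseline: predict "survived" for every patient.
accuracy = n_survived / (n_survived + n_died)
_, _, f1_survived = precision_recall_f1(tp=n_survived, fp=n_died, fn=0)
_, _, f1_died = precision_recall_f1(tp=0, fp=0, fn=n_died)

print(f"accuracy={accuracy:.3f}, F1 survived={f1_survived:.3f}, F1 died={f1_died:.3f}")
```

Such a baseline reaches an accuracy of about 0.91 while detecting none of the deceased patients, so its F1 score for the deceased class is zero. This is why per-class F1 values are reported here rather than accuracy alone.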
To select the most important features in defining patient-groups, we defined an additional metric, F12, equal to the product of the F1 metrics of the two classes (Equation (3)). Although the model predicted the synthetic data on surviving and deceased patients well (Table A1, Figure 4 and Figure 5), we also tested the HGB model on the original dataset for single features (Figure 6 and Table 5) and for feature pairs (Figure 12 and Table 6). PCT, ferritin, fibrinogen, ESR, PT, and D-dimer were found to be the most important features according to the F12 metric for the histogram-based gradient-boosting model operated with single features in the classification of surviving and deceased patient-groups (Figure 6 and Table 5). In addition, PCT and ferritin formed the most important feature pair in the identification of living and deceased patients (precision > 0.98, recall > 0.98, F12 > 0.98 in both living and deceased patients) (Table 6 and Figure 12). Other feature pairs run with the HGB model produced an F12 value of ≥ 0.94 in identifying patient-groups (Table 6). Accordingly, our HGB model, which was trained with SMOTE, was found to identify living and deceased COVID-19 patients largely accurately. Moreover, the HGB model operated with a single feature achieved higher F12 values in the classification of patient groups than the classification based on one or two cut-off values (Figure 11). Finally, the approach of identifying patient groups based on the relationship structure (Spearman) of the features produced the lowest F12 results (Figure 11).
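A minimal Python sketch of the F12 calculation is given below. The per-class F1 values are copied from a few rows of Table A1; the function name `f12` and the dictionary layout are assumptions made for this illustration. Ranking the features by the product reproduces the order stated above.

```python
def f12(f1_survived, f1_non_survived):
    """Product of the F1 scores of the two classes."""
    return f1_survived * f1_non_survived


# (F1 surviving, F1 non-surviving) for selected features, taken from Table A1.
f1_by_feature = {
    "D-dimer": (0.9599, 0.6769),
    "MPV": (0.9516, 0.0083),
    "Ferritin": (0.9889, 0.8915),
    "PT": (0.9707, 0.7312),
    "Procalcitonin": (0.9966, 0.9635),
    "ESR": (0.9769, 0.7868),
    "Fibrinogen": (0.9881, 0.8863),
}

f12_by_feature = {name: f12(*scores) for name, scores in f1_by_feature.items()}
ranking = sorted(f12_by_feature, key=f12_by_feature.get, reverse=True)

for name in ranking:
    print(f"{name}: F12 = {f12_by_feature[name]:.5f}")
```

Because the metric is a product, a feature such as MPV, which has a high F1 for surviving patients but a very low F1 for deceased patients, receives an F12 close to zero. The metric therefore rewards only those features that separate both classes well.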
In this study, in order to determine the critical-risk levels of the features in COVID-19 mortality, the lethal levels of the features were determined with one- and two-threshold approaches (see Section 2.4), and patient groups were classified according to these values (Figure 7 and Figure 8, Table A2 and Table A3). According to the one-threshold approach, PT values greater than 13.50 sec (F12 = 0.58), D-dimer values greater than 1009.99 μg/L (F12 = 0.64), INR values greater than 1.15 (F12 = 0.58), amylase values greater than 76.79 U/L (F12 = 0.61) and CK-MB values greater than 18.86 U/L (F12 = 0.60) were found to be lethal critical-levels for COVID-19 mortality. According to the two-threshold approach, it was found that the majority of patients who died had fibrinogen values between 349.98 mg/dL and 379.05 mg/dL (F12 = 0.68), D-dimer values between 1009.99 μg/L and 10,742.71 μg/L (F12 = 0.65), ESR values between 36.12 mm/hr and 56.62 mm/hr (F12 = 0.70), ferritin values between 376.2 μg/L and 396.0 μg/L (F12 = 0.91) and PCT values between 0.2 μg/L and 5.2 μg/L (F12 = 0.95). It can be said that the determined value ranges of these features are the most important lethal-risk levels. It is noteworthy that procalcitonin and ferritin are the most important features in the detection of surviving and non-surviving COVID-19 patients, according to both the HGB classifier and the two-threshold approach.
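The one- and two-threshold rules can be sketched in Python as follows. This is only an illustration under stated assumptions: the patient values are hypothetical, the boundaries are treated as inclusive, the one-threshold rule is assumed to flag values above the threshold, and the function names are invented for this example. Only the PCT thresholds (0.2 μg/L and 5.2 μg/L) are taken from the text; the exact procedure is the one described in Section 2.4.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score of one class, written as 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f12(y_true, y_pred):
    """Product of the F1 scores of the surviving (0) and deceased (1) classes."""
    return f1_for_class(y_true, y_pred, 0) * f1_for_class(y_true, y_pred, 1)


def one_threshold(value, vth):
    """Assumed rule: flag a patient as high-risk when the value exceeds vth."""
    return 1 if value > vth else 0


def two_threshold(value, vth_1, vth_2):
    """Assumed rule: flag a patient as high-risk when vth_1 <= value <= vth_2."""
    return 1 if vth_1 <= value <= vth_2 else 0


# Hypothetical PCT values (μg/L) and outcomes (1 = died, 0 = survived).
pct = [0.05, 0.1, 0.15, 0.3, 6.0, 0.5, 1.2, 4.8, 0.1]
died = [0, 0, 0, 0, 0, 1, 1, 1, 1]

pred_one = [one_threshold(v, 0.2) for v in pct]
pred_two = [two_threshold(v, 0.2, 5.2) for v in pct]

f12_one = f12(died, pred_one)
f12_two = f12(died, pred_two)
print(f"one threshold: F12 = {f12_one:.3f}; two thresholds: F12 = {f12_two:.3f}")
```

In this toy example the upper threshold removes one false alarm (the hypothetical surviving patient with a PCT of 6.0 μg/L), so the two-threshold rule yields a higher F12 than the one-threshold rule.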
Similar to the findings in this study, many studies have supported the view that any significant increase in PCT levels reflects the development of a critical condition in COVID-19 [55,56,57,58,59]. Lima et al. [60] stated that, due to the characteristic structure of PCT in bacterial and viral infections, it may play a role in the prognosis of COVID-19. Ahmed et al. [55] noted that despite several limitations, elevated PCT levels can be used as a rapid indicator of criticality, a worsening clinical-picture, and even mortality, in COVID-19. Similarly, Lippi et al. [58] stated in a meta-analysis that procalcitonin levels above 0.5 μg/L were correlated with a 5-fold greater risk of serious infection in COVID-19 patients. In another study, Juneja et al. [61] showed that more than 96% of COVID-19 patients with low disease-severity had serum procalcitonin levels of less than 0.5 μg/L, and that these patients had better clinical outcomes. Additionally, Juneja et al. noted that PCT levels above 0.5 μg/L are associated with a more serious COVID-19 illness or secondary bacterial infection [61]. A meta-analysis involving Caucasians and South Asians found a strong association between PCT and the severity of COVID-19 [55]. This multiethnic assessment further reinforces the importance of PCT as a prognostic biomarker in cases of COVID-19. Lippi et al. [58], emphasizing the properties of PCT, its reliable kinetics and the potential relationship of its decreasing levels with infection resolution, stated that this feature may be a promising prognostic biomarker for COVID-19. These findings support the results of our study.
In a meta-analysis examining a limited amount of the literature, Henry et al. [62] stated that high hematological findings detected in COVID-19 patients and an increase in values such as D-dimer and IL-6 were accepted as an indicator of widespread cytokine-release. Similarly, Onur et al. [16] determined that the increase in biochemical parameters such as ferritin, fibrinogen, D-dimer, and troponin measured at the first hospitalization was associated with mortality. In other studies, Perricone et al. [18] and Torti et al. [63] noted that circulating ferritin levels may not only reflect the acute-phase response, but may also play a critical role in inflammation. In addition, some studies reported that ferritin, as a signaling molecule, may be a direct mediator of the immune system [17,64]. Similar to the ferritin findings in this study, Feld et al. [26] and Kernan and Carcillo [65] stated that ferritin, the essential intracellular iron-storage protein, is an acute-phase reactant that is elevated in many inflammatory conditions, including acute infections. Onur et al. [16] stated that ferritin, an indicator of systemic inflammation, may be an indicator of disease severity and mortality. Winata and Kurniawan [66] emphasized that D-dimer and fibrinogen degradation product (FDP) are increased in all patients in the late stage of COVID-19. These results suggested that D-dimer and FDP levels were elevated due to increased hypoxia in severe COVID-19 patients, and that these features were significantly associated with coagulation.
In addition, Huyut et al. [12], Mertoğlu et al. [23] and Huyut and İlkbahar [4] stated that increased fibrinogen, D-dimer, and CRP levels cause widespread inflammation and cytokine storm in severe COVID-19 patients, and that high values of these features increase mortality. In this study, high PT and INR values were interpreted as favoring hypercoagulation in a significant proportion of patients who died. This result supported the idea that the risk of hypercoagulation is high in COVID-19 patients who die. Similar to the findings in this study, another study interpreted increased PT and INR values and low aPTT values as favoring hypercoagulation in a significant proportion of patients who died [12]. These results support the view [12,21,23] that cardiovascular pathologies due to coagulation may be increased in patients who die.

5. Limitations of the Study

The dataset in this article does not include the comorbidities of the patients or information on inpatient/outpatient follow-up. In practice, however, a training set collected during a limited time period cannot meet all of these demands. In addition, this study was carried out only on patients of Turkish ethnicity, so the results may need to be tested on other populations. Nevertheless, the histogram-based gradient-boosting model is easy to retrain and test with data from patients of different ethnicities. As more data become available, the predictive performance of the algorithm for COVID-19 mortality is expected to improve.

6. Conclusions

In this study, the histogram-based gradient-boosting (HGB) model was the most successful ML classifier in detecting surviving and non-surviving COVID-19 patients (F12 = 1). Major changes were observed in many RBVs of patients who died from COVID-19. This indicated that self-care insufficiency developed during the disease process in patients who died, and it also suggested that important disorders occurred in the functions of many organs, such as the liver and kidney. In addition, we can say that an increase or decrease in one RBV, depending on the severity of the disease, has a significant effect on the metabolism of another RBV.
The HGB model, which was run with only procalcitonin and ferritin, correctly detected almost all of the COVID-19 patients, both living and deceased (precision > 0.98, recall > 0.98, F12 > 0.98). In addition, ferritin values between 376.2 μg/L and 396.0 μg/L (F12 = 0.91) and procalcitonin values between 0.2 μg/L and 5.2 μg/L (F12 = 0.95) were found to be fatal risk-levels for COVID-19.
In this study, we suggest that the HGB model, operated with the ferritin and procalcitonin features, can be used to obtain highly successful results in predicting COVID-19 mortality. It is also remarkable that procalcitonin and ferritin were the most important features in the determination of patient groups, both with the HGB model and with the two-threshold approach. Accordingly, we think that the critical levels of ferritin and procalcitonin we have determined should be taken into account to reduce the lethality of COVID-19. These biomarkers and their critical levels, together with clinical details, can also serve as a risk-stratification tool for resource allocation and aggressive therapeutics in over-crowded medical centers during the epidemic.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app122312180/s1, SARS-CoV-2-RBV3_Dataset.zip. Kindly cite our paper when you wish to use this dataset.

Author Contributions

Conceptualization, M.T.H.; methodology, M.T.H., A.V. and M.B.; software, M.T.H., A.V. and M.B.; validation, M.T.H.; formal analysis, M.T.H.; investigation, M.T.H., A.V. and M.B.; resources, M.T.H.; data curation, M.T.H.; writing—original draft preparation, M.T.H., A.V. and M.B.; writing—review and editing, M.T.H., A.V. and M.B.; visualization M.T.H., A.V. and M.B.; supervision, M.T.H.; project administration, M.T.H.; funding acquisition, A.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Russian Science Foundation (grant no. 22-11-00055, https://rscf.ru/en/project/22-11-00055/ (accessed on 22 June 2022)).

Institutional Review Board Statement

The dataset used in this study was collected in order to be used in various studies in the estimation of the diagnosis, prognosis and mortality of COVID-19. The necessary permissions for the collected dataset were given by the Ministry of Health of the Republic of Turkey and the Ethics Committee of Erzincan Binali Yıldırım University. This study was conducted in accordance with the 1989 Declaration of Helsinki. Erzincan Binali Yıldırım University Human Research Health and Sports Sciences Ethics Committee Decision Number: 2021/02-07.

Informed Consent Statement

In this study, a dataset including only routine blood-values, RT-PCR results (positive or negative) and treatment units of the patients was downloaded retrospectively, in digital form, from the information system of our hospital. New samples were not taken from the patients. The dataset contains no information that could identify individuals. It was stated that the routine blood-values would only be used in academic studies, and written consent was obtained from the institutions for this. Therefore, written informed consent was not obtained from each patient.

Data Availability Statement

The data used in this study can be shared with the parties, provided that the article is cited.

Acknowledgments

We thank Erzincan Mengücek Gazi Training and Research Hospital for its support in obtaining the material used in this study. Special thanks to the editors of the journal and to the anonymous reviewers, for their constructive criticism and improvement suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. The classification results of SARS-CoV-2-RBV3 dataset for the HGB model using a single-input feature. Classification metrics (Precision, Recall, F1, F12) separately for classes (survived COVID-19 and non-survived COVID-19).
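| No. | Feature | Precision (Surv.) | Precision (Non-Surv.) | Recall (Surv.) | Recall (Non-Surv.) | F1 (Surv.) | F1 (Non-Surv.) | F12 |
|---|---|---|---|---|---|---|---|---|
| 1 | ALT | 0.8558 | 0.3245 | 0.9292 | 0.1803 | 0.8909 | 0.2314 | 0.20615 |
| 2 | AST | 0.8904 | 0.3771 | 0.9369 | 0.2486 | 0.913 | 0.2981 | 0.27217 |
| 3 | Albumin | 0.6832 | 0.9341 | 0.9909 | 0.2216 | 0.8086 | 0.3581 | 0.28956 |
| 4 | ALP | 0.8794 | 0.623 | 0.9604 | 0.3331 | 0.918 | 0.4324 | 0.39694 |
| 5 | Amylase | 0.9112 | 0.9692 | 0.9968 | 0.5134 | 0.952 | 0.671 | 0.63879 |
| 6 | CK-MB | 0.8655 | 0.917 | 0.9909 | 0.3975 | 0.9239 | 0.554 | 0.51184 |
| 7 | D-Bil | 0.9602 | 0.5347 | 0.9554 | 0.5713 | 0.9578 | 0.5503 | 0.52708 |
| 8 | Glucose | 0.8583 | 0.535 | 0.9504 | 0.267 | 0.902 | 0.356 | 0.32111 |
| 9 | Creatinine | 0.8367 | 0.6227 | 0.9583 | 0.2699 | 0.8933 | 0.3763 | 0.33615 |
| 10 | CK | 0.8219 | 0.7678 | 0.9735 | 0.2964 | 0.8911 | 0.4267 | 0.38023 |
| 11 | LDH | 0.8498 | 0.8774 | 0.9867 | 0.3659 | 0.9126 | 0.5127 | 0.46789 |
| 12 | eGFR | 0.6849 | 0.9037 | 0.9868 | 0.2174 | 0.8082 | 0.3501 | 0.28295 |
| 13 | UA | 0.8824 | 0.7586 | 0.9743 | 0.3858 | 0.926 | 0.5106 | 0.47282 |
| 14 | BASO | 0.9953 | 0.0088 | 0.9124 | 0.0786 | 0.952 | 0.0157 | 0.01495 |
| 15 | EOS | 0.9818 | 0.013 | 0.9116 | 0.0533 | 0.9454 | 0.0208 | 0.01966 |
| 16 | HCT | 0.9057 | 0.0967 | 0.9123 | 0.0853 | 0.9088 | 0.0887 | 0.08061 |
| 17 | HGB | 0.9873 | 0.0264 | 0.9131 | 0.1552 | 0.9488 | 0.045 | 0.0427 |
| 18 | LYM | 0.8481 | 0.1971 | 0.9165 | 0.1071 | 0.8808 | 0.1382 | 0.12173 |
| 19 | MCH | 0.9543 | 0.1097 | 0.9175 | 0.1955 | 0.9355 | 0.1379 | 0.12901 |
| 20 | MCHC | 0.9797 | 0.0439 | 0.914 | 0.1375 | 0.9457 | 0.0663 | 0.0627 |
| 21 | MCV | 0.8079 | 0.2544 | 0.9182 | 0.115 | 0.8594 | 0.1581 | 0.13587 |
| 22 | MONO | 0.9061 | 0.1665 | 0.9185 | 0.1491 | 0.9122 | 0.1569 | 0.14312 |
| 23 | MPV | 0.9949 | 0.0043 | 0.912 | 0.1 | 0.9516 | 0.0083 | 0.0079 |
| 24 | NEU | 0.4962 | 0.6972 | 0.9442 | 0.1181 | 0.6503 | 0.2019 | 0.1313 |
| 25 | PLT | 0.8697 | 0.1671 | 0.9154 | 0.1103 | 0.8918 | 0.1319 | 0.11763 |
| 26 | RBC | 0.8837 | 0.1185 | 0.9122 | 0.0888 | 0.8976 | 0.1007 | 0.09039 |
| 27 | RDW | 0.9082 | 0.2454 | 0.9258 | 0.2067 | 0.9169 | 0.2241 | 0.20548 |
| 28 | WBC | 0.9154 | 0.1798 | 0.9206 | 0.1629 | 0.9178 | 0.1678 | 0.15401 |
| 29 | CRP | 0.7386 | 0.6888 | 0.961 | 0.2034 | 0.835 | 0.3136 | 0.26186 |
| 30 | D-dimer | 0.937 | 0.842 | 0.984 | 0.5677 | 0.9599 | 0.6769 | 0.64976 |
| 31 | Ferritin | 0.9852 | 0.9253 | 0.9928 | 0.8655 | 0.9889 | 0.8915 | 0.8816 |
| 32 | Fibrinogen | 0.9805 | 0.9562 | 0.9957 | 0.8274 | 0.9881 | 0.8863 | 0.87575 |
| 33 | INR | 0.9315 | 0.847 | 0.9844 | 0.548 | 0.9571 | 0.663 | 0.63456 |
| 34 | PT | 0.959 | 0.8253 | 0.9828 | 0.6588 | 0.9707 | 0.7312 | 0.70978 |
| 35 | Procalcitonin | 0.9992 | 0.9386 | 0.9941 | 0.991 | 0.9966 | 0.9635 | 0.96022 |
| 36 | ESR | 0.967 | 0.8684 | 0.9871 | 0.7229 | 0.9769 | 0.7868 | 0.76862 |
| 37 | Troponin | 0.967 | 0.2898 | 0.9339 | 0.4699 | 0.9501 | 0.3547 | 0.337 |
| 38 | aPTT | 0.8837 | 0.8862 | 0.9878 | 0.4266 | 0.9327 | 0.5747 | 0.53602 |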
Table A2. Results of classification according to the one-threshold approach of surviving and non-surviving COVID-19 patients, for the balanced dataset.
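| No. | Feature (Units) | Type | Vth | Ath | Precision (Surv.) | Precision (Non-Surv.) | Recall (Surv.) | Recall (Non-Surv.) | F1 (Surv.) | F1 (Non-Surv.) | F12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ALT (U/L) | 1 | 34.84 | 0.649 | 0.648 | 0.665 | 0.952 | 0.157 | 0.771 | 0.254 | 0.19583 |
| 2 | AST (U/L) | 2 | 33.472 | 0.806 | 0.839 | 0.476 | 0.942 | 0.226 | 0.887 | 0.306 | 0.27142 |
| 3 | Albumin (g/L) | 2 | 49.08 | 0.918 | 0.988 | 0.206 | 0.927 | 0.632 | 0.956 | 0.311 | 0.29732 |
| 4 | ALP (U/L) | 2 | 85.305 | 0.868 | 0.893 | 0.618 | 0.96 | 0.362 | 0.925 | 0.456 | 0.4218 |
| 5 | Amylase (U/L) | 2 | 76.79 | 0.936 | 0.966 | 0.627 | 0.963 | 0.646 | 0.965 | 0.636 | 0.61374 |
| 6 | CK-MB (U/L) | 2 | 18.86 | 0.92 | 0.935 | 0.764 | 0.976 | 0.538 | 0.955 | 0.631 | 0.6026 |
| 7 | D-Bil. (mg/dL) | 2 | 0.12985 | 0.842 | 0.854 | 0.725 | 0.969 | 0.328 | 0.908 | 0.452 | 0.41042 |
| 8 | Glucose (mg/dL) | 2 | 136.854 | 0.834 | 0.862 | 0.554 | 0.951 | 0.283 | 0.904 | 0.374 | 0.3381 |
| 9 | Creatinine (mg/dL) | 2 | 1.16656 | 0.877 | 0.918 | 0.464 | 0.946 | 0.358 | 0.932 | 0.404 | 0.37653 |
| 10 | CK (U/L) | 2 | 116.1 | 0.887 | 0.912 | 0.631 | 0.962 | 0.414 | 0.936 | 0.5 | 0.468 |
| 11 | LDH (U/L) | 2 | 253.26 | 0.874 | 0.875 | 0.867 | 0.985 | 0.406 | 0.927 | 0.553 | 0.51263 |
| 12 | eGFR | 1 | 82.57429 | 0.77 | 0.772 | 0.751 | 0.969 | 0.245 | 0.859 | 0.369 | 0.31697 |
| 13 | UA (mg/dL) | 2 | 39.01 | 0.818 | 0.824 | 0.755 | 0.972 | 0.298 | 0.892 | 0.427 | 0.38088 |
| 14 | BASO (10³/μL) | 2 | 0.01026 | 0.36 | 0.331 | 0.657 | 0.907 | 0.088 | 0.485 | 0.155 | 0.07517 |
| 15 | EOS (10³/μL) | 2 | 0.01323 | 0.368 | 0.344 | 0.614 | 0.9 | 0.084 | 0.498 | 0.148 | 0.0737 |
| 16 | HCT (%) | 1 | 44.0946 | 0.261 | 0.203 | 0.854 | 0.934 | 0.096 | 0.334 | 0.172 | 0.05745 |
| 17 | HGB (g/L) | 1 | 15.3972 | 0.229 | 0.162 | 0.906 | 0.946 | 0.096 | 0.277 | 0.174 | 0.0482 |
| 18 | LYM (10³/μL) | 1 | 1.72672 | 0.414 | 0.384 | 0.712 | 0.931 | 0.102 | 0.544 | 0.179 | 0.09738 |
| 19 | MCH (pg) | 2 | 29.6058 | 0.721 | 0.761 | 0.318 | 0.919 | 0.116 | 0.832 | 0.17 | 0.14144 |
| 20 | MCHC (g/dL) | 1 | 33.696 | 0.56 | 0.56 | 0.558 | 0.928 | 0.111 | 0.699 | 0.185 | 0.12931 |
| 21 | MCV (fL) | 2 | 83.7456 | 0.503 | 0.489 | 0.639 | 0.932 | 0.11 | 0.642 | 0.187 | 0.12005 |
| 22 | MONO (10³/μL) | 2 | 0.45078 | 0.422 | 0.392 | 0.73 | 0.936 | 0.106 | 0.553 | 0.185 | 0.10231 |
| 23 | MPV (fL) | 1 | 11.0988 | 0.265 | 0.214 | 0.785 | 0.91 | 0.09 | 0.346 | 0.161 | 0.05571 |
| 24 | NEU (10³/μL) | 2 | 4.379 | 0.571 | 0.56 | 0.691 | 0.948 | 0.134 | 0.704 | 0.224 | 0.1577 |
| 25 | PLT (10³/μL) | 1 | 245.85 | 0.451 | 0.423 | 0.73 | 0.941 | 0.111 | 0.584 | 0.193 | 0.11271 |
| 26 | RBC (10⁶/μL) | 1 | 5.06844 | 0.34 | 0.294 | 0.803 | 0.938 | 0.101 | 0.448 | 0.179 | 0.08019 |
| 27 | RDW (%) | 2 | 13.2096 | 0.598 | 0.585 | 0.73 | 0.956 | 0.148 | 0.726 | 0.246 | 0.1786 |
| 28 | WBC (10³/μL) | 2 | 6.2006 | 0.492 | 0.468 | 0.738 | 0.948 | 0.12 | 0.626 | 0.207 | 0.12958 |
| 29 | CRP (mg/L) | 2 | 19.488 | 0.72 | 0.719 | 0.738 | 0.965 | 0.205 | 0.824 | 0.321 | 0.2645 |
| 30 | D-dimer (μg/L) | 2 | 1009.998 | 0.92 | 0.922 | 0.906 | 0.99 | 0.533 | 0.955 | 0.671 | 0.6408 |
| 31 | Ferritin (μg/L) | 2 | 376.2 | 0.878 | 0.871 | 0.94 | 0.993 | 0.419 | 0.928 | 0.579 | 0.53731 |
| 32 | Fibrinogen (mg/dL) | 2 | 349.98608 | 0.834 | 0.82 | 0.979 | 0.997 | 0.349 | 0.9 | 0.515 | 0.4635 |
| 33 | INR | 2 | 1.15151 | 0.909 | 0.918 | 0.811 | 0.98 | 0.495 | 0.948 | 0.615 | 0.58302 |
| 34 | PT (Sec) | 2 | 13.50512 | 0.901 | 0.903 | 0.88 | 0.987 | 0.471 | 0.943 | 0.614 | 0.579 |
| 35 | PCT (ng/mL) | 2 | 0.2 | 0.882 | 0.878 | 0.923 | 0.991 | 0.427 | 0.931 | 0.583 | 0.54277 |
| 36 | ESR (mm/hr) | 2 | 36.125 | 0.883 | 0.88 | 0.918 | 0.991 | 0.43 | 0.932 | 0.585 | 0.54522 |
| 37 | Troponin (ng/L) | 2 | 13.2 | 0.906 | 0.968 | 0.279 | 0.932 | 0.461 | 0.949 | 0.348 | 0.33025 |
| 38 | aPTT (Sec) | 1 | 32.4594 | 0.875 | 0.87 | 0.931 | 0.992 | 0.413 | 0.927 | 0.573 | 0.53117 |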
Table A3. Results of classification according to the two-threshold approach of surviving and non-surviving COVID-19 patients, for the balanced dataset.
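| No. | Feature (Units) | Type | Vth_1 | Vth_2 | Ath | Precision (Surv.) | Precision (Non-Surv.) | Recall (Surv.) | Recall (Non-Surv.) | F1 (Surv.) | F1 (Non-Surv.) | F12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ALT (U/L) | 1 | 34.84 | 35.36 | 0.518 | 0.483 | 0.876 | 0.975 | 0.143 | 0.646 | 0.246 | 0.15892 |
| 2 | AST (U/L) | 1 | 32.949 | 33.472 | 0.536 | 0.492 | 0.979 | 0.996 | 0.16 | 0.659 | 0.275 | 0.18123 |
| 3 | Albumin (g/L) | 1 | 36.81 | 49.08 | 0.808 | 0.826 | 0.627 | 0.957 | 0.262 | 0.887 | 0.37 | 0.32819 |
| 4 | ALP (U/L) | 1 | 83.582 | 85.305 | 0.725 | 0.701 | 0.97 | 0.996 | 0.242 | 0.823 | 0.388 | 0.31932 |
| 5 | Amylase (U/L) | 1 | 72.92 | 76.79 | 0.922 | 0.917 | 0.974 | 0.997 | 0.535 | 0.955 | 0.691 | 0.6599 |
| 6 | CK-MB (U/L) | 1 | 18.4 | 18.86 | 0.832 | 0.816 | 0.996 | 0.999 | 0.347 | 0.898 | 0.515 | 0.46247 |
| 7 | D-Bil. (mg/dL) | 1 | 0.04995 | 0.12985 | 0.836 | 0.845 | 0.747 | 0.971 | 0.322 | 0.904 | 0.45 | 0.4068 |
| 8 | Glucose (mg/dL) | 1 | 135.631 | 136.854 | 0.557 | 0.514 | 0.991 | 0.998 | 0.167 | 0.679 | 0.286 | 0.19419 |
| 9 | Creatinine (mg/dL) | 1 | 0.96492 | 1.16656 | 0.617 | 0.595 | 0.845 | 0.975 | 0.171 | 0.739 | 0.284 | 0.20988 |
| 10 | CK (U/L) | 1 | 92.88 | 116.1 | 0.663 | 0.636 | 0.931 | 0.989 | 0.201 | 0.774 | 0.331 | 0.25619 |
| 11 | LDH (U/L) | 2 | 253.26 | 597.64 | 0.876 | 0.877 | 0.867 | 0.985 | 0.411 | 0.928 | 0.557 | 0.5169 |
| 12 | eGFR | 1 | 82.5742 | 146.2250 | 0.77 | 0.772 | 0.755 | 0.97 | 0.246 | 0.859 | 0.371 | 0.31869 |
| 13 | UA (mg/dL) | 1 | 0 | 39.01 | 0.818 | 0.824 | 0.755 | 0.972 | 0.298 | 0.892 | 0.427 | 0.38088 |
| 14 | BASO (10³/μL) | 1 | 0.00988 | 0.01026 | 0.322 | 0.277 | 0.777 | 0.926 | 0.096 | 0.426 | 0.17 | 0.07242 |
| 15 | EOS (10³/μL) | 2 | 0.01323 | 0.11907 | 0.574 | 0.596 | 0.352 | 0.903 | 0.079 | 0.718 | 0.129 | 0.09262 |
| 16 | HCT (%) | 2 | 30.1257 | 44.0946 | 0.293 | 0.246 | 0.768 | 0.915 | 0.091 | 0.388 | 0.163 | 0.06324 |
| 17 | HGB (g/L) | 2 | 9.5128 | 15.3972 | 0.252 | 0.193 | 0.85 | 0.929 | 0.094 | 0.319 | 0.169 | 0.05391 |
| 18 | LYM (10³/μL) | 2 | 0.59356 | 1.72672 | 0.481 | 0.466 | 0.635 | 0.928 | 0.105 | 0.62 | 0.18 | 0.1116 |
| 19 | MCH (pg) | 2 | 29.6058 | 35.6706 | 0.722 | 0.762 | 0.313 | 0.918 | 0.115 | 0.833 | 0.168 | 0.13994 |
| 20 | MCHC (g/dL) | 2 | 28.431 | 33.696 | 0.562 | 0.563 | 0.558 | 0.928 | 0.112 | 0.701 | 0.186 | 0.13039 |
| 21 | MCV (fL) | 2 | 83.7456 | 113.0624 | 0.503 | 0.489 | 0.639 | 0.932 | 0.11 | 0.642 | 0.188 | 0.1207 |
| 22 | MONO (10³/μL) | 2 | 0.45078 | 6.70023 | 0.423 | 0.393 | 0.73 | 0.936 | 0.106 | 0.554 | 0.185 | 0.10249 |
| 23 | MPV (fL) | 2 | 9.9018 | 11.0988 | 0.539 | 0.549 | 0.438 | 0.908 | 0.087 | 0.685 | 0.146 | 0.10001 |
| 24 | NEU (10³/μL) | 2 | 4.379 | 24.853 | 0.573 | 0.561 | 0.691 | 0.948 | 0.134 | 0.705 | 0.225 | 0.15862 |
| 25 | PLT (10³/μL) | 2 | 108.025 | 245.85 | 0.474 | 0.453 | 0.687 | 0.936 | 0.11 | 0.611 | 0.19 | 0.11609 |
| 26 | RBC (10⁶/μL) | 2 | 0.00722 | 5.06844 | 0.34 | 0.295 | 0.803 | 0.938 | 0.101 | 0.449 | 0.179 | 0.08037 |
| 27 | RDW (%) | 2 | 13.2096 | 21.1712 | 0.603 | 0.592 | 0.717 | 0.955 | 0.148 | 0.731 | 0.245 | 0.1791 |
| 28 | WBC (10³/μL) | 2 | 6.2006 | 44.054 | 0.493 | 0.469 | 0.738 | 0.948 | 0.12 | 0.627 | 0.207 | 0.12979 |
| 29 | CRP (mg/L) | 2 | 19.488 | 252.938 | 0.722 | 0.721 | 0.73 | 0.964 | 0.205 | 0.825 | 0.32 | 0.264 |
| 30 | D-dimer (μg/L) | 2 | 1009.99 | 10742.70 | 0.923 | 0.925 | 0.906 | 0.99 | 0.544 | 0.956 | 0.68 | 0.65008 |
| 31 | Ferritin (μg/L) | 2 | 376.2 | 396 | 0.984 | 0.989 | 0.931 | 0.993 | 0.897 | 0.991 | 0.914 | 0.90577 |
| 32 | Fibrinogen (mg/dL) | 2 | 349.986 | 379.054 | 0.927 | 0.923 | 0.97 | 0.997 | 0.553 | 0.958 | 0.704 | 0.67443 |
| 33 | INR | 2 | 1.15151 | 10.4753 | 0.91 | 0.92 | 0.811 | 0.98 | 0.5 | 0.949 | 0.619 | 0.58743 |
| 34 | PT (Sec) | 2 | 13.5051 | 110.0950 | 0.902 | 0.904 | 0.88 | 0.987 | 0.475 | 0.944 | 0.617 | 0.58245 |
| 35 | PCT (ng/mL) | 2 | 0.2 | 5.2 | 0.992 | 1 | 0.918 | 0.992 | 0.995 | 0.996 | 0.955 | 0.95118 |
| 36 | ESR (mm/hr) | 2 | 36.125 | 56.625 | 0.939 | 0.944 | 0.888 | 0.988 | 0.609 | 0.966 | 0.723 | 0.69842 |
| 37 | Troponin (ng/L) | 2 | 13.2 | 3269.2 | 0.906 | 0.969 | 0.275 | 0.931 | 0.464 | 0.95 | 0.345 | 0.32775 |
| 38 | aPTT (Sec) | 2 | 22.1582 | 32.4594 | 0.878 | 0.873 | 0.927 | 0.992 | 0.419 | 0.929 | 0.577 | 0.53603 |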

References

  1. Huyut, M.T.; Soygüder, S. The Multi-Relationship Structure between Some Symptoms and Features Seen during the New Coronavirus 19 Infection and the Levels of Anxiety and Depression Post-Covid. East J. Med. 2022, 27, 1–10.
  2. Huyut, M.T.; Kocaturk, İ. The Effect of Some Symptoms and Features During the Infection Period on the Level of Anxiety and Depression of Adults After Recovery From COVID-19. Curr. Psychiatry Res. Rev. 2022, 18, 151–163.
  3. Huyut, M.T. Automatic Detection of Severely and Mildly Infected COVID-19 Patients with Supervised Machine Learning Models. IRBM 2022, 1, 1–12.
  4. Huyut, M.T.; İlkbahar, F. The Effectiveness of Blood Routine Parameters and Some Biomarkers as a Potential Diagnostic Tool in the Diagnosis and Prognosis of Covid-19 Disease. Int. Immunopharmacol. 2021, 98, 107838.
  5. Feigin, E.; Levinson, T.; Wasserman, A.; Shenhar-Tsarfaty, S.; Berliner, S.; Ziv-Baran, T. Age-Dependent Biomarkers for Prediction of In-Hospital Mortality in COVID-19 Patients. J. Clin. Med. 2022, 11, 2682.
  6. Ciotti, M.; Angeletti, S.; Minieri, M.; Giovannetti, M.; Benvenuto, D.; Pascarella, S.; Sagnelli, C.; Bianchi, M.; Bernardini, S.; Ciccozzi, M. COVID-19 Outbreak: An Overview. Chemotherapy 2020, 64, 215–223.
  7. Richardson, S.; Hirsch, J.S.; Narasimhan, M.; Crawford, J.M.; McGinn, T.; Davidson, K.W.; Barnaby, D.P.; Becker, L.B.; Chelico, J.D.; Cohen, S.L.; et al. Presenting Characteristics, Comorbidities, and Outcomes among 5700 Patients Hospitalized with COVID-19 in the New York City Area. JAMA J. Am. Med. Assoc. 2020, 323, 2052–2059.
  8. Asch, D.A.; Sheils, N.E.; Islam, M.N.; Chen, Y.; Werner, R.M.; Buresh, J.; Doshi, J.A. Variation in US Hospital Mortality Rates for Patients Admitted with COVID-19 during the First 6 Months of the Pandemic. JAMA Intern. Med. 2021, 181, 471–478.
  9. Strålin, K.; Wahlström, E.; Walther, S.; Bennet-Bark, A.M.; Heurgren, M.; Lindén, T.; Holm, J.; Hanberger, H. Mortality Trends among Hospitalised COVID-19 Patients in Sweden: A Nationwide Observational Cohort Study. Lancet Reg. Health Eur. 2021, 4, 100054.
  10. Strålin, K.; Wahlström, E.; Walther, S.; Bennet-Bark, A.M.; Heurgren, M.; Lindén, T.; Holm, J.; Hanberger, H. Mortality in Hospitalized COVID-19 Patients Was Associated with the COVID-19 Admission Rate during the First Year of the Pandemic in Sweden. Infect. Dis. 2022, 54, 145–151.
  11. Zheng, Y.; Zhang, Y.; Chi, H.; Chen, S.; Peng, M.; Luo, L.; Chen, L.; Li, J.; Shen, B.; Wang, D. The Hemocyte Counts as a Potential Biomarker for Predicting Disease Progression in COVID-19: A Retrospective Study. Clin. Chem. Lab. Med. 2020, 58, 1106–1115.
  12. Huyut, M.T.; Huyut, Z.; İlkbahar, F.; Mertoğlu, C. What Is the Impact and Efficacy of Routine Immunological, Biochemical and Hematological Biomarkers as Predictors of COVID-19 Mortality? Int. Immunopharmacol. 2022, 105, 108542.
  13. Huyut, M.; Üstündaǧ, H. Prediction of Diagnosis and Prognosis of COVID-19 Disease by Blood Gas Parameters Using Decision Trees Machine Learning Model: A Retrospective Observational Study. Med. Gas Res. 2022, 12, 60–66.
  14. Chen, N.; Zhou, M.; Dong, X.; Qu, J.; Gong, F.; Han, Y.; Qiu, Y.; Wang, J.; Liu, Y.; Wei, Y.; et al. Epidemiological and Clinical Characteristics of 99 Cases of 2019 Novel Coronavirus Pneumonia in Wuhan, China: A Descriptive Study. Lancet 2020, 395, 507–513.
  15. Huyut, M.T.; Huyut, Z. Forecasting of Oxidant/Antioxidant Levels of COVID-19 Patients by Using Expert Models with Biomarkers Used in the Diagnosis/Prognosis of COVID-19. Int. Immunopharmacol. 2021, 100, 108127.
  16. Tural Onur, S.; Altın, S.; Sokucu, S.N.; Fikri, B.İ.; Barça, T.; Bolat, E.; Toptaş, M. Could Ferritin Level Be an Indicator of COVID-19 Disease Mortality? J. Med. Virol. 2021, 93, 1672–1677.
  17. Gómez-Pastora, J.; Weigand, M.; Kim, J.; Wu, X.; Strayer, J.; Palmer, A.F.; Zborowski, M.; Yazer, M.; Chalmers, J.J. Hyperferritinemia in Critically Ill COVID-19 Patients–Is Ferritin the Product of Inflammation or a Pathogenic Mediator? Clin. Chim. Acta 2020, 509, 249–251.
  18. Perricone, C.; Bartoloni, E.; Bursi, R.; Cafaro, G.; Guidelli, G.M.; Shoenfeld, Y.; Gerli, R. COVID-19 as Part of the Hyperferritinemic Syndromes: The Role of Iron Depletion Therapy. Immunol. Res. 2020, 68, 213–224.
  19. Luo, X.; Zhou, W.; Yan, X.; Guo, T.; Wang, B.; Xia, H.; Ye, L.; Xiong, J.; Jiang, Z.; Liu, Y.; et al. Prognostic Value of C-Reactive Protein in Patients with Coronavirus 2019. Clin. Infect. Dis. 2020, 71, 2174–2179.
  20. Cecconi, M.; Piovani, D.; Brunetta, E.; Aghemo, A.; Greco, M.; Ciccarelli, M.; Angelini, C.; Voza, A.; Omodei, P.; Vespa, E.; et al. Early Predictors of Clinical Deterioration in a Cohort of 239 Patients Hospitalized for Covid-19 Infection in Lombardy, Italy. J. Clin. Med. 2020, 9, 1548.
  21. Mertoglu, C.; Huyut, M.T.; Arslan, Y.; Ceylan, Y.; Coban, T.A. How Do Routine Laboratory Tests Change in Coronavirus Disease 2019? Scand. J. Clin. Lab. Investig. 2021, 81, 24–33.
  22. Huyut, M.T.; Velichko, A. Diagnosis and Prognosis of COVID-19 Disease Using Routine Blood Values and LogNNet Neural Network. Sensors 2022, 22, 4820.
  23. Mertoglu, C.; Huyut, M.T.; Olmez, H.; Tosun, M.; Kantarci, M.; Coban, T. COVID-19 Is More Dangerous for Older People and Its Severity Is Increasing: A Case-Control Study. Med. Gas Res. 2022, 12, 51–54.
  24. Zhang, J.J.; Cao, Y.Y.; Tan, G.; Dong, X.; Wang, B.C.; Lin, J.; Yan, Y.Q.; Liu, G.H.; Akdis, M.; Akdis, C.A.; et al. Clinical, Radiological, and Laboratory Characteristics and Risk Factors for Severity and Mortality of 289 Hospitalized COVID-19 Patients. Allergy Eur. J. Allergy Clin. Immunol. 2021, 76, 533–550.
  25. Ponti, G.; Maccaferri, M.; Ruini, C.; Tomasi, A.; Ozben, T. Biomarkers Associated with COVID-19 Disease Progression. Crit. Rev. Clin. Lab. Sci. 2020, 57, 389–399.
  26. Feld, J.; Tremblay, D.; Thibaud, S.; Kessler, A.; Naymagon, L. Ferritin Levels in Patients with COVID-19: A Poor Predictor of Mortality and Hemophagocytic Lymphohistiocytosis. Int. J. Lab. Hematol. 2020, 42, 773–779.
  27. Hou, H.; Zhang, B.; Huang, H.; Luo, Y.; Wu, S.; Tang, G.; Liu, W.; Mao, L.; Mao, L.; Wang, F.; et al. Using IL-2R/Lymphocytes for Predicting the Clinical Progression of Patients with COVID-19. Clin. Exp. Immunol. 2020, 201, 76–84.
  28. Kaushal, K.; Kaur, H.; Sarma, P.; Bhattacharyya, A.; Sharma, D.J.; Prajapat, M.; Pathak, M.; Kothari, A.; Kumar, S.; Rana, S.; et al. Serum Ferritin as a Predictive Biomarker in COVID-19. A Systematic Review, Meta-Analysis and Meta-Regression Analysis. J. Crit. Care 2022, 67, 172–181.
  29. Cheng, L.; Li, H.; Li, L.; Liu, C.; Yan, S.; Chen, H.; Li, Y. Ferritin in the Coronavirus Disease 2019 (COVID-19): A Systematic Review and Meta-Analysis. J. Clin. Lab. Anal. 2020, 34, 1–18.
  30. Kukar, M.; Gunčar, G.; Vovko, T.; Podnar, S.; Černelč, P.; Brvar, M.; Zalaznik, M.; Notar, M.; Moškon, S.; Notar, M. COVID-19 Diagnosis by Routine Blood Tests Using Machine Learning. Sci. Rep. 2021, 11, 10738.
  31. Podnar, S.; Kukar, M.; Gunčar, G.; Notar, M.; Gošnjak, N.; Notar, M. Diagnosing Brain Tumours by Routine Blood Tests Using Machine Learning. Sci. Rep. 2019, 9, 1–7.
  32. Velichko, A.; Huyut, M.T.; Belyaev, M.; Izotov, Y.; Korzun, D. Machine Learning Sensors for Diagnosis of COVID-19 Disease Using Routine Blood Values for Internet of Things Application. Sensors 2022, 22, 7886.
  33. Booth, A.L.; Abels, E.; McCaffrey, P. Development of a Prognostic Model for Mortality in COVID-19 Infection Using Machine Learning. Mod. Pathol. 2021, 34, 522–531.
  34. Luo, Y.; Szolovits, P.; Dighe, A.S.; Baron, J.M. Using Machine Learning to Predict Laboratory Test Results. Am. J. Clin. Pathol. 2016, 145, 778–788.
  35. Doğanay, F.; Elkonca, F.; Seyhan, A.U.; Yılmaz, E.; Batırel, A.; Ak, R. Shock Index as a Predictor of Mortality among the Covid-19 Patients. Am. J. Emerg. Med. 2021, 40, 106–109.
  36. Zhang, S.; Huang, S.; Liu, J.; Dong, X.; Meng, M.; Chen, L.; Wen, Z.; Zhang, L.; Chen, Y.; Du, H.; et al. Identification and Validation of Prognostic Factors in Patients with COVID-19: A Retrospective Study Based on Artificial Intelligence Algorithms. J. Intensive Med. 2021, 1, 103–109.
  37. Formica, V.; Minieri, M.; Bernardini, S.; Ciotti, M.; D’Agostini, C.; Roselli, M.; Andreoni, M.; Morelli, C.; Parisi, G.; Federici, M.; et al. Complete Blood Count Might Help to Identify Subjects with High Probability of Testing Positive to SARS-CoV-2. Clin. Med. J. R. Coll. Physicians Lond. 2020, 20, 114–119.
  38. Banerjee, A.; Ray, S.; Vorselaars, B.; Kitson, J.; Mamalakis, M.; Weeks, S.; Baker, M.; Mackenzie, L.S. Use of Machine Learning and Artificial Intelligence to Predict SARS-CoV-2 Infection from Full Blood Counts in a Population. Int. Immunopharmacol. 2020, 86, 6705.
  39. Avila, E.; Kahmann, A.; Alho, C.; Dorn, M. Hemogram Data as a Tool for Decision-Making in COVID-19 Management: Applications to Resource Scarcity Scenarios. PeerJ 2020, 2020, 9482.
  40. Joshi, R.P.; Pejaver, V.; Hammarlund, N.E.; Sung, H.; Lee, S.K.; Furmanchuk, A.; Lee, H.-Y.; Scott, G.; Gombar, S.; Shah, N.; et al. A Predictive Tool for Identification of SARS-CoV-2 PCR-Negative Emergency Department Patients Using Routine Test Results. J. Clin. Virol. 2020, 129, 104502.
  41. Zhu, J.S.; Ge, P.; Jiang, C.; Zhang, Y.; Li, X.; Zhao, Z.; Zhang, L.; Duong, T.Q. Deep-learning Artificial Intelligence Analysis of Clinical Variables Predicts Mortality in COVID-19 Patients. J. Am. Coll. Emerg. Physicians Open 2020, 1, 1364–1373.
  42. Soltan, A.A.; Kouchaki, S.; Zhu, T.; Kiyasseh, D.; Taylor, T.; Hussain, Z.B.; Peto, T.; Brent, A.J.; Eyre, D.W.; Clifton, D. Artificial Intelligence Driven Assessment of Routinely Collected Healthcare Data Is an Effective Screening Test for COVID-19 in Patients Presenting to Hospital. medRxiv 2020, medRxiv:20148361.
  43. Soares, F. A Novel Specific Artificial Intelligence-Based Method to Identify COVID-19 Cases Using Simple Blood Exams. medRxiv 2020, medRxiv:20061036.
  44. Shapiro, S.S.; Wilk, M.B. An Analysis of Variance Test for Normality (Complete Samples). Biometrika 1965, 52, 591.
  45. Mann, H.B.; Whitney, D.R. On a Test of Whether One of Two Random Variables Is Stochastically Larger than the Other. Ann. Math. Stat. 1947, 18, 50–60.
  46. Lippi, G.; Plebani, M.; Henry, B.M. Thrombocytopenia Is Associated with Severe Coronavirus Disease 2019 (COVID-19) Infections: A Meta-Analysis. Clin. Chim. Acta 2020, 506, 145–148.
  47. Vafadar Moradi, E.; Teimouri, A.; Rezaee, R.; Morovatdar, N.; Foroughian, M.; Layegh, P.; Rezvani Kakhki, B.; Ahmadi Koupaei, S.R.; Ghorani, V. Increased Age, Neutrophil-to-Lymphocyte Ratio (NLR) and White Blood Cells Count Are Associated with Higher COVID-19 Mortality. Am. J. Emerg. Med. 2021, 40, 11–14. [Google Scholar] [CrossRef]
  48. Guzik, T.J.; Mohiddin, S.A.; Dimarco, A.; Patel, V.; Savvatis, K.; Marelli-Berg, F.M.; Madhur, M.S.; Tomaszewski, M.; Maffia, P.; D’Acquisto, F.; et al. COVID-19 and the Cardiovascular System: Implications for Risk Assessment, Diagnosis, and Treatment Options. Cardiovasc. Res. 2020, 116, 1666–1687.
  49. Song, H.; Kim, H.J.; Park, K.N.; Kim, S.H.; Oh, S.H.; Youn, C.S. Neutrophil to Lymphocyte Ratio Is Associated with In-Hospital Mortality in Older Adults Admitted to the Emergency Department. Am. J. Emerg. Med. 2021, 40, 133–137.
  50. Mo, P.; Xing, Y.; Xiao, Y.; Deng, L.; Zhao, Q.; Wang, H.; Xiong, Y.; Cheng, Z.; Gao, S.; Liang, K.; et al. Clinical Characteristics of Refractory Coronavirus Disease 2019 in Wuhan, China. Clin. Infect. Dis. 2021, 73, E4208–E4213.
  51. Tan, L.; Wang, Q.; Zhang, D.; Ding, J.; Huang, Q.; Tang, Y.Q.; Wang, Q.; Miao, H. Lymphopenia Predicts Disease Severity of COVID-19: A Descriptive and Predictive Study. Signal Transduct. Target. Ther. 2020, 5, 16–18.
  52. Wu, C.; Chen, X.; Cai, Y.; Xia, J.; Zhou, X.; Xu, S.; Huang, H.; Zhang, L.; Zhou, X.; Du, C.; et al. Risk Factors Associated with Acute Respiratory Distress Syndrome and Death in Patients with Coronavirus Disease 2019 Pneumonia in Wuhan, China. JAMA Intern. Med. 2020, 180, 934–943.
  53. Zhu, N.; Zhang, D.; Wang, W.; Li, X.; Yang, B.; Song, J.; Zhao, X.; Huang, B.; Shi, W.; Lu, R.; et al. A Novel Coronavirus from Patients with Pneumonia in China, 2019. N. Engl. J. Med. 2020, 382, 727–733.
  54. Pribadi, R.R.; Simadibrata, M. Increased Serum Amylase and/or Lipase in Coronavirus Disease 2019 (COVID-19) Patients: Is It Really Pancreatic Injury? JGH Open 2021, 5, 190–192.
  55. Ahmed, S.; Jafri, L.; Hoodbhoy, Z.; Siddiqui, I. Prognostic Value of Serum Procalcitonin in Covid-19 Patients: A Systematic Review. Indian J. Crit. Care Med. 2021, 25, 77–84.
  56. Li, X.; Wang, L.; Yan, S.; Yang, F.; Xiang, L.; Zhu, J.; Shen, B.; Gong, Z. Clinical Characteristics of 25 Death Cases with COVID-19: A Retrospective Review of Medical Records in a Single Medical Center, Wuhan, China. Int. J. Infect. Dis. 2020, 94, 128–132.
  57. Ke, C.; Wang, Y.; Zeng, X.; Yang, C.; Hu, Z. 2019 Novel Coronavirus Disease (COVID-19) in Hemodialysis Patients: A Report of Two Cases. Clin. Biochem. 2020, 81, 9–12.
  58. Lippi, G.; Plebani, M. Procalcitonin in Patients with Severe Coronavirus Disease 2019 (COVID-19): A Meta-Analysis. Clin. Chim. Acta 2020, 505, 190–191.
  59. Lin, L.; Lu, L.; Cao, W.; Li, T. Hypothesis for Potential Pathogenesis of SARS-CoV-2 Infection–a Review of Immune Changes in Patients with Viral Pneumonia. Emerg. Microbes Infect. 2020, 9, 727–732.
  60. De Sousa Lima, M.E.; Barros, L.C.M.; Aragão, G.F. Could Autism Spectrum Disorders Be a Risk Factor for COVID-19? Med. Hypotheses 2020, 144, 109899.
  61. Juneja, D.; Savio, R.D.; Srinivasan, S.; Pandit, R.A.; Ramasubban, S.; Reddy, P.K.; Singh, M.; Gopal, P.B.N.; Chaudhry, D.; Govil, D.; et al. Basic Critical Care for Management of COVID-19 Patients: Position Paper of Indian Society of Critical Care Medicine, Part-I. Indian J. Crit. Care Med. 2020, 24, S244–S253.
  62. Henry, B.M.; De Oliveira, M.H.S.; Benoit, S.; Plebani, M.; Lippi, G. Hematologic, Biochemical and Immune Biomarker Abnormalities Associated with Severe Illness and Mortality in Coronavirus Disease 2019 (COVID-19): A Meta-Analysis. Clin. Chem. Lab. Med. 2020, 58, 1021–1028.
  63. Torti, F.M.; Torti, S.V. Regulation of Ferritin Genes and Protein. Blood 2002, 99, 3505–3516.
  64. Rosário, C.; Zandman-Goddard, G.; Meyron-Holtz, E.G.; D’Cruz, D.P.; Shoenfeld, Y. The Hyperferritinemic Syndrome: Macrophage Activation Syndrome, Still’s Disease, Septic Shock and Catastrophic Antiphospholipid Syndrome. BMC Med. 2013, 11, 185.
  65. Kernan, K.F.; Carcillo, J.A. Hyperferritinemia and Inflammation. Int. Immunol. 2017, 29, 401–409.
  66. Winata, S.; Kurniawan, A. Coagulopathy in COVID-19: A Systematic Review. Medicinus 2021, 8, 72.
Figure 1. Pearson, Spearman and Kendall correlations of the SARS-CoV-2-RBV3 dataset for COVID-19 mortality-feature pairs.
Figure 2. Spearman correlation analysis results for (a) the entire database, (b) survived COVID-19 class, and (c) non-survived COVID-19 class from the SARS-CoV-2-RBV3 dataset.
Figure 3. Performance of ML models in classifying surviving and non-surviving COVID-19 patients, using the 34 features.
Figure 4. F1 metrics for survived-COVID-19 class, calculated for original and SMOTE-balanced datasets.
Figure 5. F1 metric for non-survived-COVID-19 class, calculated for original and SMOTE-balanced datasets.
Figure 6. F1² metric of the HGB model according to each feature for the detection of surviving and non-surviving COVID-19 patients.
Figure 7. The F1² metric for the classification of surviving and non-surviving COVID-19 patients, according to a single feature for the one-threshold approach, with dependency-type visualization (Type 1, Type 2).
Figure 8. The F1² metric for the classification of surviving and non-surviving COVID-19 patients, according to a single feature for the two-threshold approach, with dependency-type visualization (Type 1, Type 2).
Figure 9. Histogram distributions and F1² results of the (a) procalcitonin, (b) ferritin and (c) fibrinogen features, according to the single-cut-off value approach in estimating COVID-19 mortality. Vth (blue line) is the threshold for detecting COVID-19 mortality.
Figure 10. Histogram distributions and F1² results of the amylase feature, according to the two-threshold value approach in estimating COVID-19 mortality. Vth_1 (pink line) and Vth_2 (blue line) are the thresholds for detecting COVID-19 mortality.
Figure 11. F1² metric of SARS-CoV-2-RBV3 dataset for different models.
Figure 12. Feature pairs with the highest F1² value found with the HGB classifier for detection of surviving and non-surviving COVID-19 patients.
Figure 13. (a) Distribution of the procalcitonin feature in the original data of patients who survived and those who died from COVID-19, and the two-threshold value for this feature in classification. (b) The 1D masking technique for classifying patient-groups in the HGB model operated with the procalcitonin feature.
Figure 14. Distributions of non-surviving and surviving COVID-19 patients over the original data on D-dimer-ferritin (a) and CK-MCH (c) feature pairs. The 2D-masking technique for patient-group classification of the HGB model operated with D-dimer-ferritin (b) and CK-MCH (d) feature pairs.
Table 1. Feature-numbering for SARS-CoV-2-RBV3 dataset.
No. | Feature | No. | Feature | No. | Feature | No. | Feature
1 | ALT | 11 | LDH | 21 | MCV | 31 | Ferritin
2 | AST | 12 | eGFR | 22 | MONO | 32 | Fibrinogen
3 | Albumin | 13 | UA | 23 | MPV | 33 | INR
4 | ALP | 14 | BASO | 24 | NEU | 34 | PT
5 | Amylase | 15 | EOS | 25 | PLT | 35 | PCT
6 | CK-MB | 16 | HCT | 26 | RBC | 36 | ESR
7 | D-Bil | 17 | HGB | 27 | RDW | 37 | Troponin
8 | Glucose | 18 | LYM | 28 | WBC | 38 | aPTT
9 | Creatinine | 19 | MCH | 29 | CRP | |
10 | CK | 20 | MCHC | 30 | D-dimer | |
ALT: alanine aminotransferase; AST: aspartate aminotransferase; ALP: alkaline phosphatase; CK-MB: creatine kinase myocardial band; D-Bil: direct bilirubin; CK: creatine kinase; LDH: lactate dehydrogenase; eGFR: estimated glomerular filtration rate; UA: uric acid; BASO: basophil count; EOS: eosinophil count; HCT: hematocrit; HGB: hemoglobin; LYM: lymphocyte count; MCH: mean corpuscular hemoglobin; MCHC: mean corpuscular hemoglobin concentration; MCV: mean corpuscular volume; MONO: monocyte count; MPV: mean platelet volume; NEU: neutrophil count; PLT: platelet count; RBC: red blood cells; RDW: red cell distribution width; WBC: white blood cell count; CRP: C-reactive protein; INR: international normalized ratio; PT: prothrombin time; PCT: procalcitonin; ESR: erythrocyte sedimentation rate; aPTT: activated partial thromboplastin time.
Table 2. Changes in the correlation of feature-pairs of the non-survived COVID-19-class compared with the survived COVID-19-class.
Feature No. | Feature No. | Spearman, Survived COVID-19 | Spearman, Non-Survived COVID-19 | Change in the Correlation of Features in the Presence of Non-Survived COVID-19 | Feature | Feature
8 | 3 | −0.31194 | 0.01274 | Down | Glucose | Albumin
13 | 3 | −0.3753 | −0.10287 | Down | UA | Albumin
36 | 29 | 0.42911 | 0.16647 | Down | ESR | CRP
10 | 5 | 0.05417 | 0.30605 | Up | CK | Amylase
12 | 9 | −0.53482 | −0.77476 | Up | eGFR | Creatinine
5 | 3 | −0.03084 | 0.26142 | Up | Amylase | Albumin
12 | 3 | 0.3479 | 0.11987 | Down | eGFR | Albumin
30 | 29 | 0.3365 | 0.1146 | Down | D-dimer | CRP
6 | 5 | 0.05413 | 0.26867 | Up | CK-MB | Amylase
32 | 30 | 0.21431 | −1.48572 × 10⁻⁴ | Down | Fibrinogen | D-dimer
10 | 6 | 0.18243 | 0.39221 | Up | CK | CK-MB
36 | 30 | 0.25085 | −0.04489 | Down | ESR | D-dimer
24 | 5 | 0.00641 | −0.21234 | Up | NEU | Amylase
14 | 6 | −0.00966 | −0.20355 | Up | BASO | CK-MB
4 | 3 | −0.03222 | 0.22455 | Up | ALT | Albumin
26 | 18 | 0.30758 | 0.1153 | Down | RBC | LYM
9 | 1 | 0.24486 | 0.05265 | Down | Creatinine | ALT
16 | 15 | 0.20104 | 0.01024 | Down | HCT | EOS
9 | 2 | 0.30293 | 0.1127 | Down | Creatinine | AST
17 | 14 | 0.24856 | 0.05841 | Down | HGB | BASO
25 | 15 | 0.0849 | 0.27473 | Up | PLT | EOS
25 | 20 | −0.08083 | −0.27021 | Up | PLT | MCHC
7 | 3 | −0.0119 | 0.19865 | Up | D-Bil | Albumin
20 | 15 | −0.04499 | −0.23071 | Up | MCHC | EOS
31 | 29 | 0.3931 | 0.2085 | Down | Ferritin | CRP
28 | 6 | 0.0179 | −0.20122 | Up | WBC | CK-MB
27 | 21 | −0.30218 | −0.11908 | Down | RDW | MCV
7 | 6 | 0.0635 | 0.24642 | Up | D-Bil | CK-MB
36 | 32 | 0.27287 | −0.09309 | Down | ESR | Fibrinogen
20 | 14 | 0.02204 | −0.1999 | Up | MCHC | BASO
13 | 8 | 0.42328 | 0.24648 | Down | UA | Glucose
15 | 4 | 0.01565 | 0.19167 | Up | EOS | ALT
31 | 30 | 0.22095 | −0.04543 | Down | Ferritin | D-dimer
6 | 1 | 0.06557 | 0.23993 | Up | CK-MB | ALT
32 | 29 | 0.31919 | 0.14655 | Down | Fibrinogen | CRP
23 | 2 | 0.00857 | 0.18043 | Up | MPV | AST
3 | 2 | −0.20943 | −0.03802 | Down | Albumin | AST
35 | 3 | −0.01609 | −0.18485 | Up | PCT | Albumin
30 | 9 | −0.00743 | 0.17441 | Up | D-dimer | Creatinine
23 | 13 | 0.00586 | 0.1727 | Up | MPV | UA
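The comparison in Table 2 can be illustrated with a small, dependency-free sketch of a rank (Spearman) correlation and of the Up/Down labelling. The labelling rule used here (a pair is "Up" when the absolute value of the coefficient is larger in the non-survived class) is an inference from the tabulated rows, not a rule stated in the text, and the function names are hypothetical.

```python
# Minimal sketch: Spearman correlation and an assumed Up/Down labelling rule.
# The rule |rho_non_survived| > |rho_survived| -> "Up" is inferred from Table 2.

def average_ranks(xs):
    """Return 1-based ranks of xs, giving tied values their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def change_direction(rho_survived, rho_non_survived):
    """Assumed labelling: compare the magnitudes of the two coefficients."""
    return "Up" if abs(rho_non_survived) > abs(rho_survived) else "Down"

# Two rows taken from Table 2 (eGFR-Creatinine and Glucose-Albumin).
print(change_direction(-0.53482, -0.77476))
print(change_direction(-0.31194, 0.01274))
```

Under this reading, a negative correlation that becomes more strongly negative counts as "Up", which matches the eGFR-Creatinine row.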
Table 3. Descriptive statistics of RBV values of surviving and non-surviving COVID-19 groups.
Parameters (Units) | Surviving: Median | Surviving: Percentile 25 | Surviving: Percentile 75 | Non-Surviving: Median | Non-Surviving: Percentile 25 | Non-Surviving: Percentile 75 | p
ALT (U/L) | 35.31 | 24.00 | 35.31 | 23.00 | 15.00 | 35.20 | <0.001
AST (U/L) | 33.24 | 25.00 | 33.24 | 32.00 | 22.00 | 47.23 | 0.033
Albumin (g/L) | 38.59 | 38.59 | 38.59 | 38.29 | 33.00 | 43.54 | 0.539
ALP (U/L) | 84.10 | 84.10 | 84.10 | 103.23 | 72.00 | 103.23 | <0.001
Amylase (U/L) | 73.70 | 73.70 | 73.70 | 101.00 | 58.00 | 107.62 | <0.001
CK-MB (U/L) | 18.79 | 18.79 | 18.79 | 32.75 | 19.40 | 32.75 | <0.001
D-Bil. (mg/dL) | 0.13 | 0.13 | 0.13 | 0.25 | 0.12 | 0.27 | <0.001
Glucose (mg/dL) | 136.03 | 108.00 | 136.03 | 145.00 | 113.00 | 188.00 | <0.001
Creatinine (mg/dL) | 1.14 | 0.90 | 1.14 | 1.11 | 0.86 | 1.64 | <0.001
CK (U/L) | 104.26 | 83.00 | 104.26 | 220.00 | 79.00 | 350.53 | <0.001
LDH (U/L) | 252.94 | 252.94 | 252.94 | 309.76 | 309.76 | 309.76 | <0.001
eGFR | 82.74 | 82.74 | 85.10 | 62.16 | 44.47 | 82.50 | <0.001
UA (mg/dL) | 38.80 | 32.00 | 38.80 | 56.74 | 39.13 | 75.95 | <0.001
BASO (10³/μL) | 0.02 | 0.01 | 0.04 | 0.021 | 0.014 | 0.044 | 0.869
EOS (10³/μL) | 0.04 | 0.01 | 0.12 | 0.03 | 0.00 | 0.12 | 0.232
HCT (%) | 39.55 | 36.00 | 43.20 | 38.80 | 34.90 | 42.30 | 0.041
HGB (g/L) | 13.30 | 12.00 | 14.65 | 13.10 | 11.50 | 14.50 | 0.016
LYM (10³/μL) | 1.46 | 0.99 | 2.03 | 1.32 | 0.85 | 1.88 | 0.015
MCH (pg) | 28.60 | 27.30 | 29.60 | 28.80 | 27.20 | 30.10 | 0.041
MCHC (g/dL) | 33.80 | 32.90 | 34.70 | 33.50 | 32.40 | 34.60 | 0.004
MCV (fL) | 83.90 | 80.80 | 87.00 | 85.20 | 81.80 | 88.90 | <0.001
MONO (10³/μL) | 0.51 | 0.38 | 0.67 | 0.56 | 0.44 | 0.72 | <0.001
MPV (fL) | 10.30 | 9.70 | 10.90 | 10.30 | 9.60 | 11.00 | 0.604
NEU (10³/μL) | 4.05 | 2.85 | 5.85 | 5.25 | 3.98 | 7.65 | <0.001
PLT (10³/μL) | 229.00 | 184.00 | 287.00 | 200.00 | 166.00 | 250.00 | <0.001
RBC (10⁶/μL) | 4.74 | 4.36 | 5.14 | 4.64 | 4.16 | 4.98 | 0.001
RDW (%) | 13.10 | 12.50 | 13.90 | 14.00 | 13.20 | 15.40 | <0.001
WBC (10³/μL) | 6.50 | 5.00 | 8.30 | 7.80 | 6.20 | 10.10 | <0.001
CRP (mg/L) | 6.76 | 3.02 | 23.50 | 72.00 | 17.10 | 72.00 | <0.001
D-dimer (μg/L) | 441.00 | 441.00 | 441.00 | 1277.00 | 1277.00 | 1277.00 | <0.001
Ferritin (μg/L) | 125.95 | 90.90 | 175.80 | 395.00 | 395.00 | 395.00 | <0.001
Fibrinogen (mg/dL) | 321.10 | 321.10 | 321.10 | 350.00 | 350.00 | 350.00 | <0.001
INR | 1.10 | 1.10 | 1.10 | 1.20 | 1.20 | 1.20 | <0.001
PT (Sec) | 13.10 | 13.10 | 13.10 | 14.20 | 14.20 | 14.20 | <0.001
PCT (ng/mL) | 0.12 | 0.12 | 0.12 | 2.75 | 2.53 | 2.75 | <0.001
ESR (mm/hr) | 17.00 | 17.00 | 17.00 | 49.00 | 49.00 | 49.00 | <0.001
Troponin (ng/L) | 16.12 | 10.00 | 19.00 | 53.27 | 15.00 | 75.00 | <0.001
aPTT (Sec) | 32.75 | 32.75 | 32.75 | 32.00 | 32.00 | 32.00 | <0.001
p < 0.05 was considered significant.
Table 4. Classification performance of ML models run with 34 features to detect patient groups.
No | ML Models | F1²
1 | Histogram-based Gradient Boosting (HGB) | 1.0000
2 | Adaboost (AB) | 0.9952
3 | Extra Trees (ET) | 0.9952
4 | K-nearest neighbors (KNN) | 0.9929
5 | Random Forest (RF) | 0.9928
6 | Support Vector Machine with Linear Kernel (SVM-LK) | 0.9904
7 | Multinomial Naive Bayes (MNB) | 0.9881
8 | Gaussian Naive Bayes (GNB) | 0.9646
9 | Stochastic Gradient Descent (SGD) | 0.9642
10 | Decision Tree (DT) | 0.9642
11 | Bernoulli Naive Bayes (BNB) | 0.9563
12 | Linear discriminant analysis (LDA) | 0.9431
13 | Support Vector Machine with non-linear Kernel (SVM-NLK) | 0.9428
14 | Multilayer Perceptron (MP) | 0.9011
15 | Passive-Aggressive (PA) | 0.8772
16 | Quadratic Discriminant Analysis (QDA) | 0.7212
Table 5. List of the 12 most significant single features for classification using the HGB algorithm, with the F1² metric.
Feature Name | Feature No. | F1², HGB Model | F1², One-Threshold Approach | F1², Two-Threshold Approach
PCT | 35 | 0.9621 | 0.54277 | 0.95118
Ferritin | 31 | 0.90966 | 0.53731 | 0.90577
Fibrinogen | 32 | 0.88417 | 0.4635 | 0.67443
ESR | 36 | 0.845 | 0.54522 | 0.69842
PT | 34 | 0.76401 | 0.579 | 0.58245
D-dimer | 30 | 0.71535 | 0.6408 | 0.65008
INR | 33 | 0.70204 | 0.58302 | 0.58743
Amylase | 5 | 0.6699 | 0.61374 | 0.6599
aPTT | 38 | 0.62451 | 0.53117 | 0.53603
D-Bil | 7 | 0.54567 | 0.41042 | 0.4068
CK-MB | 6 | 0.54277 | 0.6026 | 0.46247
UA | 13 | 0.52454 | 0.38088 | 0.38088
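To make the one-threshold and two-threshold columns of Table 5 more concrete, the sketch below runs an exhaustive threshold search on a single feature. It is an illustration under stated assumptions, not the authors' procedure: the toy values and labels are invented, the function names are hypothetical, the one-threshold rule is assumed to flag values at or above the cut-off as non-surviving, the two-threshold rule is assumed to flag values inside the interval, and F1² is taken to be the product of the two class-wise F1 scores.

```python
# Toy sketch of one- and two-threshold classification scored with F1^2.
# Data, names and decision directions are illustrative assumptions.

def class_f1(y_true, y_pred, cls):
    """F1 score of one class, computed from confusion counts."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def f1_squared(y_true, y_pred):
    """Assumed F1^2: product of the F1 scores of class 0 and class 1."""
    return class_f1(y_true, y_pred, 0) * class_f1(y_true, y_pred, 1)

def predict_one_threshold(values, vth):
    # Assumed direction: values at or above the cut-off are labelled 1.
    return [1 if v >= vth else 0 for v in values]

def predict_two_thresholds(values, v1, v2):
    # Values inside the closed interval [v1, v2] are labelled 1.
    return [1 if v1 <= v <= v2 else 0 for v in values]

def best_one_threshold(values, y_true):
    """Try every observed value as the cut-off; keep the best F1^2."""
    best_score, best_vth = 0.0, None
    for vth in sorted(set(values)):
        score = f1_squared(y_true, predict_one_threshold(values, vth))
        if score > best_score:
            best_score, best_vth = score, vth
    return best_score, best_vth

def best_two_thresholds(values, y_true):
    """Try every ordered pair of observed values as interval bounds."""
    candidates = sorted(set(values))
    best = (0.0, None, None)
    for i, v1 in enumerate(candidates):
        for v2 in candidates[i:]:
            score = f1_squared(y_true, predict_two_thresholds(values, v1, v2))
            if score > best[0]:
                best = (score, v1, v2)
    return best

# Invented single-feature readings; 1 = non-surviving, 0 = surviving.
values = [0.05, 0.10, 0.12, 0.15, 0.30, 2.5, 2.7, 3.0, 9.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0]

one_score, vth = best_one_threshold(values, labels)
two_score, v1, v2 = best_two_thresholds(values, labels)
print("one threshold:", vth, round(one_score, 3))
print("two thresholds:", v1, v2, round(two_score, 3))
```

In this toy data the positive class occupies a middle band, so an interval separates the groups better than a single cut-off. The exhaustive pair search is quadratic in the number of distinct values, which is fine for a sketch but would need pruning on a full-size dataset.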
Table 6. Feature pairs with the highest metrics found with the HGB classifier for detection of surviving and non-surviving COVID-19 patients.
Feature 1 | Feature 2 | No. 1 | No. 2 | Precision (Surv.) | Precision (Non-Surv.) | Recall (Surv.) | Recall (Non-Surv.) | F1 (Surv.) | F1 (Non-Surv.) | F1²
D-dimer | PCT | 30 | 35 | 0.9979 | 0.9867 | 0.9987 | 0.9786 | 0.9983 | 0.9825 | 0.98083
PCT | ESR | 35 | 36 | 0.997 | 0.9911 | 0.9992 | 0.9704 | 0.9981 | 0.9805 | 0.97864
D-Bil | PCT | 7 | 35 | 0.9987 | 0.9735 | 0.9975 | 0.9866 | 0.9981 | 0.9798 | 0.97794
Ferritin | PCT | 31 | 35 | 0.997 | 0.9868 | 0.9987 | 0.9699 | 0.9979 | 0.9782 | 0.97615
LDH | PCT | 11 | 35 | 0.9992 | 0.9648 | 0.9966 | 0.991 | 0.9979 | 0.9774 | 0.97535
PT | PCT | 34 | 35 | 0.9953 | 0.9781 | 0.9979 | 0.9536 | 0.9966 | 0.9654 | 0.96212
PCT | aPTT | 35 | 38 | 0.9975 | 0.9564 | 0.9958 | 0.975 | 0.9966 | 0.9643 | 0.96102
CK-MB | PCT | 6 | 35 | 0.9983 | 0.9473 | 0.995 | 0.983 | 0.9966 | 0.9641 | 0.96082
INR | PCT | 33 | 35 | 0.9941 | 0.9868 | 0.9987 | 0.9435 | 0.9964 | 0.9641 | 0.96063
MCH | PCT | 19 | 35 | 0.9992 | 0.9386 | 0.9941 | 0.991 | 0.9966 | 0.9635 | 0.96022
ALT | PCT | 1 | 35 | 0.9992 | 0.9386 | 0.9941 | 0.991 | 0.9966 | 0.9635 | 0.96022
MCV | PCT | 21 | 35 | 0.9992 | 0.9386 | 0.9941 | 0.991 | 0.9966 | 0.9635 | 0.96022
eGFR | PCT | 12 | 35 | 0.9992 | 0.9386 | 0.9941 | 0.991 | 0.9966 | 0.9635 | 0.96022
Creatinine | PCT | 9 | 35 | 0.9992 | 0.9386 | 0.9941 | 0.991 | 0.9966 | 0.9635 | 0.96022
RBC | PCT | 26 | 35 | 0.9992 | 0.9386 | 0.9941 | 0.991 | 0.9966 | 0.9635 | 0.96022
Glucose | PCT | 8 | 35 | 0.9992 | 0.9386 | 0.9941 | 0.991 | 0.9966 | 0.9635 | 0.96022
UA | PCT | 13 | 35 | 0.9992 | 0.9386 | 0.9941 | 0.991 | 0.9966 | 0.9635 | 0.96022
WBC | PCT | 28 | 35 | 0.9987 | 0.9386 | 0.9941 | 0.9863 | 0.9964 | 0.9614 | 0.95794
BASO | PCT | 14 | 35 | 0.9987 | 0.9386 | 0.9941 | 0.9865 | 0.9964 | 0.9613 | 0.95784
PLT | PCT | 25 | 35 | 0.9987 | 0.9386 | 0.9941 | 0.9865 | 0.9964 | 0.9613 | 0.95784
RDW | PCT | 27 | 35 | 0.9987 | 0.9386 | 0.9941 | 0.9865 | 0.9964 | 0.9613 | 0.95784
AST | PCT | 2 | 35 | 0.9987 | 0.9386 | 0.9941 | 0.9865 | 0.9964 | 0.9613 | 0.95784
PCT | Troponin | 35 | 37 | 0.9983 | 0.9386 | 0.9941 | 0.9821 | 0.9962 | 0.9593 | 0.95565
CK | PCT | 10 | 35 | 0.9979 | 0.9431 | 0.9945 | 0.978 | 0.9962 | 0.9593 | 0.95565
MPV | PCT | 23 | 35 | 0.9983 | 0.9386 | 0.9941 | 0.9817 | 0.9962 | 0.9593 | 0.95565
MONO | PCT | 22 | 35 | 0.9983 | 0.9386 | 0.9941 | 0.9819 | 0.9962 | 0.9593 | 0.95565
Albumin | PCT | 3 | 35 | 0.9992 | 0.9297 | 0.9933 | 0.9912 | 0.9962 | 0.958 | 0.95436
MCHC | PCT | 20 | 35 | 0.9987 | 0.9298 | 0.9933 | 0.9878 | 0.996 | 0.9566 | 0.95277
CK-MB | Ferritin | 6 | 31 | 0.9924 | 0.987 | 0.9987 | 0.9271 | 0.9955 | 0.9558 | 0.9515
Amylase | PCT | 5 | 35 | 0.9966 | 0.9474 | 0.995 | 0.9648 | 0.9958 | 0.9554 | 0.95139
HCT | PCT | 16 | 35 | 0.9979 | 0.9343 | 0.9937 | 0.9792 | 0.9958 | 0.9549 | 0.95089
Ferritin | aPTT | 31 | 38 | 0.9915 | 0.9825 | 0.9983 | 0.9258 | 0.9949 | 0.9511 | 0.94625
EOS | PCT | 15 | 35 | 0.997 | 0.9343 | 0.9937 | 0.9688 | 0.9954 | 0.9506 | 0.94623
HGB | PCT | 17 | 35 | 0.9979 | 0.9253 | 0.9929 | 0.9768 | 0.9954 | 0.9495 | 0.94513
LYM | PCT | 18 | 35 | 0.9962 | 0.9386 | 0.9941 | 0.9604 | 0.9951 | 0.9488 | 0.94415
D-dimer | ESR | 30 | 36 | 0.9911 | 0.9825 | 0.9983 | 0.9161 | 0.9947 | 0.9477 | 0.94268
CRP | PCT | 29 | 35 | 0.9958 | 0.9386 | 0.9941 | 0.9579 | 0.9949 | 0.9468 | 0.94197
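The last column of Table 6 is numerically consistent with the product of the two class-wise F1 scores (for the D-dimer and PCT pair, 0.9983 × 0.9825 ≈ 0.98083). This reading is inferred from the tabulated values rather than quoted from a definition, and the short sketch below simply re-derives the first row under that assumption; the function names are hypothetical.

```python
# Re-deriving one row of Table 6, assuming F1^2 = F1(surviving) * F1(non-surviving).

def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def f1_squared(f1_surviving, f1_non_surviving):
    """Assumed definition: product of the two class-wise F1 scores."""
    return f1_surviving * f1_non_surviving

# D-dimer + PCT row of Table 6 (values are rounded to four decimals there,
# so the recomputed F1 can differ from the table in the last digit).
f1_surv = f1_from_precision_recall(0.9979, 0.9987)
f1_non_surv = f1_from_precision_recall(0.9867, 0.9786)
print(round(f1_surv, 4), round(f1_non_surv, 4))

# Using the tabulated F1 values directly reproduces the reported F1^2.
print(round(f1_squared(0.9983, 0.9825), 5))
```

Because the product is dominated by the weaker class, a model that neglects the minority (non-surviving) class is penalized more heavily than it would be under overall accuracy.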
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Huyut, M.T.; Velichko, A.; Belyaev, M. Detection of Risk Predictors of COVID-19 Mortality with Classifier Machine Learning Models Operated with Routine Laboratory Biomarkers. Appl. Sci. 2022, 12, 12180. https://doi.org/10.3390/app122312180

AMA Style

Huyut MT, Velichko A, Belyaev M. Detection of Risk Predictors of COVID-19 Mortality with Classifier Machine Learning Models Operated with Routine Laboratory Biomarkers. Applied Sciences. 2022; 12(23):12180. https://doi.org/10.3390/app122312180

Chicago/Turabian Style

Huyut, Mehmet Tahir, Andrei Velichko, and Maksim Belyaev. 2022. "Detection of Risk Predictors of COVID-19 Mortality with Classifier Machine Learning Models Operated with Routine Laboratory Biomarkers" Applied Sciences 12, no. 23: 12180. https://doi.org/10.3390/app122312180
