Article

Explainable Artificial Intelligence Helps in Understanding the Effect of Fibronectin on Survival of Sepsis

by Anna Lemańska-Perek 1,*, Dorota Krzyżanowska-Gołąb 1, Katarzyna Kobylińska 2, Przemysław Biecek 2,3, Tomasz Skalec 4, Maciej Tyszko 4, Waldemar Gozdzik 4 and Barbara Adamik 4
1 Department of Chemistry and Immunochemistry, Wroclaw Medical University, M. Sklodowskiej-Curie 48/50, 50-369 Wroclaw, Poland
2 Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
3 Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-662 Warszawa, Poland
4 Clinical Department of Anesthesiology and Intensive Therapy, Wroclaw Medical University, Borowska 213, 50-556 Wroclaw, Poland
* Author to whom correspondence should be addressed.
Cells 2022, 11(15), 2433; https://doi.org/10.3390/cells11152433
Submission received: 9 June 2022 / Revised: 30 July 2022 / Accepted: 3 August 2022 / Published: 5 August 2022
(This article belongs to the Section Intracellular and Plasma Membranes)

Abstract

Fibronectin (FN) plays an essential role in the host’s response to infection. In previous studies, a significant decrease in the FN level was observed in sepsis; however, it has not been clearly elucidated how this parameter affects the patient’s survival. To better understand the relationship between FN and survival, we utilized innovative approaches from the field of explainable machine learning, including local explanations (Break Down, Shapley Additive Explanations, Ceteris Paribus), to understand the contribution of FN to predicting individual patient survival. The methodology provides new opportunities to personalize informative predictions for patients. The results showed that the most important indicators for predicting survival in sepsis were INR, FN, age, and the APACHE II score. ROC curve analysis showed that the model’s area under the curve (AUC) was 0.92, its sensitivity was 0.92, its positive predictive value was 0.76, and its accuracy was 0.79. To illustrate these possibilities, we have developed and shared a web-based risk calculator for exploring individual patient risk. The web application can be continuously updated with new data in order to further improve the model.

1. Introduction

Sepsis is a life-threatening condition caused by the body’s unbalanced response to an infection; it can rapidly lead to organ failure and death. It is the primary cause of death in intensive care units (ICUs), and predicting the outcome of patients with sepsis is often difficult. The severity of a patient’s condition on admission to the ICU can be determined using clinical scales such as the APACHE II (Acute Physiology and Chronic Health Evaluation II) score and the NEWS (National Early Warning Score), while the degree of organ dysfunction and outcome can be assessed daily with the SOFA (Sequential Organ Failure Assessment) score, the SAPS (Simplified Acute Physiology Score), and the SSS (Sepsis Severity Score) [1,2]. Predicting mortality risk in ICU patients has numerous applications. It is practical for planning the allocation of resources and evaluating the performance of ICU wards. Mortality risk assessment is also used in clinical trials to characterize and compare patient groups, and it is an important part of quality assessment; risk-adjusted mortality is the most commonly used indicator of the quality of ICU care [3].
Accurately identifying patients with sepsis who are more likely to die and who can benefit most from additional monitoring or treatment remains a challenge. The additional use of biomarkers to help identify these patients is an attractive solution [4]. However, due to the heterogeneity and complex pathophysiology of sepsis, a single biomarker often provides insufficient information and cannot be reliably qualified as a predictor of outcome in patients with sepsis. Artificial intelligence prediction models, which have been shown to be useful for diagnosing and prognostication in other fields of medicine [5,6,7,8,9], could potentially add much value to these areas for sepsis patients. Previous studies have demonstrated the effectiveness of machine learning algorithms in detecting sepsis early in general patient populations, showing an improvement in the early identification of at-risk patients [10,11]. In another study, the use of a machine learning algorithm was associated with improved sepsis patient outcomes; statistically significant differences for sepsis patients were found for the length of stay and in-hospital mortality [12].
Our previous study showed that fibronectin (FN) levels are associated with clinical indices of sepsis severity and could be used as a predictor of outcome [13]. Fibronectin, a key component of the extracellular matrix (ECM), is an adhesive dimeric glycoprotein with variable molecular conformations and splice variants. It is involved in several processes, including vascular development, wound healing, and ECM remodeling [14]. There are two defined types of FN: soluble plasma fibronectin (pFN) synthesized by hepatocytes, and insoluble cellular fibronectin (cFN) synthesized locally by different cell types, which accumulates in tissues as a component of the ECM [14,15]. Because of alternative splicing of the FN gene, the cellular form of FN contains two extra domains, A and B (the EDA and EDB isoforms), which are absent or present only in trace amounts in the blood of a healthy person, but whose level rapidly increases in various pathological conditions, including sepsis [13,16,17]. The plasma form of FN is involved in haemostasis as an important component of blood clots, and, through its interaction with fibrin, plays a significant role in the coagulation cascade [18]. Cellular FN promotes inflammatory processes by activating the toll-like-receptor-4 signaling pathway. Furthermore, cellular FN is a ligand for integrin α9β1, which is expressed on various inflammatory cells, such as neutrophils and macrophages [19]. Fibronectin is a target for many bacterial proteins, and as part of a three-component bridge (fibronectin–integrin–fibronectin binding proteins), it contributes to the bacterial colonization of endothelial and epithelial cells [20,21].
In this paper, we demonstrate the efficacy of a machine learning model based on a random forest algorithm, developed to predict the probability of patient survival of sepsis on admission to the ICU. In addition to the commonly used indicators of the clinical condition of sepsis patients, the model included the fibronectin concentration recorded on the day of admission to the ICU. Moreover, we used Explainable Artificial Intelligence (XAI) techniques to better understand the model predictions. XAI techniques are becoming widely used in medicine [22,23]. In our study, we present two local level explanation techniques that assist in understanding the model prediction for a specific patient; these tools identify which features are the most important for a patient and indicate the change that occurs in the prediction with a change in the value of a variable. Such insight is much more interesting than averaged model behavior, both for an individual patient and for the physician. Therefore, these approaches can be considered personalized medicine. An online application of the model was developed, which presents the survival prognosis for both an individual patient and the entire data set in the model. This is a preliminary study to further develop the machine learning model, and data on patients with sepsis will be systematically added.

2. Materials and Methods

2.1. Study Group

This observational retrospective study included patients with sepsis/septic shock admitted to the Intensive Care Unit (ICU) in a tertiary-care university hospital. The analysis of plasma and cellular forms of fibronectin was conducted at the Department of Chemistry and Immunochemistry of Wroclaw Medical University. The survival prediction model and the online application of the machine learning model were created at the Faculty of Mathematics, Informatics and Mechanics, University of Warsaw. The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board Bioethics Committee of the Medical University in Wroclaw (No. 637/2014), and informed consent was obtained from the patients or their representatives.
Inclusion criteria were: age >18 years and a diagnosis of sepsis/septic shock, according to the Sepsis-3 definition on admission to the intensive care unit [2].
Exclusion criteria were: previously treated at the ICU, pregnancy, terminal illness with no chance for meaningful recovery or expected ICU length of stay of 24 h or less.
All patients admitted to the ICU who met the inclusion criteria were included in the study. All patients in the study received standard treatment for septic shock according to the Surviving Sepsis Campaign guidelines [24]. Demographic, laboratory and clinical data were collected from patient medical records.
The severity of the clinical status of the patient on admission to the ICU was determined using the APACHE II (Acute Physiology and Chronic Health Evaluation II) score [25]. The score is made up of 12 physiological variables (the fraction of inspired oxygen, partial pressure of oxygen, body temperature, mean arterial pressure, blood pH, heart rate, respiratory rate, serum sodium, serum potassium, serum creatinine, hematocrit, white blood cell count, and the Glasgow Coma Scale) and 2 disease-related variables (history of severe organ failure or immunocompromise and the type of ICU admission).
The degree of organ dysfunction of patients on admission to the ICU was assessed with a Sequential Organ Failure Assessment (SOFA) score. The score is used in the ICU for monitoring the severity of sepsis based on the status of six body systems: respiratory (PaO2/FiO2 index), cardiovascular (mean arterial pressure and the dose of vasopressors), hepatic (serum bilirubin level), coagulation (platelets level), renal (serum creatinine level/urine output), and neurological (Glasgow Coma Scale). Both the APACHE II and SOFA scores are routinely used tools for ICU patients. Demographic data and laboratory parameters, such as the white blood cell (WBC) count, C-reactive protein (CRP) level, procalcitonin (PCT) level, and coagulation parameters (d-dimers; international normalized ratio, INR) were also recorded.

2.2. Blood Sample Collection and Fibronectin Concentration Measurement

Blood samples (2.7 mL), anticoagulated with 3.2% sodium citrate, were collected from patients diagnosed with sepsis or septic shock on the day of admission to the ICU. Plasma was separated by centrifugation at 2000 rcf for 10 min, aliquoted, and stored at −70 °C for fibronectin measurements.

2.2.1. Plasma FN Concentrations

Plasma FN concentrations were determined by an enzyme-linked immunosorbent assay (ELISA) using a well-defined domain-specific monoclonal antibody directed to the cell-binding domain of FN (FN30-8; M010 TaKaRa Shuzo Co. Ltd., Shiga, Japan), as described earlier [26]. Briefly, the monoclonal antibody directed to the cell-binding domain of FN (FN30-8 M010, diluted 1:10,000) was used as a coating agent in the wells of a microtiter plate (Nalge Nunc International, Naperville, IL, USA) to bind FN from the samples. The amount of FN bound by the monoclonal antibody was quantified using rabbit anti-FN polyclonal antibodies (Sigma Chemical Co., St. Louis, MO, USA, diluted 1:5000) and secondary peroxidase-conjugated goat anti-rabbit immunoglobulins (Sigma Chemical Co., St. Louis, MO, USA, diluted 1:30,000). The test was assayed by a colorimetric reaction using o-phenylenediamine dihydrochloride/H2O2 as the enzyme substrate. The results were expressed in absorbance units (AU). The samples were analyzed in two different dilutions, each in duplicate. The pFN concentration is given in milligrams per liter. A human plasma FN preparation (Sigma Chemical Co., St. Louis, MO, USA, from 1.5 to 50 ng/well) was used as a standard in the pFN ELISA.

2.2.2. EDA-FN Concentrations

The EDA-FN concentration was determined by ELISA using a domain-specific primary antibody (S-FN5, clone IST-9, Sirius Biotech S.r.l., Genoa, Italy) and a biotinylated secondary antibody (715-066-151, Jackson ImmunoResearch, Baltimore, MD, USA). The detection of EDA-FN in the plasma was based on the method described earlier [27]. Briefly, gelatin (0.5% in TBS/well overnight at 4 °C) was used as a coating agent on ELISA plates (Nalge Nunc International, Naperville, IL, USA). FN from the plasma samples was bound to the gelatin; next, an IST-9 monoclonal antibody (1.0 µg/mL, Sirius Biotech S.r.l., Genoa, Italy) was used to detect FN containing the EDA-FN domain. Biotinylated donkey anti-mouse antiserum (diluted 1:10,000, Dianova GmbH, Hamburg, Germany) was added in the next step, and finally, horseradish peroxidase (HRP)-labelled streptavidin (1 µg/mL, Dianova GmbH, Hamburg, Germany) was added. Incubation with the secondary antibody and peroxidase-conjugated Streptavidin was performed without access to light. The test was assayed by a colorimetric reaction using o-phenylenediamine dihydrochloride/H2O2 as the enzyme substrate. A cellular fibronectin from human foreskin fibroblasts (Sigma Chemical Co., St. Louis, MO, USA, from 1.5 to 50 ng/well) was used as a standard. The samples were analyzed in two different dilutions, each in duplicate. The EDA-FN concentrations are given in milligrams per liter. To determine non-specific binding, two controls were included in the tests: without the primary antibody and without the secondary antibody.

2.3. Statistical Methods

Continuous variables were summarized as the median and the interquartile range (25th to 75th percentiles), while categorical variables were summarized as counts and fractions. A comparison of the continuous variables between two independent groups (Nonsurvivors vs. Survivors) was performed using the Mann–Whitney test. Categorical variables were analyzed with the chi-squared test, and contingency tables were used to analyze the frequency distribution of categorical variables. p-values less than 0.05 were regarded as significant.
This study presents the effectiveness of an artificial intelligence model designed to predict the probability of survival in sepsis. Figure 1 shows a workflow diagram. Three types of models were developed: a logistic regression model and two complex, tree-based models: random forest and gradient boosting [28,29]. Logistic regression is our choice for a transparent model that is widely used in the medical domain. Random forest and gradient boosting, the so-called black box models, are very often more accurate, but at the cost of interpretability.
In order to understand the machine learning model predictions, we used two types of Explainable Artificial Intelligence (XAI) methods: global-level methods and instance-level methods. Global-level methods assist in understanding the overall model structure. We used the model-agnostic Feature Importance method, which has been previously described in detail by Fisher et al. [30]. Briefly, the Feature Importance method measures changes in model performance after the perturbation of a variable. The bigger the loss of model performance after perturbation of a selected variable, the more important that variable is. The method helps to determine which variables influence the final prediction the most, and also which variables are not important.
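The permutation idea behind the Feature Importance method can be sketched in a few lines. The following is a minimal illustration only, not the code used in the study (the analysis was performed in R): the toy model, the data values, and the function names are hypothetical, and the loss is the misclassification rate chosen for brevity.

```python
import random

def toy_model(row):
    # Hypothetical stand-in for a fitted classifier: predicts survival (1)
    # when the first feature exceeds 100; the second feature is ignored.
    return 1 if row[0] > 100 else 0

def loss(model, X, y):
    # Misclassification rate (1 - accuracy).
    wrong = sum(1 for row, label in zip(X, y) if model(row) != label)
    return wrong / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    # For each feature: shuffle its column, recompute the loss, and report
    # the mean increase over the unperturbed (baseline) loss.
    rng = random.Random(seed)
    baseline = loss(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(loss(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Made-up illustrative data: two features per patient, label 1 = survived.
X = [[150, 1.2], [80, 1.9], [140, 1.1], [60, 2.4], [130, 1.3], [70, 2.0]]
y = [1, 0, 1, 0, 1, 0]
importance = permutation_importance(toy_model, X, y)
```

In this sketch the second feature receives an importance of zero because the toy model never uses it, which mirrors how the method flags uninformative variables.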
Moreover, three local-level methods, Break Down, Shapley Additive Explanations (SHAP values) and Ceteris Paribus, were applied to the model [31]. The local-level methods helped us to understand how a model behaved for a particular, selected patient. The Break Down method [32] and SHAP values [33] presented how the variables contributed to the final prediction. However, the decomposition of the model predictions into the attribution of each variable was computed in a different way: the value of the attribution depended on the order of the explanatory variables for the Break Down method, whereas the SHAP values averaged the value of variable attribution over all possible orderings. Ceteris Paribus profiles showed the dependencies between continuous variables and model predictions. The method presented how a prediction would change if the value of a single variable changed for a selected instance. All analysis was performed with R 3.6.1 [34] with mlr [35] and DALEX [36] packages.
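The difference between the two attribution schemes can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the DALEX implementation: the two-variable model with an interaction term, the background data, and all function names are hypothetical. A Break Down decomposition fixes variables one at a time in a given order, whereas the Shapley value averages those contributions over all orderings.

```python
from itertools import permutations

def model(x):
    # Hypothetical model with an interaction term, so that the order in
    # which variables are fixed changes their attributions.
    a, b = x
    return 0.2 * a + 0.1 * b + 0.3 * a * b

# Hypothetical background data and the instance to be explained.
background = [(0, 0), (0, 1), (1, 0), (1, 1)]
instance = (1, 1)

def mean_prediction(fixed):
    # Average prediction over the background data with the variables in
    # `fixed` set to the values of the explained instance.
    total = 0.0
    for row in background:
        x = tuple(instance[j] if j in fixed else row[j] for j in range(len(row)))
        total += model(x)
    return total / len(background)

def break_down(order):
    # Attribution of each variable for one particular ordering.
    fixed = set()
    contributions = {}
    previous = mean_prediction(fixed)
    for j in order:
        fixed.add(j)
        current = mean_prediction(fixed)
        contributions[j] = current - previous
        previous = current
    return contributions

def shapley():
    # Average the Break Down attributions over all possible orderings.
    orders = list(permutations(range(len(instance))))
    totals = {j: 0.0 for j in range(len(instance))}
    for order in orders:
        for j, c in break_down(order).items():
            totals[j] += c
    return {j: t / len(orders) for j, t in totals.items()}

first_order = break_down((0, 1))
second_order = break_down((1, 0))
averaged = shapley()
```

In both schemes the attributions sum to the difference between the prediction for the instance and the average prediction; only the split between the variables depends on the ordering.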
We have created a GitHub repository with code needed to prepare the models and explain them: https://github.com/KasiaKobylinska/XAIForSepsis (accessed on 8 June 2022).

3. Results

A total of 127 consecutive patients who were treated for sepsis/septic shock between January 2018 and December 2019 were screened for inclusion/exclusion criteria. Of this number, 122 patients met the inclusion criteria and were included in the final analysis. Five patients were excluded due to incomplete data records. Sepsis/septic shock was diagnosed on the basis of Sepsis-3 diagnostic criteria [2]. All patients on admission had a clinical suspicion of infection diagnosed by the attending physician on the basis of the source of infection (clinical, radiological or microbiological). Body fluids were routinely collected for culture from each patient on admission to the ICU: 99 patients had microbiologically confirmed bacterial infections. Moreover, samples taken for culture on subsequent days also confirmed bacterial infections in the remaining patients with sepsis. The most frequent primary sources of sepsis were abdominal (45%) and pulmonary (34%) infections. The 28-day mortality was 44%, and patient characteristics are summarized in Table 1. The patients who survived were younger, had lower APACHE II and SOFA scores, and had lower levels of pro-inflammatory markers such as procalcitonin, C-reactive protein, and leukocytes.

3.1. Fibronectin Concentrations

Both forms of FN, plasma (pFN) and cellular (EDA-FN), were measured on admission to the ICU. The median pFN concentration was 114.06 mg/L; in Nonsurvivors, the pFN values were significantly lower compared to the values recorded in Survivors (83.02 mg/L vs. 138.82 mg/L, p < 0.001). The median value of the EDA-FN concentration in Nonsurvivors was higher than the median value measured in Survivors, but there was no statistically significant difference between the groups (9.4 mg/L vs. 5.04 mg/L, p = 0.055).

3.2. Results of Modeling

We took into consideration three types of models: the logistic regression, random forest and gradient boosting models. After running the models, we also prepared a benchmark to compare the results with different test data sets. The benchmark consisted of dividing the input data set into test and training sets, preparing the model on the training data set and computing the area under the curve (AUC) on the test data set. The procedure was repeated five times. The proportion of the test to training patients in the sets was 1:2. The mean test AUC was 0.85 for the random forest model, 0.78 for the gradient boosting model, and 0.81 for the logistic regression model. The results of the test AUCs for the models are presented in Figure 2.
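The benchmark procedure described above (repeated division into training and test sets in a 2:1 proportion, followed by computing the test AUC) can be sketched as follows. This is an illustrative sketch only: the data are synthetic, the one-variable centroid classifier is a hypothetical stand-in for the models compared in the study, and the stratified split is an assumption made here so that every test set contains both classes.

```python
import random

def auc(scores, labels):
    # Probability that a randomly chosen positive case receives a higher
    # score than a randomly chosen negative case (ties count as 0.5).
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_centroid_model(X, y):
    # Hypothetical one-variable classifier: the score grows as a value moves
    # closer to the mean of class 1 and away from the mean of class 0.
    mean1 = sum(x for x, l in zip(X, y) if l == 1) / sum(y)
    mean0 = sum(x for x, l in zip(X, y) if l == 0) / (len(y) - sum(y))
    return lambda x: abs(x - mean0) - abs(x - mean1)

def stratified_split(labels, test_fraction, rng):
    # Draw the test set separately within each class (an assumption here).
    test = []
    for cls in (0, 1):
        members = [i for i, l in enumerate(labels) if l == cls]
        rng.shuffle(members)
        test.extend(members[: round(len(members) * test_fraction)])
    test_set = set(test)
    train = [i for i in range(len(labels)) if i not in test_set]
    return train, test

def benchmark(X, y, n_repeats=5, test_fraction=1 / 3, seed=0):
    # Repeat: split, fit on the training part, compute AUC on the test part.
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_repeats):
        train, test = stratified_split(y, test_fraction, rng)
        model = fit_centroid_model([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([model(X[i]) for i in test], [y[i] for i in test]))
    return aucs

# Synthetic data: one feature, label 1 = survived.
data_rng = random.Random(42)
y = [1] * 18 + [0] * 12
X = [data_rng.gauss(140, 30) if label == 1 else data_rng.gauss(85, 30) for label in y]
test_aucs = benchmark(X, y)
mean_test_auc = sum(test_aucs) / len(test_aucs)
```

Averaging the AUC over several random splits gives a more stable comparison between models than a single split, which matters for a data set of this size.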
The best results were obtained for the random forest model, and this model is discussed further. The independent variables which were put into the model were selected based on the significance tests presented in Table 1 and based on the results obtained for fibronectin. Additionally, d-dimers were included in the analysis as a parameter indicative of fibrin degradation. In our previous study, we detected the presence of fibronectin–fibrin complexes in the plasma of sepsis patients; furthermore, the frequency of occurrence and the relative amount of fibronectin–fibrin complexes were higher in Nonsurvivors than in Survivors [13]. The machine learning model was developed with the following input features: the concentration of plasma fibronectin, the INR value, the SOFA score, the patient’s age, the APACHE II score, the procalcitonin level, the platelet count, and the level of d-dimers. A 10-time cross-validation was performed to optimize the random forest model parameters and to avoid overfitting. The mean AUC of the 10-time cross-validation computed for the test data sets was 0.82. The final model was prepared on the training data set. The ROC curve analysis of the random forest model showed that the rate of successfully classifying patients with the model was 0.92 (AUC computed on the whole data set) (Figure 3), with a sensitivity (recall) of 0.92, a positive predictive value (precision) of 0.76, and an accuracy of 0.79.
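For reference, the three classification metrics reported above follow directly from the counts of a confusion matrix. The short sketch below shows the standard definitions; the labels are made-up example values, and treating survival as the positive class is an assumption made for illustration.

```python
def classification_metrics(y_true, y_pred):
    # Confusion-matrix counts, with 1 treated as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),           # recall
        "ppv": tp / (tp + fp),                   # positive predictive value (precision)
        "accuracy": (tp + tn) / len(y_true),
    }

# Made-up example: 8 patients, 1 = positive class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Unlike the AUC, these three values depend on the probability threshold used to turn the model output into a class label.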

3.3. Global-Level Methods for Model Explanation

When making high-risk decisions, it is crucial to understand the structure of a model and the dependencies contained within. As it was difficult to interpret the black box model, we applied explanatory methods, such as Feature Importance, Break Down, SHAP values and Ceteris Paribus, which can help physicians interpret the results. The global-level perspective was applied in order to understand the model structure and the dependencies between the patient features and the model predictions. The permutational variable importance method indicated which variables were significant to the model response. This method compares the performance of the original model with its performance after the values of a selected variable have been permuted. The larger the increase in the loss function after permutation, the more important the selected variable is.

3.4. Feature Importance with the Random Forest Model

The importance of specific features for predicting the probability of patient survival is presented in Figure 4. The longer the variable’s bar, the more significant this feature was in the random forest model. According to the plot, the INR value and the concentration of plasma fibronectin were the most important variables for the random forest model. The APACHE II score and age were also important variables for the model. The model indicated that the SOFA score, platelet count, and procalcitonin level were less informative variables. The level of d-dimers did not seem to have an impact on the model predictions.

3.5. Local-Level Methods for Model Explanation

We also applied local explanation methods in order to support medical decision making for an individual patient. We used the Ceteris Paribus method, which shows how the model response for a selected instance would change if the value of one feature changed while all other features were held constant. We also used the Break Down method and the SHAP values, which decompose the model prediction for a selected instance into the contributions of the individual variables. These methods were helpful in understanding which variables were the most important for a selected patient, and how they influenced the model’s result.
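A Ceteris Paribus profile can be computed for any model by varying one variable over a grid while keeping the remaining variables of the selected patient fixed. The sketch below illustrates this; the logistic toy model and its coefficients are hypothetical and are not the random forest from this study, while the pFN and SOFA values are those of the example patient described in the next section.

```python
import math

def toy_model(patient):
    # Hypothetical survival model with made-up coefficients, used only to
    # illustrate the profile computation.
    z = 0.02 * patient["pfn"] - 0.2 * patient["sofa"] - 0.5
    return 1.0 / (1.0 + math.exp(-z))

def ceteris_paribus(model, patient, feature, grid):
    # Vary one feature over the grid, keep all other features unchanged,
    # and record the model prediction for each grid value.
    profile = []
    for value in grid:
        modified = dict(patient)   # copy, so the original patient is untouched
        modified[feature] = value
        profile.append((value, model(modified)))
    return profile

patient = {"pfn": 149.51, "sofa": 9}
pfn_grid = [50, 75, 100, 125, 150, 175, 200]
pfn_profile = ceteris_paribus(toy_model, patient, "pfn", pfn_grid)
sofa_profile = ceteris_paribus(toy_model, patient, "sofa", range(0, 25, 4))
```

Plotting the second element of each pair against the first gives the profile curve for that variable and that patient.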

Example of Clinical Application of the Random Forest Prediction Model

The local methods presented in the article were applied to one selected patient, but could be extended to other patients thanks to the online application. The application created for predicting the survival of septic patients is based on the random forest model described above. The following input variables were included: plasma fibronectin, the INR value, the platelet count, the APACHE II score, age, the SOFA score, the procalcitonin level, and the level of d-dimers. The online application is available at https://stats4med.shinyapps.io/xai2shiny/ (accessed on 8 June 2022), and the analysis presented below is an example of the use of the model.
Analysis: The patient was admitted to the Intensive Care Unit directly from the operating theatre after relaparotomy due to the perforation of the caecum and fecal peritonitis. On admission, the patient was diagnosed with septic shock and given catecholamines and cordarone in order to stabilize the cardiovascular system. Due to respiratory failure, the patient was mechanically ventilated with high pressure and oxygen support. Empirical antibiotic therapy was implemented, and parenteral nutrition was introduced. Due to persistent oliguria, hemodiafiltration with citrate anticoagulation was implemented. The patient’s clinical status was assessed with the APACHE II score (20 pts.) and the SOFA score (9 pts.). The other parameters used in the predictive model were as follows: plasma fibronectin level 149.51 mg/L, procalcitonin 4.68 ng/L, d-dimers 1.33 mg/L, and INR 1.36. According to the presented random forest model, the prediction of 28-day survival calculated for this patient on admission to the ICU was 0.764, which was higher than the average model prediction. Based on the Break Down method, the variables that contributed most to increasing the predicted probability of survival were the SOFA score, with a contribution of +0.064, and the level of pFN, with a contribution of +0.45 (Figure 5A). Other variables were of less importance, and the only variable that had a negative impact on the prediction was age. Very similar results were obtained using the SHAP values (Figure 5B). Moreover, the Ceteris Paribus profiles show that a decrease in the pFN value or even a small increase in the SOFA score resulted in a worse prediction for that patient (Figure 5C). According to the hospital documentation, the patient was alive on day 28 of treatment.

4. Discussion

In this study, we used a machine learning model with input features that included fibronectin, a new potential sepsis biomarker, in conjunction with routinely measured vital indices, to predict the survival of patients diagnosed with sepsis on admission to the ICU. Three types of models were developed: logistic regression, random forest and gradient boosting; based on the test and train AUC values, the random forest was chosen as the final model. Our results showed that the most important indicators for predicting survival were the INR and the level of plasma fibronectin, followed by age and the APACHE II score.
On the basis of the data from the analyzed cohort, an online application was developed for predicting the survival of individual patients.
Sepsis is a major cause of death in the ICU, and much research has been done to test models for the early prediction of sepsis diagnosis and outcome [37,38,39]. In addition to the clinical scores routinely used to assess the clinical status of patients with sepsis, and basic demographic parameters such as age, gender, and race, biomarkers have been shown to be useful in the early diagnosis of sepsis and prognosis of outcome. Numerous studies indicate that protein biomarkers can be valuable parameters in such models, increasing the accuracy of diagnosis and prognosis [5,40,41,42].
Fibronectin has previously been proposed as a biomarker of sepsis: it was observed that low concentrations of FN in the plasma of patients who were suspected to have sepsis were consistent with the final diagnosis of sepsis and a positive blood culture result [43]. Decreased plasma levels of FN have also been observed in acute inflammation, surgical trauma and disseminated intravascular coagulation (DIC) [44,45]. Mamani et al. proposed FN as a diagnostic marker of sepsis; however, the ROC curve analysis and a comparison of the AUC values for FN and CRP showed that the diagnostic value of CRP was significantly higher [46]. In our previous work, we showed that the plasma FN concentration measured in septic patients was significantly lower than in healthy adult controls [13]. In addition, we found that plasma FN levels in Nonsurvivors were significantly lower than in Survivors, and particularly low levels of FN were measured in septic patients with DIC. All these observations suggest the potential usefulness of FN as a prognostic marker in sepsis. However, previous attempts to search for prognostic biomarkers in patients with systemic inflammatory response syndrome (SIRS) have not shown the usefulness of FN in either univariate or multivariate logistic regression analysis [44]. FN is a multifunctional protein; therefore, changes in the protein level may be the result of many different processes or effects. Fibronectin is a key component of a fibrin clot; decreases in the FN concentration in sepsis may result from the consumption of coagulation factors caused by intensified coagulation processes. We previously observed the formation of fibrin–fibronectin complexes in the plasma of patients with sepsis [13]. Moreover, FN is involved in haemostasis, rapidly depositing on the wall of the damaged vessel and promoting platelet aggregation through pFN–fibrin complexes; interestingly, in the absence of fibrin, fibronectin inhibited this process [18].
One of the reasons for the decrease in the level of FN concentration in the plasma of sepsis patients is the cleavage of FN in pathological conditions linked with extracellular matrix remodeling and the release of FN fragments with pro-inflammatory and degradative properties. Our previous study confirmed that FN cleavage resulted in the presence of FN fragments with a mass mainly from 150 kDa to 70 kDa [13].
In addition to the plasma FN, we also measured the cellular form of FN, and found significantly higher levels of this FN form in septic patients than in healthy people [13]. This is consistent with previous reports of a rapid increase in cellular FN levels in pathological conditions, including sepsis [13,47,48]. Unlike plasma FN, we found no significant difference between the levels of cellular FN recorded in Nonsurvivors and Survivors. This is not surprising, given that these two FN isoforms are produced and function independently [14]. Our last study showed that the level of the cellular form of FN is related to treatment outcome and is significantly higher in COVID-19 Nonsurvivors than in Survivors. Additionally, EDA-FN levels correlated with APACHE II and SOFA scores [49].
In this study, we initially included the concentrations of both forms of fibronectin (plasma FN and cellular FN) as input features of the machine learning model. Plasma FN was one of the most important features in the model, while the cellular form of FN was not significant for the model and was excluded from the final analysis. The ROC curve analysis of our prediction model presented in Figure 3 showed that the rate of successful classification by the model was as high as 0.92, with a sensitivity of 0.92, a positive predictive value (precision) of 0.76, and an accuracy of 0.79. As shown in Figure 4, the most important variable in the model was the INR value, followed by the plasma FN concentration, age, and the APACHE II score. Additionally, the SOFA score, platelet count, and the levels of procalcitonin and d-dimers were entered into the model as less informative variables.
On the basis of the input data from the analyzed cohort, we have created a web application for predicting the survival of individual patients: https://stats4med.shinyapps.io/xai2shiny/ (accessed on 8 June 2022). This is a preliminary design, limited by the small amount of data, although it will be easy to extend it in the future. The model will learn and develop as the database is expanded with parameters from the next cohort, and the application could serve as a survival prediction tool for outcome prognosis in sepsis.
Our study has several limitations. The number of cases was small, and we view the created model as a preliminary one that will be developed. All the analyzed patients had severe infections of bacterial origin. To continue the project, we intend to extend the database to include cases of bacterial and viral sepsis. There are limitations to the model itself: the analysis was prepared on a small data set, so the machine learning model could have been overfitted. The next study should include an extended database with more observations. Then, future studies may address the interpretability of even more sophisticated artificial intelligence tasks.

5. Conclusions

To better understand the relationship between FN and survival, we utilized innovative approaches from the field of explainable machine learning, including local explanations, to understand the contribution of FN in predicting survival at the level of a single patient. The methodology provides new opportunities to personalize informative predictions for patients. To illustrate these possibilities, we have developed and shared a web-based risk calculator allowing exploration of individual patient risk. With the random forest model, we showed that the concentration of fibronectin, in combination with routinely assessed parameters, could be useful as a marker for predicting survival in sepsis. An online application developed for predicting survival for individual patients can be continuously expanded with new data.

Author Contributions

Conceptualization, B.A., A.L.-P. and D.K.-G.; methodology, B.A., A.L.-P., D.K.-G., K.K. and M.T.; investigation, B.A., A.L.-P., D.K.-G., K.K., T.S. and M.T.; resources, B.A., A.L.-P. and D.K.-G.; data curation, B.A., A.L.-P., D.K.-G. and K.K.; writing—original draft preparation, B.A., A.L.-P. and D.K.-G.; writing—review and editing, B.A., A.L.-P., D.K.-G., K.K. and W.G.; visualization, B.A., A.L.-P., D.K.-G. and K.K.; supervision, B.A., A.L.-P., W.G. and P.B.; project administration, B.A. and A.L.-P.; funding acquisition, B.A., A.L.-P. and W.G. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge funding support from Wroclaw Medical University, Poland (grant No. A070.16.025 and grant No. ST.A170.18.024).

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board Bioethics Committee of the Medical University in Wroclaw (No. 637/2014).

Informed Consent Statement

Informed consent was obtained from all subjects or their representatives.

Data Availability Statement

The data presented in the study are available on request from the corresponding author. The data have not been made publicly available, because they contain information that could compromise the privacy of the study participants.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Evans, L.; Rhodes, A.; Alhazzani, W.; Antonelli, M.; Coopersmith, C.M.; French, C.; Machado, F.R.; Mcintyre, L.; Ostermann, M.; Prescott, H.C.; et al. Surviving Sepsis Campaign: International Guidelines for Management of Sepsis and Septic Shock 2021. Crit. Care Med. 2021, 49, e1063–e1143. [Google Scholar] [CrossRef] [PubMed]
  2. Singer, M.; Deutschman, C.S.; Seymour, C.W.; Shankar-Hari, M.; Annane, D.; Bauer, M.; Bellomo, R.; Bernard, G.R.; Chiche, J.D.; Coopersmith, C.M.; et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA 2016, 315, 801–810. [Google Scholar] [CrossRef] [PubMed]
  3. Wełna, M.; Adamik, B.; Goździk, W.; Kübler, A. External validation of the sepsis severity score. Int. J. Immunopathol. Pharmacol. 2020, 34, 1–10. [Google Scholar] [CrossRef] [PubMed]
  4. Power, G.S.; Harrison, D.A. Why try to predict ICU outcomes. Curr. Opin. Crit. Care 2014, 20, 544–549. [Google Scholar] [CrossRef] [PubMed]
  5. Zhang, Y.; Khalid, S.; Jiang, J. Diagnostic and predictive performance of biomarkers in patients with sepsis in an intensive care unit. J. Int. Med. Res. 2019, 47, 44–58. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  6. Esteva, A.; Kuprel, B.; Novoa, R.A. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017, 542, 115–118. [Google Scholar] [CrossRef]
  7. Lundberg, S.M.; Nair, B.; Vavilala, M.S.; Horibe, M.; Eisses, M.J.; Adams, T.; Liston, D.E.; Low, D.K.; Newman, S.F.; Kim, J.; et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2018, 2, 749–760. [Google Scholar] [CrossRef]
  8. Kobylińska, K.; Mikołajczyk, T.; Adamek, M.; Orłowski, T.; Biecek, P. Explainable Machine Learning for Modeling of Early Postoperative Mortality in Lung Cancer. In Artificial Intelligence in Medicine: Knowledge Representation and Transparent and Explainable Systems, KR4HC 2019, TEAAM 2019; Lecture Notes in Computer Science; Marcos, M., Juarez, J.M., Lenz, R., Nalepa, G.J., Nowaczyk, S., Peleg, M., Stefanowski, J., Stiglic, G., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 161–174. [Google Scholar]
  9. Blanco-Justicia, A.; Domingo-Ferrer, J.; Martínez, S.; Sánchez, D. Machine learning explainability via microaggregation and shallow decision trees. Knowl. Based Syst. 2020, 194, 105532. [Google Scholar] [CrossRef]
  10. Calvert, J.S.; Price, D.A.; Chettipally, U.K.; Barton, C.W.; Feldman, M.D.; Hoffman, J.L.; Jay, M.; Das, R. A computational approach to early sepsis detection. Comput. Biol. Med. 2016, 74, 69–73. [Google Scholar] [CrossRef] [Green Version]
  11. Calvert, J.; Desautels, T.; Chettipally, U.; Barton, C.; Hoffman, J.; Jay, M.; Mao, Q.; Mohamadlou, H.; Das, R. High-performance detection and early prediction of septic shock for alcohol-use disorder patients. Ann. Med. Surg. 2016, 10, 50–55. [Google Scholar] [CrossRef]
  12. Shimabukuro, D.W.; Barton, C.W.; Feldman, M.D.; Mataraso, S.J.; Das, R. Effect of a machine learning-based severe sepsis prediction algorithm on patient survival and hospital length of stay: A randomised clinical trial. BMJ Open Resp. Res. 2017, 4, e000234. [Google Scholar] [CrossRef] [PubMed]
  13. Lemańska-Perek, A.; Krzyżanowska-Gołąb, D.; Skalec, T.; Adamik, B. Plasma and cellular forms of fibronectin as prognostic markers in sepsis. Mediat. Inflamm. 2020, 2020, 8364247. [Google Scholar] [CrossRef] [PubMed]
  14. To, W.S.; Midwood, K.S. Plasma and cellular fibronectin: Distinct and independent functions during tissue repair. Fibrogen Tissue Repair 2011, 4, 21. [Google Scholar] [CrossRef] [Green Version]
  15. Pankov, R.; Yamada, K.M. Fibronectin at a glance. J. Cell Sci. 2002, 115, 3861–3863. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  16. White, E.S.; Muro, A.F. Fibronectin splice variants: Understanding their multiple roles in health and disease using engineered mouse models. IUBMB Life 2011, 63, 538–546. [Google Scholar] [CrossRef]
  17. White, E.S.; Baralle, F.E.; Muro, A.F. New insights into form and function of fibronectin splice variants. J. Pathol. 2008, 216, 1–14. [Google Scholar] [CrossRef] [Green Version]
  18. Wang, Y.; Reheman, A.; Spring, C.M.; Kalantari, J.; Marshall, A.H.; Wolberg, A.S.; Gross, P.L.; Weitz, J.I.; Rand, M.L.; Mosher, D.F.; et al. Plasma fibronectin supports hemostasis and regulates thrombosis. J. Clin. Investig. 2014, 124, 4281–4293. [Google Scholar] [CrossRef] [Green Version]
  19. Dhanesha, N.; Chorawala, M.R.; Jain, M.; Bhalla, A.; Thedens, D.; Nayak, M.; Doddapattar, P.; Chauhan, A.K. Fn-EDA (Fibronectin Containing Extra Domain A) in the Plasma, but Not Endothelial Cells, Exacerbates Stroke Outcome by Promoting Thrombo-Inflammation. Stroke 2019, 50, 1201–1209. [Google Scholar] [CrossRef]
  20. Shinji, H.; Yosizawa, Y.; Tajima, A.; Iwase, T.; Sugimoto, S.; Seki, K.; Mizunoe, Y. Role of fibronectin-binding proteins A and B in vitro cellular infections and in vivo septic infections by staphylococcus aureus. Infect. Immun. 2011, 79, 2215–2223. [Google Scholar] [CrossRef] [Green Version]
  21. Schröder, A.; Schröder, B.; Roppenser, B.; Linder, S.; Sinha, B.; Fässler, R.; Aepfelbacher, M. Staphylococcus aureus fibronectin binding protein-A induces motile attachment sites and complex actin remodelling in living endothelial cells. Mol. Biol. Cell 2006, 17, 5198–5210. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  22. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From local explanations to global understanding with explainable AI for trees. Nature 2020, 2, 56–67. [Google Scholar] [CrossRef]
  23. Kobylińska, K.; Orłowski, T.; Adamek, M.; Biecek, P. Explainable Machine Learning for Lung Cancer Screening Models. Appl. Sci. 2022, 12, 1926. [Google Scholar] [CrossRef]
  24. Rhodes, A.; Evans, L.E.; Alhazzani, W.; Levy, M.M.; Antonelli, M.; Ferrer, R.; Kumar, A.; Sevransky, J.E.; Sprung, C.L.; Nunnally, M.E.; et al. Surviving Sepsis Campaign: International Guidelines for Management of Sepsis and Septic Shock: 2016. Intensive Care Med. 2017, 43, 304–377. [Google Scholar] [CrossRef]
  25. Knaus, W.A.; Draper, E.A.; Wagner, D.P.; Zimmerman, J.E. APACHE II: A severity of disease classification system. Crit. Care Med. 1985, 13, 818–829. [Google Scholar] [CrossRef]
  26. Lemańska-Perek, A.; Pupek, M.; Polańska, B.; Leszek, J.; Kątnik-Prastowska, I. Alterations in molecular status of plasma fibronectin associated with aging of normal human individuals. Clin. Biochem. 2013, 46, 787–794. [Google Scholar] [CrossRef] [PubMed]
  27. Ziffels, B.; Ospel, J.; Grün, K.; Neri, D.; Pfeil, A.; Fritzenwanger, M.; Figulla, H.R.; Jung, C.; Berndt, A.; Franz, M. Detection of soluble ED-A+ fibronectin and evaluation as novel serum biomarker for cardiac tissue remodeling. Dis. Markers 2016, 2016, 3695454. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  28. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
  29. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  30. Fisher, A.; Rudin, C.; Dominici, F. All Models Are Wrong, but Many Are Useful: Learning a Variable’s Importance by Studying an Entire Class of Predic-tion Models Simultaneously. J. Mach. Learn. Res. 2019, 20, 1–81. [Google Scholar]
  31. Biecek, P.; Burzykowski, T. Explanatory Models Analysis; Chapman and Hall/CRC: New York, NY, USA, 2021; ISBN 9780367135591. [Google Scholar] [CrossRef]
  32. Staniak, M.; Biecek, P. Explanations of model predictions with live and breakdown pack-ages. R J. 2018, 10, 359–409. [Google Scholar] [CrossRef] [Green Version]
  33. Lundberg, S.M.; Lee, S. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874. [Google Scholar]
  34. R Core Team 2020. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2017; Available online: https://www.R-project.org (accessed on 8 June 2022).
  35. Bischl, B.; Lang, M.; Kotthoff, L.; Schiffner, J.; Richter, J.; Studerus, E.; Casalicchio, G.; Jones, Z.M. mlr: Machine Learning in R. J. Mach. Learn. Res. 2016, 17, 5938–5942. [Google Scholar]
  36. Biecek, P. DALEX: Explainers for Complex Predictive Models in R. J. Mach. Learn. Res. 2018, 19, 3245–3249. [Google Scholar]
  37. Awad, A.; Bader-El-Den, M.; McNicholas, J.; Briggs, J. Early hospital mortality prediction of intensive care unit patients using an ensemble learning approach. Int. J. Med. Inform. 2017, 108, 185–195. [Google Scholar] [CrossRef] [Green Version]
  38. Nemati, S.; Holder, A.; Razmi, F.; Stanley, M.D.; Clifford, G.D.; Buchman, T.G. An interpretable machine learning model for accurate prediction of sepsis in the ICU. Crit. Care Med. 2018, 46, 547–553. [Google Scholar] [CrossRef]
  39. Ford, D.W.; Goodwin, A.J.; Simpson, A.N.; Johnson, E.; Nadig, N.; Simpson, K.N. A severe sepsis mortality prediction model and score for use with administrative data. Crit. Care Med. 2016, 44, 319–327. [Google Scholar] [CrossRef]
  40. Kutz, A.; Hausfater, P.; Amin, D.; Amin, A.; Canavaggio, P.; Sauvin, G.; Bernard, M.; Conca, A.; Haubitz, S.; Struja, T.; et al. The TRIAGE-ProADM score for an early risk stratification of medical patients in the emergency department—Development based on a multi-national, prospective, observational study. PLoS ONE 2016, 11, e0168076. [Google Scholar] [CrossRef]
  41. Yu, H.; Nie, L.; Liu, A.; Wu, K.; Hsein, Y.C.; Yen, D.W.; Lee, M.G.; Lee, C.C. Combining procalcitonin with the qSOFA and sepsis mortality prediction. Med. Baltim. 2019, 98, e15981. [Google Scholar] [CrossRef]
  42. Liu, D.; Su, L.; Han, G.; Yan, P.; Xie, L. Prognostic value of procalcitonin in adult patients with sepsis: A systematic review and meta-analysis. PLoS ONE 2015, 10, e0129450. [Google Scholar] [CrossRef]
  43. Ruiz Martín, G.; Prieto Prieto, J.; Veiga de Cabo, J.; Gomez Lus, L.; Barberán, J.; González Landa, J.M.; Fernández, C. Plasma fibronectin as a marker of sepsis. Int. J. Infect. Dis. 2004, 8, 236–243. [Google Scholar] [CrossRef] [Green Version]
  44. Reichsoellner, M.; Raggam, R.B.; Wagner, J.; Krause, R.; Hoenigl, M. Clinical evaluation of multiple inflammation biomarkers for diagnosis and prognosis for patients with systemic inflammatory response syndrome. J. Clin. Microbiol. 2014, 52, 4063–4066. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  45. Zeerleder, S.; Hack, C.E.; Wuillemin, W.A. Disseminated intravascular coagulation in sepsis. Chest 2005, 128, 2864–2875. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  46. Mamani, M.; Hashemi, S.H.; Hajilooi, M.; Saedi, F.; Niayesh, A.; Fallah, M. Evaluation of fibronectin and C-reactive protein levels in patients with sepsis: A case-control study. Acta. Med. Iran. 2012, 50, 404–410. [Google Scholar] [PubMed]
  47. Lemańska-Perek, A.; Krzyżanowska-Gołąb, D.; Pupek, M.; Klimeczek, P.; Witkiewicz, W.; Kątnik-Prastowska, I. Analysis of soluble molecular fibronectin-fibrin complexes and EDA-fibronectin concentration in plasma of patients with atherosclerosis. Inflammation 2016, 39, 1059–1068. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  48. Satoi, S.; Kitade, H.; Hiramatsu, Y.; Kwon, A.H.; Takahashi, H.; Sekiguchi, K.; Uehara, M.; Oda, M.; Yanagimoto, Y.; Miyashita, K. Increased extra domain-A containing fibronectin and hepatic dysfunction during septic response: An in vivo and in vitro study. Shock 2000, 13, 492–496. [Google Scholar] [CrossRef] [Green Version]
  49. Lemańska-Perek, A.; Krzyżanowska-Gołąb, D.; Dragan, B.; Tyszko, M.; Adamik, B. Fibronectin as a Marker of Disease Severity in Critically Ill COVID-19 Patients. Cells 2022, 11, 1566. [Google Scholar] [CrossRef]
Figure 1. A workflow diagram. (A) This part presents how the best model was chosen. (B) The workflow of building and explaining the final model. Results are available at: https://stats4med.shinyapps.io/xai2shiny/ (accessed on 8 June 2022).
Figure 2. A comparison of the area-under-the-curve values of the logistic regression, random forest, and gradient boosting models. The mean test AUC was the highest for the random forest model, whereas the lowest mean test AUC was for the boosting model. The blue dot represents the mean and the bold midline represents the median of the AUC results, whereas the upper and lower limits of the boxes correspond to the third and first quartiles. Black dots represent outliers in the data. Each boxplot presents the results of a 5-fold cross-validation procedure repeated five times for a specific model.
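The resampling scheme named in this caption, 5-fold cross-validation repeated five times, yields 25 train/test splits and therefore 25 test AUC values per model. A minimal sketch of how such splits can be generated is given below. The function name, the seed, and the use of plain (non-stratified) shuffling are assumptions made for illustration; the study's actual implementation may differ.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repeat
        for fold in range(n_folds):
            test = indices[fold::n_folds]  # every n_folds-th shuffled index
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield repeat, fold, train, test


# 122 is the cohort size reported in Table 1.
splits = list(repeated_kfold(122))
print(len(splits))
```

Each model would be fitted on `train` and scored on `test` for every split, and the resulting AUC values would be summarized as a boxplot.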
Figure 3. The ROC curve of the machine learning model for predicting the survival of sepsis. The following input variables were included in the model: pFN, the INR value, the APACHE II score, age, the SOFA score, the platelet count, the procalcitonin level, and the d-dimers level.
Figure 4. The Feature Importance plot showing the most significant variables for the model. The length of the bar indicates the loss in the AUC when a given variable was altered. The bigger the loss, the more important the variable is.
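One common way to "alter" a variable for such a plot is to permute its values across patients and measure the resulting drop in AUC. The sketch below illustrates that idea under stated assumptions: the toy scoring rule, the feature names, the data, and the function names are all hypothetical, and whether the study used exactly this permutation scheme is not confirmed here.

```python
import random


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, y_true, column, n_repeats=20, seed=0):
    """Mean drop in AUC after shuffling one column across rows."""
    rng = random.Random(seed)
    baseline = auc(y_true, [predict(r) for r in rows])
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, column: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - auc(y_true, [predict(r) for r in permuted]))
    return sum(drops) / n_repeats


def toy_predict(row):
    # Hypothetical scoring rule: a higher INR gives a lower survival score.
    return -row["inr"]


# Hypothetical data: "noise" is a column the toy model never reads.
rows = [
    {"inr": 1.1, "noise": 5}, {"inr": 1.2, "noise": 3}, {"inr": 1.3, "noise": 8},
    {"inr": 1.5, "noise": 1}, {"inr": 1.7, "noise": 9}, {"inr": 1.9, "noise": 2},
]
y_true = [1, 1, 1, 0, 0, 0]  # 1 = survived

inr_importance = permutation_importance(toy_predict, rows, y_true, "inr")
noise_importance = permutation_importance(toy_predict, rows, y_true, "noise")
print(inr_importance, noise_importance)
```

A variable the model does not use leaves the predictions unchanged when permuted, so its importance is zero, while permuting an informative variable lowers the AUC.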
Figure 5. The survival prediction model for a selected patient. (A) The impact of the values of particular features on the survival prediction based on the Break Down method: an example based on data collected on admission to the ICU. (B) The impact of the values of particular features on the survival prediction based on SHAP values: an example based on data collected on admission to the ICU. (C) The calculation of survival when one feature changed: an example based on data collected on admission to the ICU.
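The single-feature calculation in panel (C) corresponds to a Ceteris Paribus profile: one feature of a given patient is varied over a grid while all other features are held fixed, and the prediction is recomputed each time. The sketch below illustrates the mechanics only. The logistic scoring rule, its coefficients, the feature names, and the patient values are hypothetical and do not come from the study's random forest model.

```python
import math


def toy_predict(row):
    # Hypothetical survival model with made-up coefficients.
    z = 0.01 * (row["pfn"] - 200.0) - 2.0 * (row["inr"] - 1.3)
    return 1.0 / (1.0 + math.exp(-z))


def ceteris_paribus(predict, patient, feature, grid):
    """Return (value, prediction) pairs for one varied feature, others held fixed."""
    return [(value, predict({**patient, feature: value})) for value in grid]


# Hypothetical patient and grid of plasma FN values.
patient = {"pfn": 150.0, "inr": 1.5}
grid = [100.0, 150.0, 200.0, 250.0, 300.0]
profile = ceteris_paribus(toy_predict, patient, "pfn", grid)

for value, prediction in profile:
    print(f"pfn={value:.0f} -> predicted survival {prediction:.3f}")
```

Because the patient record is copied for each grid value, the original record is left unchanged, and the profile can be read as a "what if" curve for that individual patient.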
Table 1. Patient characteristics at baseline.
| Parameter | All (n = 122) | Nonsurvivors (n = 54) | Survivors (n = 68) | p |
|---|---|---|---|---|
| Age (years) | 68.0 (60.0–77.0) | 71.0 (65.0–79.0) | 64.0 (56.0–74.0) | 0.001 |
| Female/male (n) | 58/64 | 27/27 | 31/37 | 0.627 |
| APACHE II score (points) | 24.0 (17.0–29.0) | 28.0 (22.0–33.0) | 20.0 (15.0–26.0) | <0.001 |
| SOFA score (points) | 10.0 (8.0–13.0) | 11.5 (10.0–15.0) | 9.0 (7.0–11.0) | <0.001 |
| Procalcitonin (ng/mL) | 8.38 (1.76–30.4) | 14.57 (3.90–34.20) | 4.47 (0.80–15.47) | <0.001 |
| C-reactive protein (mg/L) | 192.3 (112.6–302.5) | 197.3 (123.0–307.9) | 186.7 (100.4–302.5) | 0.726 |
| INR | 1.34 (1.16–1.60) | 1.49 (1.32–1.80) | 1.20 (1.12–1.43) | <0.001 |
| Platelet count (10³/μL) | 182.5 (124.0–310.0) | 138.5 (74.0–243.0) | 209.5 (155.0–335.0) | 0.001 |
| D-dimers (μg/mL) | 5.68 (3.64–12.59) | 5.70 (3.97–15.59) | 5.54 (3.37–11.47) | 0.294 |
| ICU stay (days) | 9.5 (4.0–18.0) | 5.5 (3.0–12.0) | 12.5 (6.5–29.5) | <0.001 |
| Leukocytes (10³/μL) | 15.5 (11.0–22.5) | 15.9 (9.7–22.5) | 15.0 (11.2–22.9) | 0.660 |
APACHE II, Acute Physiology and Chronic Health Evaluation II; SOFA, Sequential Organ Failure Assessment; INR, international normalized ratio. Values are presented as median and quartiles; the p-value represents the difference between Nonsurvivors and Survivors.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Lemańska-Perek, A.; Krzyżanowska-Gołąb, D.; Kobylińska, K.; Biecek, P.; Skalec, T.; Tyszko, M.; Gozdzik, W.; Adamik, B. Explainable Artificial Intelligence Helps in Understanding the Effect of Fibronectin on Survival of Sepsis. Cells 2022, 11, 2433. https://doi.org/10.3390/cells11152433
