Predicting Intraoperative Hypothermia Burden during Non-Cardiac Surgery: A Retrospective Study Comparing Regression to Six Machine Learning Algorithms

Dibiasi, Christoph; Agibetov, Asan; Kapral, Lorenz; Zeiner, Sebastian; Kimberger, Oliver

doi:10.3390/jcm12134434


by Christoph Dibiasi ¹,²,†, Asan Agibetov ³,†, Lorenz Kapral ², Sebastian Zeiner ¹ and Oliver Kimberger ¹,²,*

¹ Department of Anaesthesia, Intensive Care Medicine and Pain Medicine, Medical University of Vienna, Währinger Gürtel 18-20, 1090 Vienna, Austria

² Ludwig Boltzmann Institute Digital Health and Patient Safety, Währinger Straße 104/10, 1180 Vienna, Austria

³ Center for Medical Statistics, Informatics and Intelligent Systems, Institute of Artificial Intelligence, Medical University of Vienna, Währinger Straße 25a, 1090 Vienna, Austria

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

J. Clin. Med. 2023, 12(13), 4434; https://doi.org/10.3390/jcm12134434

Submission received: 15 May 2023 / Revised: 23 June 2023 / Accepted: 28 June 2023 / Published: 30 June 2023

(This article belongs to the Section Anesthesiology)


Abstract

Background: Inadvertent intraoperative hypothermia is a common complication that affects patient comfort and morbidity. As the development of hypothermia is a complex phenomenon, predicting it using machine learning (ML) algorithms may be superior to logistic regression. Methods: We performed a single-center retrospective study and assembled a feature set comprising 71 variables. The primary outcome was hypothermia burden, defined as the area under the intraoperative temperature curve below 37 °C over time. We built seven prediction models (logistic regression, extreme gradient boosting (XGBoost), random forest (RF), multi-layer perceptron neural network (MLP), linear discriminant analysis (LDA), k-nearest neighbor (KNN), and Gaussian naïve Bayes (GNB)) to predict whether patients would not develop hypothermia or would develop mild, moderate, or severe hypothermia. For each model, we assessed discrimination (F1 score, area under the receiver operating characteristic curve, precision, recall) and calibration (calibration-in-the-large, calibration intercept, calibration slope). Results: We included data from 87,116 anesthesia cases. Predicting the hypothermia burden group using logistic regression yielded a weighted F1 score of 0.397. Ranked from highest to lowest weighted F1 score, the ML algorithms performed as follows: XGBoost (0.44), RF (0.418), LDA (0.406), MLP (0.4), KNN (0.362), and GNB (0.32). Conclusions: ML is suitable for predicting intraoperative hypothermia and could be applied in clinical practice.

Keywords:

anesthesia; surgery; hypothermia; prediction; machine learning

1. Introduction

Inadvertent intraoperative hypothermia is a common but often preventable complication during surgery [1]. In addition to greatly affecting patient comfort, hypothermia is thought to be associated with increased perioperative morbidity [2,3]. It also imposes a significant cost on the healthcare system [4]. Potentially underlying mechanisms of the mid- to long-term effects of intraoperative hypothermia include an increased rate of wound infections [5], impaired blood coagulation, and increased use of blood products [6]. Thermal care interventions, such as active warming prior to arrival in the operating room [7], intraoperative active warming [8], and fluid warming [9], have been shown to be effective in alleviating hypothermia and its associated complications [10] and are recommended by most guidelines [11]. Yet, at least mild hypothermia still occurs in the majority of surgical cases, even in fully equipped healthcare systems [12]. However, a recent multicenter trial demonstrated that intraoperative core temperatures as low as 35.5 °C are not associated with adverse clinical outcomes [13].

Estimating the risk of inadvertent hypothermia could enable the targeted use of thermal care interventions and thereby optimize the use of these potentially limited resources. We recently developed several regression models to predict intraoperative hypothermia of less than 36 °C and less than 35.5 °C with sufficient accuracy [14]. However, in that study, hypothermia was defined as the point incidence of body temperature below two fixed and arbitrary thresholds. Hypothermia burden, which we define as the area between 37 °C and the body temperature curve over the time during which body temperature is below 37 °C, may be a more appropriate outcome parameter because it reflects both the depth and the duration of hypothermia. In addition, various studies in recent years have indicated that machine learning (ML) could be advantageous over logistic regression [15,16,17,18]. We therefore hypothesized that ML might be superior to logistic regression in predicting intraoperative hypothermia. Currently, there is no default approach to ML modelling, and each algorithm has unique strengths and weaknesses [18]. For instance, tree-based ensemble models such as random forest (RF) and extreme gradient boosting (XGBoost), which combine the predictions of many decision trees, are robust to missing values and are well suited to capturing non-linear patterns in structured datasets [19]. In contrast, generative models such as linear discriminant analysis (LDA) and Gaussian naïve Bayes (GNB) assume that the predictor variables do not interact with each other [20].

In this study, we aimed (1) to analyze whether ML algorithms are suitable for predicting intraoperative hypothermia and (2) to compare their predictive performance with each other and with logistic regression. We therefore trained a logistic regression model and six supervised classification ML models and assessed their discrimination and calibration.

2. Materials and Methods

2.1. Data Collection

We analyzed electronic anesthesia records of the General Hospital of Vienna, a tertiary academic medical center, from the period September 2013 to March 2021. Patients were eligible when general anesthesia, neuraxial anesthesia, peripheral plexus block, or nerve block was performed for a surgical intervention and when body temperature measurements were available. Patients were excluded from the analysis if extracorporeal circulation (i.e., cardiopulmonary bypass or extracorporeal membrane oxygenation) was used during surgery or when therapeutic hyperthermia (e.g., hyperthermic intraperitoneal chemotherapy) was performed. In addition, we excluded any cases with a total duration of less than 15 min and brain-dead patients scheduled for organ donation. When a patient had multiple anesthesia cases, each case was counted separately. We exported all data from the patient data management system IntelliSpace Critical Care and Anesthesia (Philips Austria GmbH, Vienna, Austria). We performed no formal sample size calculation for this study.

2.2. Outcome Variables

Intraoperative body temperature was recorded at 2 min intervals. After data export, we removed artifacts in body temperature measurements utilizing the algorithm published by Sun et al. [21]. We then calculated the hypothermia burden, the area below 37 °C on the intraoperative time–body temperature curve, as our primary outcome. As we had aimed to use classification ML algorithms in this study, we assigned each patient to one of four groups, namely, no hypothermia, mild hypothermia, moderate hypothermia, and severe hypothermia, corresponding to the respective quarters of patients ranked by hypothermia burden.
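To make the outcome definition concrete, the sketch below computes hypothermia burden from one patient's temperatures sampled every 2 min and then assigns the four quartile-based groups. It is a minimal illustration under our own assumptions, not the study code: the function names are hypothetical, the area is integrated with the trapezoidal rule on the temperature deficit clipped at zero (which slightly approximates the area where the curve crosses 37 °C), and a burden exactly equal to a quartile cut-off is assigned to the lower group.

```python
import numpy as np


def hypothermia_burden(temps_c, interval_min=2.0, threshold_c=37.0):
    """Hypothermia burden in °C·h for equally spaced samples (hypothetical helper).

    The deficit below the threshold is clipped at 0 before trapezoidal
    integration, so time spent above 37 °C contributes nothing.
    """
    temps = np.asarray(temps_c, dtype=float)
    deficit = np.clip(threshold_c - temps, 0.0, None)  # °C below 37 °C at each sample
    interval_h = interval_min / 60.0                    # sampling interval in hours
    # Trapezoidal rule: mean of neighbouring deficits times the interval
    return float(np.sum((deficit[1:] + deficit[:-1]) / 2.0) * interval_h)


def burden_groups(burdens):
    """Map burdens to 0 = no, 1 = mild, 2 = moderate, 3 = severe via quartile cut-offs."""
    burdens = np.asarray(burdens, dtype=float)
    cutoffs = np.quantile(burdens, [0.25, 0.5, 0.75])
    # right=True: a burden exactly equal to a cut-off falls into the lower group
    return np.digitize(burdens, cutoffs, right=True)


# One hour at a constant 36 °C (31 samples, 2 min apart): burden of about 1.0 °C·h
print(hypothermia_burden([36.0] * 31))
print(burden_groups([0.1, 0.5, 1.0, 2.0]))  # [0 1 2 3]
```

Because the cut-offs are quartiles of the whole cohort, ties (for example, many patients with a burden of exactly 0) can make the four groups unequal in size.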

We considered the binary variables “hypothermia below 36 °C, 35.5 °C, or 35 °C at a single time point” as secondary outcomes. Primary and secondary outcomes were decided upon prior to data analysis.

2.3. Predictor Variables

The feature set comprised 71 variables. Three variables described patient demographics (age (years), weight (kg), sex (male/female)); 2 variables described comorbidities (American Society of Anesthesiologists score (1–6), van Walraven comorbidity score (−19–89)) [22]; 14 variables encoded type of surgery (surgical urgency (elective/emergent/emergency), surgical billing code, and intraoperative positioning (supine, prone, Trendelenburg, anti-Trendelenburg, beach chair, leg extension, Lloyd–Davis, lithotomy, side, flexed side, other, not documented; all yes/no)); and 9 variables gave information on type of anesthesia (endotracheal intubation, laryngeal mask, fiberoptic intubation, spinal anesthesia, epidural anesthesia, upper extremity peripheral nerve block, lower extremity peripheral nerve block, peripheral nerve block at unknown location (all yes/no), and room temperature at induction of anesthesia (°C)). We included 8 preoperative laboratory parameters (hemoglobin (g/dL), leucocyte count (G/L), platelets (G/L), fibrinogen (mg/dL), C-reactive protein (mg/dL), aspartate aminotransferase (U/L), alanine aminotransferase (U/L), and creatinine (mg/dL)); 7 variables for induction medication (dosage of propofol (mg), midazolam (mg), etomidate (mg), esketamine (mg), fentanyl (µg), rocuronium (mg), cisatracurium (mg)); 4 variables for maintenance of anesthesia (propofol, remifentanil, any volatile agent, nitrous oxide (all yes/no)); and 24 variables for vital signs (systolic, mean, and diastolic blood pressure (all mmHg), peripheral transcutaneous oxygen saturation (0–100), heart rate (beats per minute), and pulse frequency (beats per minute)). From the longitudinal vital sign measurements recorded every 15 s, we extracted four values for each of these six vital signs: the initial value (i.e., the first available measurement in the record), the minimum and maximum values up to the documented start of surgery, and the first available value after documented induction of anesthesia.
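The per-vital-sign extraction described above could look like the following sketch. The helper name, the input format (timestamped samples in seconds since the start of the record), skipping missing samples, and treating "after induction" as strictly later than the documented induction time are our assumptions, not details given in the study.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def vital_sign_features(
    samples: Sequence[Tuple[float, Optional[float]]],
    induction_s: float,
    surgery_start_s: float,
) -> Dict[str, Optional[float]]:
    """Return the four summary values used per vital sign (hypothetical helper).

    samples: (time in seconds since record start, measured value) pairs,
    e.g. heart rate recorded every 15 s; missing values may be None or NaN.
    """
    # Keep only valid measurements and order them by time
    valid = sorted(
        (t, v) for t, v in samples if v is not None and not math.isnan(v)
    )
    pre_surgery = [v for t, v in valid if t <= surgery_start_s]
    after_induction = [v for t, v in valid if t > induction_s]
    return {
        "initial": valid[0][1] if valid else None,          # first value in the record
        "min": min(pre_surgery) if pre_surgery else None,   # up to start of surgery
        "max": max(pre_surgery) if pre_surgery else None,   # up to start of surgery
        "first_after_induction": after_induction[0] if after_induction else None,
    }


heart_rate = [(0, 80.0), (15, 85.0), (30, 70.0), (45, 90.0), (60, 75.0)]
print(vital_sign_features(heart_rate, induction_s=20, surgery_start_s=50))
# {'initial': 80.0, 'min': 70.0, 'max': 90.0, 'first_after_induction': 70.0}
```

Applying the helper to each of the six vital signs yields the 24 vital sign variables.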

2.4. Imputation of Missing Values and Feature Scaling

Imputation and feature scaling were fitted on the training set as part of a classifier pipeline, and the same fitted pipeline was used to estimate performance on the holdout test set. We used a single iterative imputer implementation from the scikit-learn Python package: missing values of each feature were imputed in a round-robin fashion, where at each step each feature with missing values was modeled as a function of the other features. For the tree-based algorithms XGBoost and RF, we used their built-in handling of missing values instead. The imputation step was fitted exclusively on training data to prevent information leakage into the model. For the algorithms that are sensitive to the scale of feature values (i.e., non-tree-based algorithms), we standardized each feature by removing the mean and scaling to unit variance. Categorical variables were one-hot encoded; for the decision-tree-based ML algorithms, categorical variables were label encoded.
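A minimal sketch of such a leakage-safe pipeline for one of the scale-sensitive models, assuming scikit-learn's `IterativeImputer` (which has to be enabled through the `sklearn.experimental` import) and `StandardScaler`. Logistic regression as the final step, `max_iter=10` for the imputer, and the synthetic data are illustrative assumptions; categorical encoding is omitted for brevity.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (exposes IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_linear_pipeline(random_state=0):
    # Imputer and scaler live inside the pipeline, so fit() learns them from
    # the training rows only and predict_proba() merely applies them.
    return make_pipeline(
        IterativeImputer(max_iter=10, random_state=random_state),  # round-robin imputation
        StandardScaler(),                                          # zero mean, unit variance
        LogisticRegression(max_iter=1000),
    )


# Hypothetical data: 200 cases, 5 numeric features, about 10% of values missing
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan

X_train, X_test, y_train, y_test = X[:140], X[140:], y[:140], y[140:]
model = make_linear_pipeline().fit(X_train, y_train)
proba = model.predict_proba(X_test)  # test rows are imputed with the training-fitted imputer
```

Keeping preprocessing inside the pipeline is also what makes cross-validation during tuning leakage-free, because each fold refits the imputer and scaler on its own training part.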

2.5. Prediction Models and Machine Learning Algorithms

We applied logistic regression models with different regularization penalties (L1, L2, elastic net) and the following ML algorithms:

XGBoost, a gradient tree-boosting algorithm comprising a multitude of decision trees.
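RF, another decision-tree-based ensemble learning technique in which the predictions of a forest of many randomized decision trees are combined to obtain a more accurate and stable prediction. The key difference from XGBoost is that an RF algorithm is trained using the “bagging” technique (each tree is fitted independently on a bootstrap sample of the training data) as opposed to gradient boosting (trees are added sequentially, each one fitted to correct the errors of the current ensemble).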
LDA, which discriminates between classes by learning the joint probability distribution of the input and target variables.
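GNB, a generative classifier that, like LDA, models each class with a Gaussian distribution but assumes diagonal covariance matrices (i.e., predictors that are conditionally independent given the class), thus drastically simplifying the computation.
k-nearest neighbor (KNN), a non-parametric algorithm that classifies a new data point by a majority vote among its k most similar data points in the training set.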
Multi-layer perceptron (MLP) neural network, with a standard feed-forward architecture with hidden layers consisting of neurons.
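For orientation, the algorithms listed above map onto standard scikit-learn estimators roughly as in the sketch below. It is not the study's configuration (the tuned hyperparameters are given in Supplementary Table S1): all settings are placeholders, the data are synthetic, and scikit-learn's `GradientBoostingClassifier` is plainly swapped in for XGBoost so that the sketch depends only on scikit-learn. The study itself used the official XGBoost package, whose `xgboost.XGBClassifier` would take that slot.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def scaled(estimator):
    # Non-tree models receive standardized inputs, as described in Section 2.4
    return make_pipeline(StandardScaler(), estimator)


models = {
    "Logistic regression": scaled(LogisticRegression(max_iter=1000)),
    # Boosting: trees fitted sequentially (stand-in for XGBoost)
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
    # Bagging: trees fitted independently on bootstrap samples
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "LDA": scaled(LinearDiscriminantAnalysis()),
    "GNB": scaled(GaussianNB()),
    "KNN": scaled(KNeighborsClassifier(n_neighbors=15)),
    "MLP": scaled(MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)),
}

# Synthetic four-class data standing in for the hypothermia burden groups
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    stratify=y, random_state=0)

scores = {
    name: f1_score(y_test, model.fit(X_train, y_train).predict(X_test), average="weighted")
    for name, model in models.items()
}
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: weighted F1 = {score:.3f}")
```

The ranking printed on synthetic data says nothing about the study's results; the point is only how each algorithm slots into the same fit/predict/score loop.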

2.6. Model Tuning, Selection, and Evaluation

We randomly split the whole dataset into training and test subsets using a ratio of 70:30 and stratified the split to preserve the class distribution of the outcome. We used the test set only for the final evaluation of the optimized models, so that no information leaked into the training and model tuning phases. To predict the hypothermia burden group (no, mild, moderate, or severe hypothermia), we measured the discriminative performance of all models using the weighted F1 score, which combines two important characteristics of predictive models: precision and recall. Precision (or positive predictive value) is the proportion of true positives among all predicted positives (i.e., true and false positives); in the context of predicting hypothermia, it gives the probability that a patient predicted to develop hypothermia will truly become hypothermic. Recall (or sensitivity) is the proportion of true positives among all actual positives (i.e., true positives and false negatives); it can be interpreted as the probability that a patient who develops hypothermia is identified by a given model. The F1 score is calculated as the harmonic mean of precision and recall:

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{1}$$

To predict the hypothermia burden group, we used the weighted F1 score, which averages the F1 scores for each class (no, mild, moderate, or severe hypothermia) using the number of occurrences of each class as weight. We predicted the secondary outcomes (occurrence of intraoperative temperature below 35 °C, 35.5 °C, or 36 °C) using binary prediction models and used the area under the receiver operating characteristic curve (AUROC) to assess discriminative performance. We also used AUROC to report the discriminative performance of the primary outcome prediction models for each hypothermia burden category (e.g., no hypothermia vs. the rest; mild hypothermia vs. the rest, etc.).
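The weighted F1 score and the per-category AUROC can be illustrated with the sketch below. The helper names are hypothetical; the manual computation applies Equation (1) per class and weights each class by its number of true cases, which is what scikit-learn's `f1_score(..., average="weighted")` computes, and the one-vs-rest AUROC scores each class's predicted probability against all other classes.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score


def weighted_f1(y_true, y_pred, classes):
    """Per-class F1 from Equation (1), averaged with class frequencies as weights."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    weighted_sum = 0.0
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
        weighted_sum += f1 * np.sum(y_true == c)  # weight: number of true cases in class c
    return float(weighted_sum / len(y_true))


def one_vs_rest_auroc(y_true, proba, classes):
    """AUROC of each class against all other classes (column i of proba = class i)."""
    y_true = np.asarray(y_true)
    return {c: roc_auc_score(y_true == c, proba[:, i]) for i, c in enumerate(classes)}


classes = [0, 1, 2, 3]  # no, mild, moderate, severe hypothermia
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]
print(weighted_f1(y_true, y_pred, classes))          # approximately 0.733 (= 11/15)
print(f1_score(y_true, y_pred, average="weighted"))  # same value
```

In this toy example, classes 0 and 2 each have precision 1 and recall 0.5 (F1 = 2/3), and classes 1 and 3 have precision 2/3 and recall 1 (F1 = 0.8); with equal weights the average is 11/15.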

We assessed calibration using calibration-in-the-large, calibration intercept, and calibration slope. Calibration-in-the-large is defined as the difference between the log odds of the mean observed value and the log odds of the mean predicted value; a value closer to 0 indicates better calibration. Calibration intercept and slope were calculated using the Cox method, whereby the observed binary outcome is regressed on the log odds of the predicted probabilities using a generalized linear model (logistic regression) [23]. The estimated regression intercept represents the overall miscalibration, where 0 indicates good calibration, while the estimated regression slope describes the spread of the predictions, where 1 denotes perfect calibration, a slope below 1 indicates predictions that are too extreme, and a slope above 1 indicates predictions that are too moderate.
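The calibration measures above can be sketched for one binary outcome (for example, one hypothermia burden group vs. the rest, or one temperature threshold) as follows. This is our own minimal implementation, not the study code: intercept and slope are estimated jointly by a two-parameter logistic regression fitted with Newton-Raphson, following the description above (some authors instead estimate the intercept with the slope fixed at 1), and probabilities are clipped away from 0 and 1 so that the log odds stay finite.

```python
import numpy as np


def logit(p):
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)  # keep log odds finite
    return np.log(p / (1 - p))


def calibration_in_the_large(y, p):
    """Log odds of the mean observed outcome minus log odds of the mean prediction."""
    return float(logit(np.mean(y)) - logit(np.mean(p)))


def cox_intercept_slope(y, p, max_iter=50, tol=1e-10):
    """Fit logit(P(y = 1)) = a + b * logit(p) by Newton-Raphson and return (a, b)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y), logit(p)])
    beta = np.zeros(2)
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))             # fitted probabilities
        gradient = X.T @ (y - mu)                           # score of the log-likelihood
        hessian = X.T @ (X * (mu * (1.0 - mu))[:, None])    # Fisher information
        step = np.linalg.solve(hessian, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return float(beta[0]), float(beta[1])


rng = np.random.default_rng(1)
p_true = rng.uniform(0.05, 0.95, size=20_000)
y = rng.random(20_000) < p_true                                 # outcomes drawn from true risks
p_overconfident = 1.0 / (1.0 + np.exp(-2.0 * logit(p_true)))    # predictions too extreme

print(calibration_in_the_large(y, p_true))       # close to 0
print(cox_intercept_slope(y, p_true))            # intercept near 0, slope near 1
print(cox_intercept_slope(y, p_overconfident))   # slope near 0.5 (< 1: too extreme)
```

Doubling the log odds of well-calibrated predictions halves the fitted slope, which is the "too extreme" pattern that a slope below 1 signals.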

We performed hyperparameter optimization for each algorithm on the training set using Bayesian optimization over a predefined search space. For each algorithm, we ran the optimization for 50 trials. In each trial, a hyperparameter configuration was sampled and evaluated using 3-fold cross-validation; based on the mean cross-validated discrimination score (weighted F1 for the primary outcome or AUROC for the secondary outcomes), the optimization algorithm decided where to sample the next configuration. We provide the search space definition for the hyperparameters as Supplementary Material (Supplementary Table S1). The best hyperparameter configuration was chosen based on the cross-validated discrimination score (the higher the better) and was then evaluated on the test set.
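Each trial of the search can be thought of as evaluating an objective such as the one sketched below: a stratified 70:30 split, 3-fold cross-validated weighted F1 computed on the training set only, and a single final evaluation on the test set. The random forest, its two hyperparameters, the three hand-picked configurations, and the synthetic data are assumptions for illustration. In the study, skopt proposed the configurations; with skopt, a call such as `gp_minimize(objective, dimensions, n_calls=50)` could drive a function of this shape, which is why it returns the negative score (skopt minimizes).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic four-class data standing in for the hypothermia burden groups
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_classes=4, random_state=0)
# 70:30 split, stratified so that every group keeps its share in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)


def objective(params):
    """Negative mean 3-fold CV weighted F1 of one configuration (lower is better)."""
    n_estimators, max_depth = params
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth,
                                   random_state=0)
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="f1_weighted")
    return -float(np.mean(scores))


# A Bayesian optimizer would propose these configurations; here they are fixed
trials = [(50, 3), (100, 6), (200, None)]
results = {params: objective(params) for params in trials}
best_n_estimators, best_max_depth = min(results, key=results.get)

# Refit the best configuration on the whole training set, evaluate once on the test set
final_model = RandomForestClassifier(n_estimators=best_n_estimators,
                                     max_depth=best_max_depth, random_state=0)
final_model.fit(X_train, y_train)
test_f1 = f1_score(y_test, final_model.predict(X_test), average="weighted")
print(f"best={min(results, key=results.get)}, test weighted F1={test_f1:.3f}")
```

The test set is touched exactly once, after tuning, which mirrors the separation of tuning and evaluation described above.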

2.7. Software

Data export was performed using Microsoft SQL Server Management Studio 18 (Microsoft, Redmond, WA, USA). Data analysis was performed using R version 4.1.2 [24] and Python version 3.10.7. We used the official Python package for XGBoost [25] and the scikit-learn Python package [26] to implement the remaining ML algorithms. The Python package skopt [27] was used for the Bayesian hyperparameter optimization.

3. Results

We analyzed data from 140,241 eligible surgical cases performed between September 2013 and March 2021; 87,116 cases were included in the final analysis (Figure 1). Baseline data of included patients are given in Table 1.

The primary outcome was hypothermia burden, calculated as the area under the curve of body temperature below 37 °C over the intraoperative course for each individual patient. The overall median (interquartile range (IQR)) hypothermia burden was 1.01 °C·h (0.49–1.87). We then divided the whole dataset into four quarters of patients based on the quartiles of hypothermia burden. In total, 19,484 patients were in the quarter with the lowest hypothermia burden (temperature AUC range 0–0.44 °C·h; i.e., “no hypothermia”), 22,220 patients were in the second quarter (temperature AUC range 0.44–0.96 °C·h; i.e., “mild hypothermia”), 22,800 patients were in the third quarter (temperature AUC range 0.96–1.82 °C·h; i.e., “moderate hypothermia”), and 22,612 patients were in the fourth quarter with the highest hypothermia burden (temperature AUC range > 1.82 °C·h; i.e., “severe hypothermia”). The median (IQR) hypothermia burdens in the respective quarters were 0.23 °C·h (0.08–0.34), 0.68 °C·h (0.56–0.81), 1.31 °C·h (1.12–1.54), and 2.73 °C·h (2.18–3.78).

We trained one logistic regression model and six ML algorithms (XGBoost, RF, LDA, GNB, KNN, MLP) to predict the hypothermia burden group of individual patients. Logistic regression had a weighted F1 score of 0.397. Ranked by weighted F1 score, the best performing algorithm was XGBoost with a weighted F1 score of 0.44, a 10.74% increase compared with logistic regression (Table 2).

We then tested binary predictions of the individual hypothermia burden groups (i.e., whether a patient will fall into a given group or not, such as no hypothermia vs. any other group), using AUROC to assess discrimination. For all algorithms, AUROC was higher when predicting no or severe hypothermia than when predicting mild or moderate hypothermia (Figure 2). Detailed results for model discrimination are given in Supplementary Table S2. Most models had acceptable calibration, with GNB being a notable exception (Table 3 and Figure 3).

In addition to hypothermia burden, we predicted hypothermia based on minimum body temperature at three thresholds: <35 °C, <35.5 °C, and <36 °C. In total, 5592 patients (6%) had a minimum body temperature <35 °C, 17,240 patients (20%) <35.5 °C, and 43,379 patients (50%) <36 °C. With logistic regression, we obtained AUROCs of 0.701 for <35 °C, 0.692 for <35.5 °C, and 0.703 for <36 °C. Among the ML algorithms, XGBoost had the highest AUROC at all three temperature thresholds: 0.736 for <36 °C, 0.717 for <35.5 °C, and 0.715 for <35 °C (Supplementary Table S3 and Supplementary Figure S1). In general, AUROC was lower than the values obtained for predicting the hypothermia burden quarters. Calibration was good for models trained to predict body temperature <36 °C but worsened for the <35.5 °C and <35 °C thresholds (Supplementary Table S4 and Supplementary Figure S2).

4. Discussion

In this study, we trained and evaluated a logistic regression model and six ML algorithms to predict whether patients will develop intraoperative hypothermia during non-cardiac surgery. We found that both discrimination and calibration varied considerably between logistic regression and the ML algorithms analyzed, rendering only some techniques, such as XGBoost, RF, and logistic regression, suitable for further evaluation in clinical practice. We defined hypothermia using two approaches: first, as hypothermia burden (°C·h below 37 °C); and second, as the point incidence of body temperature below three thresholds. In general, the extremes of hypothermia burden (i.e., whether a patient will likely experience no hypothermia or severe hypothermia) were predicted better than mild or moderate hypothermia or the point incidences. Across all tested combinations, XGBoost had the best discrimination and good calibration.

It has previously been shown that hypothermia, defined as minimum body temperature below one of three thresholds, is a common occurrence during general anesthesia [21]. Although thermal care interventions, such as active prewarming prior to induction of anesthesia and intraoperative forced-air warming, are recommended for routine use [11], they are often not performed, owing to equipment shortages, understaffing, or presumed high costs [28]. The occurrence of intraoperative hypothermia is a complex phenomenon attributable to a variety of patient-specific, surgical, and system-specific risk factors, and interactions between these factors are likely [29]. Predicting the occurrence of hypothermia could support targeted resource allocation and could contribute to the economic savings attributed to thermal care interventions [4]. In addition, a predictive model able to identify patients at high risk of intraoperative hypothermia could be used to select patients for prospective trials.

While normal core body temperature is about 37 °C [2], temperatures as low as 36 °C are considered normal in anesthetized patients [1]. Recently, it has been demonstrated that intraoperative body temperatures as low as 35.5 °C were not associated with adverse outcomes [13]. We recently showed that hypothermia, defined as body temperature falling below 36 °C or 35.5 °C at least at a single time point, can be predicted with acceptable performance using logistic regression [14]. However, the definition of hypothermia used in that study may be imprecise when applied to our dataset, as nearly half of all patients in our cohort fell below 36 °C at least once, even though the average body temperature during anesthesia was above 36 °C. It is biologically plausible that the detrimental effects of hypothermia are related to both the magnitude of the temperature drop and the duration of exposure. Hypothermia burden, the area between 37 °C and the body temperature curve over time, combines both features and may be clinically more important than hypothermia at single time points. Indeed, it was shown previously that increasing hypothermia burden is associated with higher odds of perioperative blood product transfusion as well as increased hospital length of stay [21].

In our previous study, we predicted hypothermia using only logistic regression, a statistical method used to model the association between predictors and a binary outcome. ML, in contrast, is an umbrella term for a heterogeneous group of computational algorithms that can be used to predict binary or multiclass outcomes, such as the hypothermia burden group. Advantages commonly attributed to ML include the ability to automatically process large amounts of data and fit models with many predictors [18] while imposing fewer restrictions on the dataset. For instance, non-linear relationships and interactions between predictors are automatically captured by some ML techniques (e.g., XGBoost, RF, and MLP) but have to be explicitly specified when logistic regression is used [18,30]. In our dataset, the algorithms able to automatically detect non-linear relationships outperformed logistic regression. While MLP showed marginal improvements in F1 score (<1%) over logistic regression, the decision tree ensemble methods XGBoost and RF performed considerably better (11% and 5%, respectively). Interestingly, the outcome definition seems to be of particular importance for good model discrimination: hypothermia defined via threshold temperatures was discriminated considerably worse than the extreme hypothermia burden quarters (no hypothermia and severe hypothermia), which in turn were more easily detected than the quarters in between. In addition, some models (MLP and logistic regression) used to predict hypothermia below the three thresholds were severely miscalibrated, possibly owing to the relatively low incidence of hypothermia below 35 °C and 35.5 °C. Regarding the prediction of hypothermia burden, XGBoost and RF had acceptable calibration errors, but logistic regression was the best calibrated algorithm. GNB and KNN performed considerably worse in terms of both discrimination and calibration. A possible explanation for the subpar performance of GNB is its assumption that the features are conditionally independent given the class, which is unlikely to hold for correlated predictors such as repeated measurements of the same vital sign.

Despite the advantages attributed to ML, there is still an open debate on whether ML is superior to traditional logistic regression-based methods for clinical data. It has been claimed that ML methods show no performance benefit over logistic regression [31] and that ML models often lack transparent reporting [32]. In particular, the lack of calibration assessment of ML models has often been criticized [33]. Our data indicate that discrimination can be improved when ML, particularly XGBoost, is used.

Limitations

Due to the retrospective nature of this study, we had no information on data not routinely recorded at our institution, such as patient core body temperature prior to induction of anesthesia or the type and location of the temperature probe. Most anesthesia cases analyzed involved general anesthesia. Only a very small proportion of cases had neuraxial or regional anesthesia because temperature is not routinely measured in those cases, which introduces selection bias into our study. As we only used data from our center, we cannot report on the external validity of our models. In addition, because we primarily intended to compare the predictive performance of ML models with that of regression analysis using an existing dataset, not all predictor variables are available before surgery. For instance, the use of neuromuscular blockade or vasoactive medication is sometimes not planned in advance. This will reduce predictive performance if our models are used to predict hypothermia ahead of surgery based on a presumed course of anesthesia. Therefore, the results of our study are not readily translatable into clinical practice and should be externally validated in further studies.

In summary, we built several ML models to predict hypothermia of varying severity and compared their performance with that of logistic regression. Some models (XGBoost, RF) achieved better discrimination than logistic regression with acceptable calibration, while others (KNN, GNB) performed considerably worse. Future studies on predicting intraoperative hypothermia using ML should focus on tree-based ML algorithms and validate our findings using a prospective study design.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jcm12134434/s1: Supplementary Table S1: Initial parameters and hyperparameter search space of the applied machine learning algorithms, Supplementary Table S2: Detailed comparison of model discrimination metrics for prediction of hypothermia burden, Supplementary Table S3: Comparison of model discrimination metrics for prediction of hypothermia at a single time point, Supplementary Table S4: Comparison of model calibration for prediction of hypothermia at a single time point, Supplementary Figure S1: Receiver operating characteristic curve for prediction of hypothermia at a single time point, Supplementary Figure S2: Calibration plots for prediction of hypothermia at a single time point.

Author Contributions

Conceptualization, C.D., A.A. and O.K.; methodology, C.D., A.A., L.K. and O.K.; software, A.A. and L.K.; validation, A.A. and L.K.; formal analysis, C.D., A.A., L.K., S.Z. and O.K.; investigation, C.D., A.A., L.K., S.Z. and O.K.; resources, C.D. and O.K.; data curation, C.D., A.A. and L.K.; writing—original draft preparation, C.D.; writing—review and editing, C.D., A.A., L.K., S.Z. and O.K.; visualization, C.D.; supervision, O.K.; project administration, C.D. and O.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Medical University of Vienna (protocol code 1402/2021, approved on 11 May 2021).

Informed Consent Statement

Patient consent was waived due to the retrospective study design.

Data Availability Statement

Data are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Forbes, S.S.; Eskicioglu, C.; Nathens, A.B.; Fenech, D.S.; Laflamme, C.; McLean, R.F.; McLeod, R.S. Evidence-Based Guidelines for Prevention of Perioperative Hypothermia. J. Am. Coll. Surg. 2009, 209, 492–503.e1.
2. Sessler, D.I. Perioperative Thermoregulation and Heat Balance. Lancet 2016, 387, 2655–2664.
3. Xu, H.; Wang, Z.; Guan, X.; Lu, Y.; Malone, D.C.; Salmon, J.W.; Ma, A.; Tang, W. Safety of Intraoperative Hypothermia for Patients: Meta-Analyses of Randomized Controlled Trials and Observational Studies. BMC Anesthesiol. 2020, 20, 202.
4. Ralph, N.; Gow, J.; Conway, A.; Duff, J.; Edward, K.-L.; Alexander, K.; Bräuer, A. Costs of Inadvertent Perioperative Hypothermia in Australia: A Cost-of-Illness Study. Collegian 2020, 27, 345–351.
5. Allegranzi, B.; Zayed, B.; Bischoff, P.; Kubilay, N.Z.; de Jonge, S.; de Vries, F.; Gomes, S.M.; Gans, S.; Wallert, E.D.; Wu, X.; et al. New WHO Recommendations on Intraoperative and Postoperative Measures for Surgical Site Infection Prevention: An Evidence-Based Global Perspective. Lancet Infect. Dis. 2016, 16, e288–e303.
6. Rajagopalan, S.; Mascha, E.; Na, J.; Sessler, D.I. The Effects of Mild Perioperative Hypothermia on Blood Loss and Transfusion Requirement. Anesthesiology 2008, 108, 71–77.
7. Andrzejowski, J.; Hoyle, J.; Eapen, G.; Turnbull, D. Effect of Prewarming on Post-Induction Core Temperature and the Incidence of Inadvertent Perioperative Hypothermia in Patients Undergoing General Anaesthesia. Br. J. Anaesth. 2008, 101, 627–631.
8. Warttig, S.; Alderson, P.; Campbell, G.; Smith, A.F. Interventions for Treating Inadvertent Postoperative Hypothermia. Cochrane Database Syst. Rev. 2014, CD009892.
9. Campbell, G.; Alderson, P.; Smith, A.F.; Warttig, S. Warming of Intravenous and Irrigation Fluids for Preventing Inadvertent Perioperative Hypothermia. Cochrane Database Syst. Rev. 2015, CD009891.
10. Balki, I.; Khan, J.S.; Staibano, P.; Duceppe, E.; Bessissow, A.; Sloan, E.N.; Morley, E.E.; Thompson, A.N.; Devereaux, B.; Rojas, C.; et al. Effect of Perioperative Active Body Surface Warming Systems on Analgesic and Clinical Outcomes: A Systematic Review and Meta-Analysis of Randomized Controlled Trials. Anesth. Analg. 2020, 131, 1430–1443.
11. National Institute for Health and Clinical Excellence. Hypothermia: Prevention and Management in Adults Having Surgery; National Institute for Health and Care Excellence: London, UK, 2016; 18p.
12. Alfonsi, P.; Bekka, S.; Aegerter, P.; SFAR Research Network investigators. Prevalence of Hypothermia on Admission to Recovery Room Remains High despite a Large Use of Forced-Air Warming Devices: Findings of a Non-Randomized Observational Multicenter and Pragmatic Study on Perioperative Hypothermia Prevalence in France. PLoS ONE 2019, 14, e0226038.
13. Sessler, D.I.; Pei, L.; Li, K.; Cui, S.; Chan, M.T.V.; Huang, Y.; Wu, J.; He, X.; Bajracharya, G.R.; Rivas, E.; et al. Aggressive Intraoperative Warming versus Routine Thermal Management during Non-Cardiac Surgery (PROTECT): A Multicentre, Parallel Group, Superiority Trial. Lancet 2022, 399, 1799–1808.
14. Wallisch, C.; Zeiner, S.; Scholten, P.; Dibiasi, C.; Kimberger, O. Development and Internal Validation of an Algorithm to Predict Intraoperative Risk of Hypothermia Based on Preoperative Data. Sci. Rep. 2021, 11, 22296.
15. Yan, X.; Goldsmith, J.; Mohan, S.; Turnbull, Z.A.; Freundlich, R.E.; Billings, F.T.; Kiran, R.P.; Li, G.; Kim, M. Impact of Intraoperative Data on Risk Prediction for Mortality After Intra-Abdominal Surgery. Anesth. Analg. 2022, 134, 102–113.
16. Hill, B.L.; Brown, R.; Gabel, E.; Rakocz, N.; Lee, C.; Cannesson, M.; Baldi, P.; Olde Loohuis, L.; Johnson, R.; Jew, B.; et al. An Automated Machine Learning-Based Model Predicts Postoperative Mortality Using Readily-Extractable Preoperative Electronic Health Record Data. Br. J. Anaesth. 2019, 123, 877–886.
17. Kendale, S.; Kulkarni, P.; Rosenberg, A.D.; Wang, J. Supervised Machine-Learning Predictive Analytics for Prediction of Postinduction Hypotension. Anesthesiology 2018, 129, 675–688.
Goldstein, B.A.; Navar, A.M.; Carter, R.E. Moving beyond Regression Techniques in Cardiovascular Risk Prediction: Applying Machine Learning to Address Analytic Challenges. Eur. Heart J. 2017, 38, 1805–1814. [Google Scholar] [CrossRef] [Green Version]
De Bruyne, S.; Speeckaert, M.M.; Van Biesen, W.; Delanghe, J.R. Recent Evolutions of Machine Learning Applications in Clinical Laboratory Medicine. Crit. Rev. Clin. Lab. Sci. 2020, 58, 131–152. [Google Scholar] [CrossRef]
Alloghani, M.; Aljaaf, A.; Hussain, A.; Baker, T.; Mustafina, J.; Al-Jumeily, D.; Khalaf, M. Implementation of Machine Learning Algorithms to Create Diabetic Patient Re-Admission Profiles. BMC Med. Inf. Decis. Mak. 2019, 19, 253. [Google Scholar] [CrossRef] [Green Version]
Sun, Z.; Honar, H.; Sessler, D.I.; Dalton, J.E.; Yang, D.; Panjasawatwong, K.; Deroee, A.F.; Salmasi, V.; Saager, L.; Kurz, A. Intraoperative Core Temperature Patterns, Transfusion Requirement, and Hospital Duration in Patients Warmed with Forced Air. Anesthesiology 2015, 122, 276–285. [Google Scholar] [CrossRef] [Green Version]
Van Walraven, C.; Austin, P.C.; Jennings, A.; Quan, H.; Forster, A.J. A Modification of the Elixhauser Comorbidity Measures into a Point System for Hospital Death Using Administrative Data. Med. Care 2009, 47, 626–633. [Google Scholar] [CrossRef]
Huang, Y.; Li, W.; Macheret, F.; Gabriel, R.A.; Ohno-Machado, L. A Tutorial on Calibration Measurements and Calibration Models for Clinical Prediction Models. J. Am. Med. Inf. Assoc. 2020, 27, 621–633. [Google Scholar] [CrossRef] [PubMed]
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022. [Google Scholar]
Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794. [Google Scholar]
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
Head, T.; Kumar, M.; Nahrstaedt, H.; Louppe, G.; Shcherbatyi, I. Scikit-Optimize/Scikit-Optimize 2020, v0.8.1. Available online: https://scikit-optimize.github.io/stable/ (accessed on 14 May 2023).
Harper, C.M.; Andrzejowski, J.C.; Alexander, R. NICE and Warm. Br. J. Anaesth. 2008, 101, 293–295.
Collins, S.; Budds, M.; Raines, C.; Hooper, V. Risk Factors for Perioperative Hypothermia: A Literature Review. J. Perianesthesia Nurs. 2019, 34, 338–346.
Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2009; ISBN 978-0-387-84857-0.
Christodoulou, E.; Ma, J.; Collins, G.S.; Steyerberg, E.W.; Verbakel, J.Y.; Van Calster, B. A Systematic Review Shows No Performance Benefit of Machine Learning over Logistic Regression for Clinical Prediction Models. J. Clin. Epidemiol. 2019, 110, 12–22.
Collins, G.S.; Reitsma, J.B.; Altman, D.G.; Moons, K.G.M.; TRIPOD Group. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD): The TRIPOD Statement. Circulation 2015, 131, 211–219.
Van Calster, B.; Nieboer, D.; Vergouwe, Y.; De Cock, B.; Pencina, M.J.; Steyerberg, E.W. A Calibration Hierarchy for Risk Models Was Defined: From Utopia to Empirical Data. J. Clin. Epidemiol. 2016, 74, 167–176.

Figure 1. Patient inclusion flow chart. ASA: American Society of Anesthesiologists. HIPEC: Hyperthermic intraperitoneal chemotherapy.

Figure 2. Receiver operating characteristic curves for predicting the hypothermia burden quartile. MLP: multi-layer perceptron neural network; LDA: linear discriminant analysis; KNN: k-nearest neighbor; GNB: Gaussian naïve Bayes.
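Figure 2 shows one receiver operating characteristic curve per hypothermia burden group. A common way to obtain per-group curves from a multiclass model is the one-vs-rest reduction: each group in turn is treated as the positive class, and the model's predicted probability for that group serves as the score. The caption does not state which reduction was used, so one-vs-rest is an assumption here. The sketch below computes the per-group AUROC in pure Python through its rank (Mann–Whitney) formulation, that is, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case; the function names and the toy probabilities are hypothetical and are not study data.

```python
from typing import List, Sequence


def auroc_binary(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative).

    Ties count as half a win, which equals the area under the empirical ROC curve.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_one_vs_rest(labels: Sequence[int], proba: Sequence[Sequence[float]]) -> List[float]:
    """Per-class AUROC: class k versus all others, scored by the predicted probability of k."""
    n_classes = len(proba[0])
    result = []
    for k in range(n_classes):
        y_k = [1 if y == k else 0 for y in labels]
        s_k = [row[k] for row in proba]
        result.append(auroc_binary(y_k, s_k))
    return result


# Hypothetical example: 0 = no, 1 = mild, 2 = moderate, 3 = severe hypothermia.
labels = [0, 1, 2, 3, 0, 3]
proba = [
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.50, 0.20, 0.10],
    [0.10, 0.20, 0.40, 0.30],
    [0.05, 0.05, 0.20, 0.70],
    [0.15, 0.55, 0.20, 0.10],  # a "no hypothermia" case that the model calls mild
    [0.10, 0.10, 0.30, 0.50],
]
print(auroc_one_vs_rest(labels, proba))  # → [0.875, 0.8, 1.0, 1.0]
```

The pairwise loop is quadratic in the number of cases and is meant only to make the definition visible; for a cohort of tens of thousands of anesthesia cases, a sort-based rank computation or scikit-learn's `roc_auc_score` applied to each binarized group would be the practical choice.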

Figure 3. Calibration plots for the prediction of hypothermia burden. MLP: multi-layer perceptron neural network; LDA: linear discriminant analysis; KNN: k-nearest neighbor; GNB: Gaussian naïve Bayes.
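Calibration plots such as those in Figure 3 typically compare, within bins of predicted probability, the mean predicted probability with the observed event rate for one outcome group at a time; a well-calibrated model places these points close to the diagonal. The caption does not state the binning scheme, so the sketch below assumes equal-width bins and a one-vs-rest view of a single hypothermia burden group. The function name, the bin count, and the toy inputs are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def calibration_points(
    y_true: Sequence[int], p_pred: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted, observed rate, count) for each non-empty equal-width bin."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for y, p in zip(y_true, p_pred):
        # Map p in [0, 1] to a bin index; p == 1.0 falls into the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        sums[b] += p
        hits[b] += y
        counts[b] += 1
    return [
        (sums[b] / counts[b], hits[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Hypothetical one-vs-rest data for one group (1 = case belongs to the group).
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.05, 0.15, 0.30, 0.35, 0.65, 0.70, 0.90, 0.95]
for mean_p, observed, n in calibration_points(y, p, n_bins=5):
    print(f"predicted {mean_p:.3f}  observed {observed:.2f}  n={n}")
```

Empty bins are skipped rather than plotted as zero. scikit-learn's `sklearn.calibration.calibration_curve` provides the same binned summary with uniform or quantile bins.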

Table 1. Baseline characteristics of study participants.

	No Hypothermia n = 19,484	Mild Hypothermia n = 22,220	Moderate Hypothermia n = 22,800	Severe Hypothermia n = 22,612
Age (years)	52 (37–66)	53 (39–67)	55 (41–68)	58 (45–70)
Male sex	9076 (47%)	10,547 (47%)	11,520 (51%)	11,785 (52%)
Weight (kg)	77 (65–90)	75 (64–88)	75 (65–87)	74 (63–85)
Unknown	2372	2484	2063	1342
Height (cm)	170 (164–178)	170 (164–178)	170 (164–178)	170 (164–178)
Unknown	9961	10,791	10,646	9838
ASA score
I	4702 (24%)	5885 (26%)	5509 (24%)	3947 (17%)
II	7788 (40%)	10,110 (45%)	10,396 (46%)	9688 (43%)
III	5856 (30%)	5528 (25%)	6045 (27%)	7709 (34%)
IV	883 (4.5%)	541 (2.4%)	615 (2.7%)	839 (3.7%)
V	255 (1.3%)	156 (0.7%)	235 (1.0%)	429 (1.9%)
Surgical urgency
Elective	11,285 (68%)	14,689 (81%)	15,296 (83%)	14,962 (81%)
Urgent	4071 (25%)	2741 (15%)	2439 (13%)	2335 (13%)
Emergency	1121 (6.8%)	691 (3.8%)	745 (4.0%)	1076 (5.9%)
Unknown	3007	4099	4320	4239
van Walraven comorbidity score	0.0 (0.0–4.0)	0.0 (0.0–4.0)	0.0 (0.0–4.0)	0.0 (0.0–4.0)
Unknown	296	377	380	399
Surgical discipline
Oral and maxillofacial surgery	1002 (5.1%)	1131 (5.1%)	1303 (5.7%)	997 (4.4%)
Plastic surgery	903 (4.6%)	937 (4.2%)	1052 (4.6%)	1172 (5.2%)
Head and neck surgery	2320 (12%)	1945 (8.8%)	1465 (6.4%)	661 (2.9%)
Dermatology	136 (0.7%)	269 (1.2%)	239 (1.0%)	100 (0.4%)
Orthopedic and/or trauma surgery	2870 (15%)	4503 (20%)	5789 (25%)	5697 (25%)
Ophthalmology	1266 (6.5%)	1667 (7.5%)	941 (4.1%)	230 (1.0%)
Urology	1504 (7.7%)	1747 (7.9%)	1551 (6.8%)	1610 (7.1%)
General surgery	5822 (30%)	5519 (25%)	5220 (23%)	5235 (23%)
Gynecology	1888 (9.7%)	2369 (11%)	1915 (8.4%)	1440 (6.4%)
Obstetrics	72 (0.4%)	61 (0.3%)	39 (0.2%)	33 (0.1%)
Vascular surgery	613 (3.1%)	512 (2.3%)	706 (3.1%)	1377 (6.1%)
Thoracic surgery	409 (2.1%)	595 (2.7%)	942 (4.1%)	1548 (6.8%)
Neurosurgery	834 (4.3%)	1143 (5.1%)	1888 (8.3%)	2930 (13%)
Ambient room temperature (°C)	19.99 (19.02–20.95)	19.98 (19.00–20.87)	19.90 (18.99–20.44)	19.18 (18.99–20.05)
Unknown	3008	3560	4546	5793
Operating room time (min)	83 (61–121)	98 (76–136)	135 (106–182)	222 (166–310)

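All data are given as absolute and relative frequencies or as median and interquartile range. "Unknown" rows give the number of cases with missing values; percentages refer to cases with known values. ASA: American Society of Anesthesiologists, kg: kilogram, cm: centimeter.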
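The summary format of Table 1 can be reproduced with a few lines of standard-library Python. The sketch below assumes that missing values are dropped before summarizing; this is consistent with the surgical urgency rows, where, for example, 68% corresponds to 11,285 of the 16,477 no-hypothermia cases with known urgency rather than to all 19,484 cases. It uses `statistics.quantiles`, whose default "exclusive" method is only one of several quartile definitions, so interquartile bounds from other statistical software may differ slightly. The helper names and inputs are hypothetical.

```python
import statistics
from typing import Optional, Sequence


def median_iqr(values: Sequence[Optional[float]], digits: int = 0) -> str:
    """Format a continuous variable as 'median (Q1–Q3)', ignoring unknown (None) values."""
    known = [v for v in values if v is not None]
    med = statistics.median(known)
    # quantiles(n=4) returns the three quartile cut points; default method="exclusive".
    q1, _, q3 = statistics.quantiles(known, n=4)
    return f"{med:.{digits}f} ({q1:.{digits}f}–{q3:.{digits}f})"


def n_percent(flags: Sequence[Optional[bool]]) -> str:
    """Format a binary variable as 'n (%)', with the percentage taken over known values."""
    known = [f for f in flags if f is not None]
    n = sum(known)
    return f"{n} ({100 * n / len(known):.0f}%)"


# Hypothetical inputs; None marks a missing ("Unknown") value.
ages = [30, 38, 45, 52, 60, 71, 80, None]
male = [True, False, True, None, True]
print(median_iqr(ages))  # → 52 (38–71)
print(n_percent(male))   # → 3 (75%)
```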

Table 2. Comparison of model discrimination, as measured by the weighted F1 score and the area under the receiver operating characteristic curve (AUROC), for predicting hypothermia burden. The relative change with respect to the corresponding value of the logistic regression model is given in parentheses.

	Weighted F1 Score	AUROC: No Hypothermia	AUROC: Mild Hypothermia	AUROC: Moderate Hypothermia	AUROC: Severe Hypothermia
XGBoost	0.44 (10.74%)	0.781 (6.22%)	0.655 (4.49%)	0.617 (3.75%)	0.812 (8.44%)
Random Forest	0.418 (5.15%)	0.756 (2.85%)	0.641 (2.36%)	0.604 (1.55%)	0.784 (4.59%)
LDA	0.406 (2.16%)	0.735 (−0.05%)	0.626 (−0.06%)	0.592 (−0.39%)	0.748 (−0.2%)
MLP	0.4 (0.58%)	0.738 (0.45%)	0.607 (−3.11%)	0.582 (−2.06%)	0.761 (1.6%)
Logistic Regression	0.397 (0.00%)	0.735 (0%)	0.627 (0%)	0.594 (0%)	0.749 (0%)
KNN	0.362 (−8.97%)	0.676 (−7.98%)	0.568 (−9.38%)	0.542 (−8.8%)	0.699 (−6.7%)
GNB	0.32 (−19.50%)	0.673 (−8.48%)	0.58 (−7.45%)	0.558 (−6.19%)	0.694 (−7.34%)

XGBoost: extreme gradient boosting, MLP: multi-layer perceptron neural network, LDA: linear discriminant analysis, KNN: k-nearest neighbor, GNB: Gaussian naïve Bayes.
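Table 2 summarizes discrimination with a weighted F1 score and expresses each value relative to logistic regression. The sketch below assumes the usual support-weighted definition, in which each group's F1 score is weighted by its number of true cases (the same definition as scikit-learn's `average="weighted"` option), and computes both the score and the relative change in pure Python on hypothetical labels. The published percentages appear to be based on unrounded scores: recomputing them from the rounded table entries gives slightly different values (0.44 versus 0.397 yields about 10.83% rather than 10.74%).

```python
from typing import Sequence


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class F1 scores, each weighted by that class's number of true cases."""
    total = len(y_true)
    score = 0.0
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * (tp + fn) / total  # tp + fn = support of class c
    return score


def relative_change(value: float, reference: float) -> float:
    """Percentage change of a metric relative to the reference (logistic regression) value."""
    return (value - reference) / reference * 100.0


# Hypothetical labels: 0 = no, 1 = mild, 2 = moderate, 3 = severe hypothermia.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 2]
print(round(weighted_f1(y_true, y_pred), 3))   # → 0.617
print(round(relative_change(0.44, 0.397), 2))  # → 10.83
```

Weighting by support keeps the score comparable with accuracy-like summaries when group sizes differ, as they do here (19,484 to 22,800 cases per group); a macro average would instead weight all four groups equally.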

Table 3. Comparison of model calibration for predicting the hypothermia burden quartile.

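	No Hypothermia: Mean	No Hypothermia: Intercept	No Hypothermia: Slope	Mild Hypothermia: Mean	Mild Hypothermia: Intercept	Mild Hypothermia: Slope	Moderate Hypothermia: Mean	Moderate Hypothermia: Intercept	Moderate Hypothermia: Slope	Severe Hypothermia: Mean	Severe Hypothermia: Intercept	Severe Hypothermia: Slope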
XGBoost	0.044	0.118	1.064	−0.033	−0.093	0.939	−0.012	−0.184	0.824	0.005	0.090	1.102
Random Forest	0.044	0.521	1.423	−0.037	0.269	1.307	−0.017	0.242	1.260	0.014	0.483	1.510
MLP	0.024	−0.331	0.640	−0.093	−0.621	0.429	0.058	−0.591	0.379	0.013	−0.276	0.656
LDA	0.061	−0.021	0.920	−0.041	−0.148	0.894	−0.021	−0.250	0.768	0.006	−0.051	0.937
Logistic Regression	−0.094	−0.094	1.015	0.011	0.017	1.006	0.049	−0.094	0.864	0.030	0.015	0.978
KNN	0.058	−0.846	0.131	−0.124	−0.964	0.071	−0.058	−0.971	0.038	0.135	−0.644	0.152
GNB	−0.366	−1.135	0.020	−0.946	−1.059	0.006	1.395	−0.974	0.010	0.635	−0.627	0.062

XGBoost: extreme gradient boosting, MLP: multi-layer perceptron neural network, LDA: linear discriminant analysis, KNN: k-nearest neighbor, GNB: Gaussian naïve Bayes.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
