2. Materials and Methods
2.1. Hemorrhagic Shock Swine Injury Model
Datasets captured in a previously performed swine (
Sus scrofa domestica) hemorrhagic shock injury and fluid resuscitation model were used for training and developing the ML models in this research effort [
17]. Research was conducted in compliance with the Animal Welfare Act, implementing Animal Welfare regulations and the principles of the Guide for the Care and Use of Laboratory Animals. This study was approved by the Institutional Animal Care and Use Committee (IACUC). The facility where this research was conducted is fully accredited by AAALAC International.
Briefly, animals were maintained under a surgical plane of anesthesia using 0–5% isoflurane titrated to effect. Analgesia was provided throughout the study with buprenorphine SR. In this study, swine subjects were first instrumented with femoral catheters for controlled hemorrhage (artery) and resuscitation (vein). A carotid artery catheter was placed for arterial pressure readings (Arrow International, Morrisville, NC, USA), and an 8.5 Fr. percutaneous sheath introducer was placed through which a pulmonary artery Swan-Ganz catheter (Edwards Lifesciences, Irvine, CA, USA) was advanced into the pulmonary artery for cardiac output monitoring. Next, an open splenectomy was performed, followed by a 30-min stabilization period.
Then, a controlled hemorrhage to a target mean arterial pressure (MAP) of 35 mmHg was performed to induce hypovolemic shock, wherein an automated hemorrhage decision table (AutoBleed) controlled the rates of blood removal to reach this MAP target. Removed blood was immediately mixed with CPDA-1 solution (citrate, phosphate, dextrose, adenine at a 1:7 volumetric ratio) for anticoagulation. Animals were held at this pressure target under AutoBleed control until blood lactate levels reached 4 mmol/L. During this variable hold window, AutoBleed continued to remove blood or reinfuse blood to maintain pressure at 35 mmHg. After the blood lactate hemorrhagic shock target was reached, animals received a calcium chloride bolus (1 g/10 mL) and were resuscitated using an adaptive resuscitation controller (ARC) with whole blood to a target MAP of 65 mmHg [
20]. Whole blood was infused for 10 min using ARC to the target MAP, followed by switching the infusate to lactated Ringer’s (LR) solution for an additional 2 hours.
After the 2-hour resuscitation hold at target MAP was completed, animals underwent a re-bleed of identical magnitude and duration to the initial hemorrhage event. The animals were re-infused with LR using ARC to reach the target MAP and held for an additional 2 hours, followed by euthanasia with sodium pentobarbital (FatalPlus®).
2.2. Data Processing
For this study, only the baseline region through the whole blood resuscitation region was used in the data processing and ML model development methodology. MAP and volumetric blood hemorrhage and infusion data were recorded at 500 Hz and 1/5 Hz, respectively. The analog data were downsampled to a frequency of 100 Hz, and the digital data were upsampled to a frequency of 100 Hz for use in this study. The arterial waveform was filtered using a 512th-order finite impulse response (FIR) window lowpass filter with a cutoff frequency of 6 Hz. The pulse foot (diastolic trough), systolic peak, half-rise between the diastolic trough and systolic peak, the first inflection point, and the end of the waveform segment were calculated and identified for each waveform segment present in the arterial waveform [
18]. In the absence of an inflection point of a waveform segment, the half-drop between the systolic peak and the following diastolic trough was calculated and identified [
16]. The identified landmarks were then used to produce features using various mathematical manipulation techniques, based on previous research efforts [
21,
22,
23,
24], as well as features developed internally at the U.S. Army Institute of Surgical Research (USAISR), resulting in over 4200 features at each waveform segment of the arterial signal for each respective swine subject. In addition, a secondary set of features was extracted from an arterial waveform signal that was detrended using a fifth-order polynomial to eliminate baseline drift that may be present in the signal [
25]. This detrending procedure removes the fluctuation of blood pressure seen by the pulse waves in the arterial waveform signal [
26].
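As a rough illustration of the processing chain described above, the resampling, filtering, landmark detection, and detrending steps could be sketched in MATLAB as follows. Variable names, the zero-phase filtering call, the sample-and-hold interpolation of the volumetric data, and the minimum peak spacing are assumptions for illustration rather than the original implementation.

% Assumed inputs: art_raw (arterial waveform, 500 Hz) and vol_raw (volumetric
% hemorrhage/infusion record, one sample every 5 s).
fsTarget = 100;                                       % common analysis rate, Hz
art100   = resample(art_raw, fsTarget, 500);          % downsample analog data to 100 Hz
tArt     = (0:numel(art100)-1)'/fsTarget;             % 100 Hz time vector, s
tVol     = (0:numel(vol_raw)-1)'*5;                   % volumetric time vector, s
vol100   = interp1(tVol, vol_raw, tArt, 'previous', 'extrap');  % upsample digital data

b       = fir1(512, 6/(fsTarget/2), 'low');           % 512th-order FIR lowpass, 6 Hz cutoff
artFilt = filtfilt(b, 1, art100);                     % filter the arterial waveform

[~, sysIdx]  = findpeaks(artFilt,  'MinPeakDistance', 0.3*fsTarget);  % systolic peaks
[~, diasIdx] = findpeaks(-artFilt, 'MinPeakDistance', 0.3*fsTarget);  % diastolic troughs

artDetrend = detrend(artFilt, 5);                     % remove 5th-order polynomial drift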
For the DL model, the analog data were downsampled and the digital data were upsampled to a frequency of 100 Hz to match the sampling frequency of the training data used to create the DL model in previous works. The data were then segmented into 5 s intervals, matching the pretraining data previously used to develop the DL model. No manual feature extraction was performed, as the neural network is built from convolutional layers, which automatically extract features from the input signal [
27,
28].
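For illustration, the 5 s segmentation of the 100 Hz waveform into DL model inputs could be written as in the sketch below, where art100 is the resampled arterial waveform from the sketch above and the trailing partial window is simply discarded.

fs   = 100;                                            % sampling rate, Hz
win  = 5*fs;                                           % 500 samples per 5 s segment
nSeg = floor(numel(art100)/win);                       % number of complete segments
segments = reshape(art100(1:nSeg*win), win, nSeg)';    % one 5 s segment per row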
2.3. Machine Learning Models
2.3.1. Updates to Prior ML Prediction Models
The extracted features were used to develop ML models to predict calculated metrics for tracking both hemorrhage and resuscitation events in the animal study. Previous work developed blood loss metrics and quantified both time- and magnitude-sensitive hemorrhage metrics [
16]. The three previously developed metrics were BLVM, PEBL, and HemArea, each designed to quantify a different aspect of blood loss and its physiological impact. In brief, BLVM was a 0–1 scale metric that quantified volumetric blood loss relative to the maximum volume of blood lost, while PEBL quantified the percent blood loss in a subject relative to an estimated blood volume based on standard blood-volume-per-weight constants. HemArea quantified the hemorrhage-time magnitude (blood loss over time) by summing slices of area under the BLVM curve and linearizing the results.
Due to the inclusion of the resuscitation portion of the swine study, these original blood loss-based metrics had to be reworked to accurately reflect the desired prediction. BLVM was calculated using the original BLVM equation, except that, at the point of maximum blood loss, any infused whole blood volume was added instead of subtracted. This allowed the blood balance to “recover”, as the metric no longer decreased but instead increased from the point of maximum blood loss. BLVM remained valued between 0 and 1, as the whole blood infusion volume necessary to resuscitate the swine back to the target MAP was never equal to the volume of whole blood hemorrhaged; however, its value could exceed one in studies where excess whole blood was infused.
The PEBL prediction required little modification from the original development in Gonzalez et al. [
16]. The infusion of whole blood decreased the total blood loss of the swine by “giving back” the hemorrhaged whole blood, allowing the originally developed equation to be used. The subject’s estimated blood volume was changed from 80 mL/kg for canines to 60 mL/kg for swine [
29].
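As we interpret the description above, a ground-truth PEBL trace could be computed as in the following sketch, where hemVol and wbVol are assumed cumulative hemorrhage and whole blood infusion volumes (mL) on the common 100 Hz time base and weightKg is the subject mass; this is an illustration of the concept, not the published equation.

ebv  = 60*weightKg;            % estimated blood volume for swine, 60 mL/kg [29]
pebl = (hemVol - wbVol)/ebv;   % net fraction of estimated blood volume lost over time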
The HemArea prediction was previously calculated indirectly by taking slices of area under the BLVM curve, summing them, and performing a linear regression. This was originally done because the ML models performed poorly in predicting HemArea directly. Due to the addition of new features, a direct ML model was developed that could track HemArea, eliminating the need for the area under the curve process previously required.
All prediction metrics developed in this study were smoothed using a moving mean with a window size of 50 data points to reduce noise and generalize trends. Each prediction was then compared against its ground truth calculation using linear regression to identify any systematic shifts in the data.
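This post-processing could be sketched as follows, with predRaw and groundTruth assumed to be prediction and ground truth vectors on a common time base.

predSmooth = movmean(predRaw, 50);                % 50-point moving mean to reduce noise
coeffs = polyfit(groundTruth, predSmooth, 1);     % linear regression of prediction vs. ground truth
% A slope near 1 and an intercept near 0 indicate no systematic shift in the prediction.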
2.3.2. Compensatory Reserve Measurement
The CRM underwent ML and DL developments in prior efforts [
18,
30]. These efforts define the compensatory reserve as the sum of all mechanisms of the body that act to protect against insufficient delivery of oxygen (DO2). To measure the compensatory reserve, a study was conducted in which subjects were sealed inside a lower body negative pressure (LBNP) chamber capable of redistributing blood volume from the upper body to the lower extremities via vacuum pressure. This created a central hypovolemic condition in the upper body of the participant inside the LBNP chamber. Increasing the vacuum pressure of the chamber in steps over time brought participants to the point of hemodynamic decompensation (HDD). This LBNP model allowed CRM to be defined as follows:
CRM = (1 − LBNP/LBNP_HDD) × 100%,
where LBNP is the vacuum pressure inside the chamber at a given point in time, and LBNP_HDD is the vacuum pressure in the chamber where the patient has reached HDD. This formula places CRM on a 0–100% scale, where 0% is the point of HDD and 100% places a patient at a full reserve. For the DL CRM model, a convolutional neural network (CNN) was developed as described in R.W. Techentin et al. [
30]. Compared to ML models, a CNN does not require extracted features to be fed to it, as the convolutional layers with optimized hyperparameters perform the feature extraction on their own. The CNN model was optimized over several hyperparameters, including, but not limited to, the number of layers, the number of filters, and the kernel sizes. This optimization step led to a model with eight 1-D convolutional/pooling layers and the other parameters provided in the publication from Techentin et al. The ML models developed to predict CRM used a smaller pool of features than the present study. The features were ranked using a ranking algorithm and then used as inputs to a bagged tree ML model for the prediction of CRM. The models in this study were pretrained using data that had ground truth CRM labels. Ground truth CRM could not be defined for the swine datasets, as the swine were not placed inside an LBNP chamber, and the point of HDD does not have a definition in this swine model.
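To illustrate the type of 1-D convolutional architecture described above for the DL CRM model, a minimal MATLAB layer stack is sketched below. The filter counts and kernel sizes are illustrative placeholders and do not reproduce the hyperparameters published by Techentin et al. [30].

% Eight 1-D convolution/pooling blocks followed by a regression head; the
% input is a single-channel 5 s waveform segment.
layers = sequenceInputLayer(1);
for k = 1:8
    layers = [layers; ...
              convolution1dLayer(5, 16, 'Padding', 'same'); ...
              reluLayer; ...
              maxPooling1dLayer(2, 'Stride', 2)];
end
layers = [layers; globalAveragePooling1dLayer; fullyConnectedLayer(1); regressionLayer];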
2.3.3. Machine Learning Model Development
Previous work used bagged tree ML models, as they were the highest-performing models compared in the study [
18]. When retrained using the previously developed ML development framework (CRM-ML) for the current study, bagged tree ML models were compared to boosted tree ML models, and boosted trees performed better and were quicker to run. Prior work comparing boosted tree ML models to bagged tree ML models confirmed that boosted tree models provide better results in efficiency testing [
31], which is important for real-time implementation of the developed algorithms. Boosted tree models were therefore chosen for the development of all ML models. Four groups of swine were created for acquiring the features used as inputs to the ML models. To prevent features from biasing towards a specific swine, three of the four groups were concatenated while leaving one group out; this was done four times to incorporate all the swine data. Each group’s features were ranked using the MRMR criterion in the MATLAB (v2023a, MathWorks, Natick, MA, USA) Regression Learner Toolbox. All ML models used the top 20 ranked features for consistency between the models developed from the different groups of swine data. Additionally, the boosted tree ML models all had a minimum leaf size of eight, a learning rate of 0.1, and 30 learning cycles. Once the features were obtained and the boosted tree models were trained with their respective groups of swine, they were tested using a cross-validation technique known as leave one subject out (LOSO) to account for bias in the testing. A swine group, consisting of three swine subjects, was left out of the training process to blind test the ML model developed from the three other swine groups. This process was repeated four times in total, with different swine group combinations used in training, and the final swine group was split into individual swine subjects to be blind tested on each model (
Figure 1). This resulted in a total of 12 LOSO processes being completed (four ML models, three blind tests each). The entire process was repeated for each prediction metric, as well as on detrended data, to determine whether the ML models required overall trends in the signal for accurate predictions.
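A command-line equivalent of the feature ranking and boosted tree configuration described above might look like the following sketch, assuming a numeric feature matrix Xtrain and target vector ytrain built from the three concatenated training groups and Xtest from the held-out group; names are illustrative, and the authors used the Regression Learner Toolbox interface rather than this exact code.

idx   = fsrmrmr(Xtrain, ytrain);                 % MRMR feature ranking for regression
top20 = idx(1:20);                               % keep the 20 highest-ranked features

tmpl = templateTree('MinLeafSize', 8);           % tree learner with minimum leaf size of 8
mdl  = fitrensemble(Xtrain(:, top20), ytrain, ...
                    'Method', 'LSBoost', ...     % boosted regression trees
                    'NumLearningCycles', 30, ...
                    'LearnRate', 0.1, ...
                    'Learners', tmpl);

yPred = predict(mdl, Xtest(:, top20));           % blind prediction on the left-out group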
2.4. Machine Learning Model Analysis
After ML models were tuned for each application, their results were evaluated with blind subject holdouts (
n = 3 subjects) for each LOSO model. Predictions were compared against ground truth calculations, except for the CRM models, as the point of decompensation could not be accurately determined in an anesthetized animal. Instead, the CRM models were compared against MAP to provide some level of comparison. Model goodness of fit (R-squared) and root mean squared error (RMSE) were used to compare predictions to ground truth calculations, and the results were averaged across all blind subject holdouts and LOSO models to obtain a generalized performance score. This was performed for the baseline, hemorrhage, and whole blood resuscitation datasets. Only trends were evaluated for the resuscitation phase, because once LR fluid was infused, the fluid balance would likely shift relative to whole blood, based on characterized hemodynamic trends during hemorrhage resuscitation [
32].
In addition, receiver operating characteristic (ROC) curves were calculated for the baseline and initial hemorrhage event to evaluate the performance of each model in accurately identifying hemorrhage status. The 95th and 5th percentiles of each model's output over this region were calculated using MATLAB to identify the range of possible values. This range was subdivided into 100 threshold values for distinguishing the hemorrhage and baseline regions, and the true positive rate and false positive rate were calculated at each threshold to generate the ROC curve. The area under the ROC curve (AUROC) was also calculated for each model.
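The ROC construction described above could be sketched as follows, where pred is a model's output over the combined baseline and hemorrhage region and isHem is a logical vector marking the hemorrhage samples as the positive class (both assumed for illustration; the comparison direction depends on whether a given metric rises or falls during hemorrhage).

lo = prctile(pred, 5);                            % 5th percentile of the model output
hi = prctile(pred, 95);                           % 95th percentile of the model output
thresholds = linspace(lo, hi, 100);               % 100 candidate decision thresholds
tpr = zeros(size(thresholds));
fpr = zeros(size(thresholds));
for k = 1:numel(thresholds)
    flag   = pred <= thresholds(k);               % flip to >= for metrics that rise with hemorrhage
    tpr(k) = sum(flag & isHem)  / sum(isHem);     % true positive rate
    fpr(k) = sum(flag & ~isHem) / sum(~isHem);    % false positive rate
end
auroc = abs(trapz(fpr, tpr));                     % area under the ROC curve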
Models were further evaluated for the time taken to detect hemorrhage in each blind test subject. This was performed by identifying the 5th percentile of the baseline region of each dataset and determining the time during the hemorrhage event at which this threshold value was reached for 100 consecutive readings, indicating a significant change from the baseline recording. This time was calculated for each predictive metric and for MAP and averaged across all blind test subjects. Lastly, the effect of whole blood resuscitation was assessed by comparing the average value of each predictive metric, as well as MAP, in the 5 min prior to resuscitation with its average value in the final 5 min of the whole blood resuscitation event, quantifying the change resulting from the resuscitation phase.
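Likewise, the hemorrhage detection time could be computed as in the sketch below, with predBase and predHem being assumed prediction vectors over the baseline and hemorrhage regions at 100 Hz; as above, the comparison direction depends on the metric.

thr      = prctile(predBase, 5);                  % 5th percentile of the baseline region
beyond   = double(predHem <= thr);                % readings beyond the baseline threshold
runCount = movsum(beyond, [99 0]);                % trailing count over 100 readings
detIdx   = find(runCount == 100, 1, 'first');     % first index with 100 consecutive readings
detTime  = detIdx/100;                            % detection time in seconds (100 Hz data)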
Statistical analysis was performed to assess significant differences between each predictive model for AUROC, hemorrhage prediction time, and resuscitation responsiveness. This was done using Prism 10.3 (GraphPad, La Jolla, CA, USA). Normality was assessed by Shapiro-Wilk tests, and portions of each dataset were found to be non-normally distributed. As such, Friedman’s tests were used to compare datasets with the data from each swine paired across all metrics. A post-hoc Dunn’s test was used to compare CRM-DL, CRM-ML, BLVM, PEBL, HemArea, and MAP groups for AUROC, hemorrhage prediction time, and resuscitation responsiveness.
p-values below 0.05 were considered significant, and a table of differences between each statistical test is summarized in
Supplementary Tables S1–S3.
4. Discussion
In both military and civilian trauma settings, timely definitive surgical control of hemorrhage and appropriate fluid resuscitation to restore circulating blood volume remain the most effective treatments for hemorrhagic shock, while delays to definitive care are associated with worse outcomes [
33]. However, the appropriate application of these treatments requires the early and accurate identification of patients in impending hemorrhagic shock, which may not be readily apparent from traditional vital signs. Thus, reliance on traditional vital signs, such as blood pressure or heart rate, may delay the recognition of impending hemorrhagic shock and, consequently, delay definitive surgical and resuscitative treatment, leading to worse clinical outcomes. This could be especially important in the management of patients with internal bleeding, such as hemothorax and intra-abdominal hemorrhage, where ongoing blood loss may not always be readily apparent to clinicians. Additionally, traditional vital signs may fail to accurately stratify patients by acuity, leading to incorrect triage in mass casualty incidents, when prioritizing patients for care or evacuation is critical. Accordingly, we propose that these novel metrics may identify critically ill patients more accurately and earlier than traditional vital signs, allowing appropriate selection of patients for intervention and earlier application of treatment, thereby improving clinical outcomes compared to care that relies on traditional vital signs alone.
With this primary motivation, we evaluated different ML approaches for developing metrics that measure hemorrhagic blood volume loss and resuscitation during a hemorrhage, as well as detect the occurrence of a hemorrhagic event earlier than traditional vital signs, using a controlled hemorrhage swine model. By tracking both hemorrhage and resuscitation, these advanced metrics could simultaneously assist in the accurate diagnosis and targeted treatment of hemorrhagic shock, leading to better overall patient outcomes. Two approaches were taken in this study: (i) evaluating previously developed blood loss-based metrics on these new swine hemorrhage datasets, and (ii) developing ML models tuned for the swine hemorrhage and resuscitation datasets.
The previously developed ML feature selection framework was utilized to develop models for the prediction of BLVM, PEBL, and HemArea, which were validated using a LOSO cross-validation setup. Given the limited dataset size, this setup was critical for conducting blind subject testing and ensuring the models were not overfitting to noise and data artifacts. Overall, each metric showed success in tracking the onset of hemorrhage as well as the resuscitation that followed. BLVM and PEBL showed a higher goodness of fit between the ML prediction and the calculated ground truth. HemArea performed worse by comparison but still generally tracked the experimental phases. We previously calculated HemArea from BLVM; in this effort, we predicted HemArea directly [
16]. The reduced performance of this metric may be due to this difference, and thus calculating HemArea from BLVM may be a more suitable prediction approach. Similarly, BLVM and PEBL provided earlier prediction times compared to MAP, a more traditional metric for tracking hemorrhage onset, but HemArea took slightly longer to detect hemorrhage. It is worth noting that all metrics provided much earlier hemorrhage detection compared to results from our prior canine hemorrhage model (i.e., 12 min in canines vs. 82 s in swine for PEBL). This is likely due to the splenectomy performed in swine prior to hemorrhage, which reduced physiological compensation and was not performed in canines. The canines likely had a larger compensatory reserve at the onset of hemorrhage, allowing for more effective masking of an impending hemorrhage event.
Further work will be required to increase the correlation of HemArea with its ground truth, likely by adding features that correlate with HemArea based on its current top-ranking features. One approach will be to calculate HemArea from BLVM predictions, as the BLVM predictive models had strong correlation scores, allowing HemArea to be derived more accurately, as we have previously done [
16]. In addition, spectral features have been used for the estimation of cardiovascular parameters [
34] but are not currently used in the ML model development for this work. Adding these features may further improve ML models for directly predicting HemArea. Future development of ML models to track blood loss and resuscitation would entail using non-invasively obtained physiological data, such as a photoplethysmography (PPG) waveform, as input to the feature extraction framework developed in previous studies. Because the PPG signal exhibits few or none of the overall trends present in the arterial waveform data, the results from the metrics obtained from the detrended arterial waveform may provide insight into the feasibility of transferring this feature extraction framework to non-invasive methods. However, as observed in the results, models that used detrended data generally performed worse than models using non-detrended data. A range of limitations could have caused this shortcoming, including limited subject variability and model complexity. This study utilized 12 different swine subjects, of which only 9 were used for training in the cross-validation setup. There is reason to believe that 9 subjects were not sufficient for the models to generalize well enough to accurately track blind data. This issue has a straightforward solution, obtaining more datasets from additional swine subjects, but such protocols are costly and time-consuming.
We also evaluated the use of previously developed metrics for compensatory reserve measurement. CRM was trained using hundreds of human subjects experiencing simulated hemorrhage through LBNP exposure [
35]. The original CRM model uses a DL framework, but we have also previously evaluated the use of feature extraction and decision tree models to track CRM in subjects undergoing an LBNP procedure to simulate central hypovolemia [
18]. Due to the similarities in physiology between humans and swine, we took both the DL and ML models trained using human data and made blind predictions on the swine datasets. This was also done because the point of decompensation, needed for training the CRM model, cannot be readily defined in an anesthetized animal. While the CRM models showed trends consistent with hemorrhage onset, the results were more variable across swine subjects, with a lower signal-to-noise ratio.
Further steps to address the limitations of this study would include revisiting model complexity. The boosted tree models were chosen for their efficiency, with the prospect that they would be able to operate in real time. However, to effectively track blood loss, DL may prove to be a more worthwhile approach. The boosted tree models require feature extraction to be performed on the data, which led to the development and ranking of the top 20 features used in this study. While those top 20 features originated from a pool of more than 4200 features, an even larger feature pool may exist, though deriving additional features would require a higher degree of expertise. This may lead to the use of CNNs, such as the CRM DL model, for future development. Performing convolutions on the data allows the model to extract features itself, and when trained adequately, the model can learn the features necessary to track the data accurately. While CNN models can certainly reach levels of complexity that make them computationally burdensome and unlikely to be used in real time, even a single convolutional layer has the potential to extract the necessary features from a waveform to make accurate predictions.
Finally, there are some limitations to using swine as a research model for humans. While swine share many physiological similarities with humans, they are still a different species, leading to subtle but important differences. Since machine learning algorithms are something of a “black box”, it is not immediately clear which aspects of swine physiology the algorithms rely on to make their predictions from the blood pressure waveform, and it is therefore unclear whether the algorithms could make the leap from one species to another. Still, large animal research provides an avenue for robustly demonstrating proof of concept for these machine learning approaches, such that even if the algorithms themselves cannot translate directly from one species to another, the methods used to develop those algorithms should be able to translate between species.