1. Introduction
General anesthesia for surgery can be divided into three phases: induction, maintenance, and emergence. In the induction phase in particular, blood pressure changes rapidly and can range from hypotension to hypertension. These changes are usually caused not only by the administration of intravenous anesthetic agents (propofol and remifentanil), volatile anesthetic agents, and neuromuscular blocking agents, but also by the airway manipulation needed to intubate the patient’s trachea for mechanical ventilation. It has been reported that hypotension, even a short period of mean arterial pressure below 55 mmHg, is associated with acute kidney injury and myocardial injury [1]. On the other hand, hypertension, if left untreated, increases the postoperative risks of bleeding, cerebrovascular events, and myocardial infarction [2]. If blood pressure could be accurately predicted, anesthesiologists could proactively search for possible causes and prevent severe hemodynamic changes. This may enable early intervention such as adjustment of anesthetic agents, fluids, and vasoactive drugs, so that patients are spared the harmful consequences of hypotension or hypertension. However, accurate intraoperative blood pressure prediction is nearly impossible because of the complex mechanisms that cause blood pressure changes; even an approximate prediction requires much experience and knowledge of the factors that affect blood pressure fluctuations during surgery.
The volume of modern anesthesia data has increased with the use of electronic medical records. It is difficult for anesthesiologists to use these data to judge a patient’s hemodynamic status during an operation, so a tool that supports clinical decision making based on these data would be helpful. Machine learning is one such tool, as it is known to be effective at learning arbitrary patterns in data. Only a few studies have addressed hypotension prediction using machine learning models. One study used various machine learning models based on existing information in the electronic health records to predict hypotension within 10 min of anesthesia induction [3]. Note that it does not predict exactly when the hypotension event will occur, only whether it will occur within 10 min; it is therefore not applicable to a real-time service, because the event may occur as soon as 1–2 s later. Another study predicted the probability of developing hypotension 15 min before its actual occurrence by applying a machine learning model to the waveform of invasive arterial blood pressure [4]. Although it predicts a potential hypotension event before its actual occurrence, it is not suitable for real-time prediction because it assumes that the hypotension event does not occur again within 20 min. Since we cannot know when future events will occur, a method built on such an assumption cannot be used for real-time prediction. As far as we know, most previous studies addressed a classification problem (i.e., predicting a potential event), and no study has addressed a real-time regression problem (i.e., predicting the actual blood pressure).
Machine learning models have been widely adopted for regression problems. Support vector regression (SVR) [5], which is based on the same principles as the support vector machine (SVM) [6], has been used to predict various real-valued quantities (e.g., stock price, demand/supply of pulpwood) [7,8]. Random forest (RF) [9] is an ensemble learning method. It is known to keep the low bias of decision trees while avoiding overfitting by controlling variance. The RF can be used for either classification or regression [10]. These traditional machine learning models have shown quite successful results, but they share a common limitation: they depend strongly on a hand-crafted feature set that requires much expert effort.
Deep learning is one solution to this limitation, as it automatically extracts arbitrary patterns (i.e., features) underlying the observed data. Deep learning is rooted in the artificial neural network (ANN) [11], and it can be used for regression by adopting a suitable loss function (e.g., mean squared error). In theory, deep learning is capable of modeling all non-linear patterns by stacking many layers. Because stacking too many layers was found to cause worse outcomes (e.g., low accuracy, high error) due to vanishing gradients [12], several approaches to effectively building deeper networks have been proposed: the rectified linear unit (ReLU) [13], residual connections [14], shortcut connections [15], the Inception module [16], and pretraining [17,18]. Thanks to these studies, the convolutional neural network (CNN), which mainly consists of convolutional layers and pooling layers, has been widely used for detection and recognition problems [19,20,21]. The convolutional layer extracts latent local features, and the pooling layer picks the most meaningful feature among the extracted local features; the CNN thus effectively captures local patterns and makes a decision by summarizing the most meaningful ones. On the other hand, the recurrent neural network (RNN) [22] allows a layer to have a recurrent connection to itself, so that the RNN effectively captures sequential patterns by memorizing previous inputs. This property has made the RNN widely used for machine translation [23,24] and real-time prediction for sequences [25,26].
In this paper, we aim at real-time prediction of blood pressure between the induction of anesthesia and the beginning of the operation. This is essentially a real-time regression problem for blood pressure. Note that we do not predict the current blood pressure, but a future blood pressure (e.g., three minutes later). We adopt the RNN to capture arbitrary features from the sequential vital signs and make predictions based on those features. As far as we know, this is the first study to apply the RNN to the real-time prediction of future blood pressure. We believe that this might help prevent some patients from falling into a critical condition.
This paper is structured as follows. Section 2 describes the characteristics of the target data (e.g., vital signs) and how we preprocess them. It also provides details of our proposed approach as well as the definitions of the input and output. Section 3 demonstrates the performance of our approach through experimental results, and Section 4 interprets some sample results and discusses additional experiments with different settings. Finally, Section 5 summarizes and concludes this paper.
2. Materials and Methods
This paper addresses a new problem: predicting future blood pressure in real time. We follow a Data Science (DS) methodology from problem to approach. As mentioned above, real-time prediction of blood pressure will help prevent patients from falling into a critical condition. In practice, we follow the Cross Industry Standard Process for Data Mining (CRISP-DM), an iterative process consisting of steps such as business understanding, data understanding, data preparation, modeling, evaluation, and deployment. We collect and examine the vital-sign data, and preprocess them to train our proposed model as shown in Figure 1. The model is designed to incorporate the underlying sequential patterns of the vital signs, and is evaluated by the averaged absolute error over 10-fold cross-validation. At run time, future blood pressure is predicted from the vital signs of the previous few minutes (e.g., 3 min). As this paper is a feasibility study, the model is not ready for deployment; deployment must be done with great care because it concerns life-and-death situations. We will keep collecting more data and improving the model for deployment.
This retrospective study was approved by the institutional review board of Soonchunhyang University Bucheon Hospital (approval No. 2019-08-016). We collected data from three operating rooms of Soonchunhyang University Bucheon Hospital, for operations performed between 29 October 2018 and 18 January 2019. The data were obtained from various devices using Vital Recorder: B × 50 (patient monitor), Solar 8000M (patient monitor), Datex–Ohmeda (anesthesia machine), Primus (anesthesia machine), BIS (brain monitor) and Orchestra (infusion pump), which results in a K-dimensional real-valued vector. As the vector has a few missing values, we employ two strategies: (1) replacing a missing value with the mean of the surrounding values, and (2) replacing a missing value with the most recently observed value. We apply the first strategy to the vital signs obtained from the BIS (e.g., signal quality index (SQI)), and the second strategy to the other values. In this paper, the vector dimension K is 27, and the details of the vital signs are described in Table 1. Each dimension of the vector has its own sampling rate; for example, BIS/SQI and BIS/BIS are collected every second, whereas TV and MV are collected every six seconds. To address this issue, we assume that all dimensions share a common sampling interval (i.e., three seconds). For example, the blood pressure values (i.e., mean blood pressure (MBP), systolic blood pressure (SBP), and diastolic blood pressure (DBP)) are obtained every 1–3 min (mostly every minute), so these values are assumed to stay fixed until a new value arrives. That is, if the MBP value is sampled every minute, then the MBP values for 20 consecutive timesteps will be the same.
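The two imputation strategies and the three-second grid can be illustrated with a minimal sketch. It is not the authors' preprocessing code: the function names are hypothetical, missing values are represented by None, and "mean of the surrounding values" is interpreted here as the mean of the nearest observed value on each side, which is an assumption.

```python
def forward_fill(values):
    """Strategy (2): replace None with the most recently observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def fill_with_neighbor_mean(values):
    """Strategy (1): replace None with the mean of the nearest observed
    values before and after it (assumed reading of 'surrounding values')."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            prev = next((values[j] for j in range(i - 1, -1, -1)
                         if values[j] is not None), None)
            nxt = next((values[j] for j in range(i + 1, len(values))
                        if values[j] is not None), None)
            neighbors = [x for x in (prev, nxt) if x is not None]
            filled[i] = sum(neighbors) / len(neighbors) if neighbors else None
    return filled


def resample_to_grid(samples, duration_s, step_s=3):
    """Hold each (time_s, value) sample until the next one arrives,
    reading the signal on a fixed grid of step_s seconds."""
    samples = sorted(samples)
    grid, idx, current = [], 0, None
    for t in range(0, duration_s, step_s):
        while idx < len(samples) and samples[idx][0] <= t:
            current = samples[idx][1]
            idx += 1
        grid.append(current)
    return grid


# An MBP measured once per minute stays constant for 20 three-second timesteps.
mbp = resample_to_grid([(0, 80.0), (60, 76.0)], duration_s=120)
```

With these illustrative inputs, `mbp` holds 40 timesteps: 20 copies of the first reading followed by 20 copies of the second, which matches the staircase shape described for the blood pressure values.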
For each
r-th surgery operation, we collect
K-dimensional vectors for
seconds, where
and
R denotes the number of operations. As we assume that all vital signs are sampled every three seconds, the total data becomes
tensor (i.e.,
R sequence of
matrices). Please note that the
differs across operations because operations generally differ in duration. For our collected data, the number of patients (i.e., the number of operations)
R is 102. The statistics of the collected data are summarized in
Table 2.
Figure 2 depicts a sample sequence of the collected data. Please note that the three blood pressure values are fixed for 20 timesteps (i.e., one minute) while other values change.
We transform the sequence of
matrices into a shape for the real-time sequential prediction of future blood pressure as follows. First, for each
t-th timestep where
, we define the sequence of vital signs excluding blood pressures for previous
W timesteps as an input
; in other words, the input
is a
matrix for the timesteps between [
,
t]. We also add the timestep
t into the
, so finally the
becomes a
matrix. Second, we define a normalized blood pressure at the
t-th timestep as a supplementary input
. If the blood pressure value is 125, then it is divided by 250 to be normalized (e.g., 125/250). We take only the latest observed blood pressure, but not the blood pressure values for
W timesteps because the inconsistent sampling rate (e.g., every minute) of the blood pressure may harm the results of the RNN. Third, we define the blood pressure value of the timestep
as an output
. It is important that the output
is not the blood pressure at the timestep
t, but the future blood pressure at the timestep
. Through the steps above, for each timestep
t, we generate a triple of the input
, the supplementary input
, and the output
. Suppose that
t = 100,
W = 60, and
G = 20. The input
will be a
matrix and the
will be a real-number of the normalized blood pressure at the 100-th timestep. The output
will be the blood pressure value at the 120-th timestep. In other words, the model predicts the blood pressure one minute later (i.e., after 20 timesteps) given the vital signs observed over the latest three minutes (i.e., 60 timesteps). As we generate the triple (
,
,
) for every timestep, the total number of triples for the
r-th operation will be
. We apply the above transformation to the three blood pressures (i.e., MBP, DBP, and SBP) independently, and obtain triples for each of them. The transformation process is summarized in
Figure 3. In short, the input consists of
W vital signs
and a current blood pressure
, while the output is a future blood pressure
after
G timesteps.
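The transformation can be sketched as a sliding window over one operation. This is an illustration under stated assumptions rather than the authors' implementation: the function name make_triples is hypothetical, timesteps are 0-indexed, each row of the window is given its own timestep index as the extra column, and the output is left unnormalized.

```python
def make_triples(vitals, bp, W=60, G=20, bp_scale=250.0):
    """Build (window, current_bp, future_bp) triples for one operation.

    vitals: list of per-timestep feature lists (blood pressures excluded)
    bp:     list with one blood pressure value per timestep
    """
    triples = []
    T = len(vitals)
    # t is the last observed timestep; the target lies G timesteps ahead.
    for t in range(W - 1, T - G):
        # W rows of vital signs, each extended with its timestep index.
        window = [vitals[s] + [s] for s in range(t - W + 1, t + 1)]
        current_bp = bp[t] / bp_scale   # supplementary input, normalized
        future_bp = bp[t + G]           # output: blood pressure G steps later
        triples.append((window, current_bp, future_bp))
    return triples
```

Under these assumptions an operation with T timesteps yields T - W - G + 1 triples, and with W = 60 and G = 60 the first target falls on the 120-th timestep (counting from one).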
We observe that different operations exhibit different sequential patterns (e.g., different patterns of heart-rate change). To incorporate this diversity, we design an RNN model followed by fully connected layers as shown in
Figure 4. Given the input
for
r-th operation, the
W vital vectors are fed sequentially into the RNN. Note that our RNN has a bidirectional and hierarchical structure. The bidirectional RNN consists of a forward RNN and a backward RNN, which capture forward patterns and backward patterns, respectively. Since the data may contain sequential patterns in both the forward and the backward direction, we use the bidirectional RNN to incorporate both.
Meanwhile, both the forward and the backward RNN are hierarchical, as each has two stacked layers. The first RNN layer may capture sequential correlations between different vital signs (e.g., the propofol rate and the heart rate), and the second RNN layer captures higher-level sequential correlations among the correlations found at the first layer. Thanks to the bidirectional and hierarchical structure, the RNN memorizes high-level sequential patterns in both directions. The forward RNN yields a fixed-dimensional summary vector, and the backward RNN likewise yields a summary vector. These two summary vectors are concatenated, and the supplementary input is appended to the result. The concatenated vector is passed to the fully connected layers, which are supposed to find correlations among the captured patterns. For example, when the RNN layers capture an ‘increasing trend of heart rates’ pattern and a ‘fluctuating ETCO2’ pattern, the fully connected layers may find how positively or negatively they are correlated. Finally, given the vector generated by the second fully connected layer, the output layer predicts the future blood pressure.
For the cell of the RNN layers, we adopt the Gated Recurrent Unit (GRU) [27], which is one of the most widely used RNN cells. The most important aspect of an RNN cell is that it remembers previously observed information. Although an RNN cell is theoretically capable of preserving all previous information, in practice it loses long-term information. The GRU addresses this issue with two types of gates (i.e., an update gate and a reset gate), which help to preserve important long-term information while discarding unnecessary information. Thanks to the GRU cells, the bidirectional RNN layers give two vectors (e.g.,
and
) that capture important sequential patterns in both directions. For the two fully connected layers, we adopt the rectified linear unit (ReLU) [
13] as the activation function, as it is known to mitigate the vanishing gradient problem. For the output layer, we use the mean squared error (MSE), which is widely used for regression, as the loss function.
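To make the role of the two gates concrete, a single GRU update can be written in plain Python. This is a textbook-style sketch, not the TensorFlow cell used in the experiments: the parameter names (Wz, Uz, bz and so on) are illustrative, and the convention in which the update gate z weights the candidate state is one of two equivalent conventions found in the literature.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gru_step(x, h, p):
    """One GRU update for input x and previous hidden state h.

    p maps names to parameters: W* act on x, U* act on h, b* are biases,
    for the update gate (z), reset gate (r) and candidate state (c).
    """
    # Update gate: how much of the state is replaced by the candidate.
    z = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    # Reset gate: how much of the old state feeds the candidate.
    r = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]
    c = [math.tanh(a + b + d) for a, b, d in
         zip(matvec(p["Wc"], x), matvec(p["Uc"], rh), p["bc"])]
    # Blend old state and candidate; z near 0 keeps long-term information.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]
```

When the update gate stays close to zero the previous state passes through almost unchanged, which is the mechanism that lets the cell keep long-term information.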
3. Results
We set W = 60 and G = 60, which means that we predict the blood pressure three minutes ahead, given the vital signs observed over the latest three minutes. Note that we use only the vital data obtained between the induction of anesthesia and the beginning of the operation; we do not employ any other information (e.g., age, sex, base blood pressures, ASA). The total number of transformed data instances is 26,887. Each dimension of the transformed data is normalized except for the timestep value. The normalization statistics are computed from the training data only. We use 10-fold cross-validation and compute the mean absolute error (MAE). We conduct three independent experiments: SBP prediction, MBP prediction, and DBP prediction. All experiments are performed on a computer with eight Central Processing Units (CPU) of i7-7700 3.6 GHz and two NVIDIA GeForce 1080 Ti GPUs. The proposed model is implemented in Python 3 with the Google TensorFlow package.
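The evaluation protocol (10-fold cross-validation, normalization fitted on the training folds only, MAE as the metric) can be sketched as follows. The sketch makes several assumptions that the text does not state: z-score normalization, a single scalar feature for brevity, and a hypothetical callback fit_predict that stands in for training and applying the model.

```python
import random
import statistics


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_scaler(column):
    """Mean and standard deviation of the training values only."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column) or 1.0  # guard against zero variance
    return mean, std


def scale_all(values, mean, std):
    return [(v - mean) / std for v in values]


def mean_absolute_error(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def cross_validate(xs, ys, fit_predict, k=10):
    """Return one MAE per fold; fit_predict(x_train, y_train, x_test)
    is a hypothetical stand-in for the model."""
    folds = k_fold_indices(len(xs), k)
    maes = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        # Statistics come from the training folds, then apply to both splits.
        mean, std = fit_scaler([xs[j] for j in train_idx])
        x_train = scale_all([xs[j] for j in train_idx], mean, std)
        x_test = scale_all([xs[j] for j in test_idx], mean, std)
        preds = fit_predict(x_train, [ys[j] for j in train_idx], x_test)
        maes.append(mean_absolute_error([ys[j] for j in test_idx], preds))
    return maes
```

Fitting the scaler inside the loop is the point of the design: statistics of the held-out fold never influence the inputs seen during training.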
The training recipe and parameter setting are as follows. The dimensions of RNN layers
and
are equally 15, and the dimensions of the fully connected (FC) layers
and
are 100 and 50, respectively. We apply dropout [28] with a keep probability of 0.1 to the RNN layers, and DeCov [29] with a weight of 0.1 to the FC layers. Both dropout and DeCov are known to have a regularization effect, which helps prevent overfitting. For parameter initialization, the weight matrices of the FC layers are initialized using He initialization [30], and the biases are initialized to zero. The weight matrices of the RNN layers are initialized using Xavier initialization [31], and the initial bias value is one. We use the Adam optimizer [32] with an initial learning rate of 0.001 to train the model parameters, and the number of epochs is 60. In the training phase, the model computes a predicted blood pressure by feed-forward propagation: the RNN layers generate two vectors from an input, and the fully connected layers take the concatenation of the two vectors as their input and generate an output. A cost (error) is computed by comparing the predicted blood pressure with the true blood pressure, and all weights and biases are updated via the backpropagation algorithm. In each epoch, feed-forward and backpropagation are conducted over all data, one mini-batch at a time. In this paper, we set the mini-batch size to 100.
Table 3 summarizes the mean and standard deviation of the absolute errors obtained from the three predictions. A small mean and standard deviation indicate that the model predicts the blood pressure accurately. Among the three predictions, the DBP prediction is the most accurate while the SBP prediction exhibits the worst results. Figure 5, Figure 6 and Figure 7 depict histograms of errors, where the horizontal axis represents error bins; for example, a bin [1–2) represents the range 1 ≤ e < 2, where e indicates an error. The three histograms resemble a Gaussian distribution, and the predictions generally follow the trend of the true blood pressures. For instance, in Figure 6, the peak of the distribution is located around the interval [0–1), which implies that the predicted MBP values are mostly close to the true MBP values. However, the shapes of the three histograms are somewhat left-skewed, so the overall mean absolute error is between 8.2 mmHg and 11.1 mmHg while the standard deviation is between 8.7 mmHg and 12.7 mmHg. Figure 8 shows Bland-Altman diagrams of the three blood pressures. The diagrams indicate that errors tend to grow when the average of the predicted and true blood pressures is high. A likely explanation is that it is hard to predict the true blood pressure correctly when the average is abnormally high, because such cases were rarely seen in the data.
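For readers unfamiliar with Bland-Altman diagrams, the quantities they plot can be computed as in the following sketch. It assumes the difference is taken as predicted minus true and uses the conventional 1.96 standard deviation limits of agreement; the text does not state which convention the figures follow, and the function name is hypothetical.

```python
import statistics


def bland_altman(pred, true):
    """Return per-sample means and differences, the bias, and the
    95% limits of agreement (bias +/- 1.96 * SD of the differences)."""
    means = [(p + t) / 2.0 for p, t in zip(pred, true)]
    diffs = [p - t for p, t in zip(pred, true)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```

Each point of the diagram is a pair (mean, difference), so errors that grow with the mean appear as a widening spread toward the right of the plot.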
4. Discussion
We investigated whether an RNN could predict future blood pressure (e.g., 3 min ahead) during the anesthesia-induction period. We found that our model could predict blood pressure 3 min ahead with an absolute error of around 10 mmHg for each of SBP, DBP, and MBP. Although this error currently seems too large for clinicians to use our model as a decision support tool in hemodynamic management during anesthesia, we suggest that it is feasible for an RNN to predict future blood pressure using only features obtained from various anesthesia monitors, a ventilator, and a drug infusion pump over a relatively short period.
We examine plots of the predicted and true blood pressures. To do so, we trained the model with 90% of the shuffled data and used the remaining data for examination. Figure 9 shows three plots of SBP prediction, where the two upper examples are relatively well predicted cases and the bottom example is a poorly predicted case. Note that the model gives its first prediction at the 120-th timestep because it observes three minutes of sequential data (i.e., 60 timesteps) and predicts three minutes ahead (i.e., 60 timesteps). Because the SBP is sampled every minute, the plot of the true SBP looks like a staircase. Generally speaking, the three plots in Figure 9 show that the model predicts the trend of future blood pressure well; it captures when the SBP will rise, hold, or fall. Interestingly, as shown in the second plot of Figure 9, the predicted SBP fluctuates like the true SBP even though it is a prediction of the SBP three minutes ahead. On the other hand, in the bottom plot, the predicted SBP follows the trend of the true SBP but there is a steady gap between them. We believe that this gap will shrink if we collect more data covering more diverse patterns of blood pressure.
Among the hemodynamic changes occurring during surgery, hypotension is known to be frequent and has been reported to cause adverse outcomes after surgery [1]. The definition of intraoperative hypotension varies among investigators, ranging from an MBP of 55 mmHg to 65 mmHg. In [33], it was shown that an MBP below 60 mmHg for 11 to 20 min and an MBP below 55 mmHg for more than 10 min are associated with acute kidney injury. The mean absolute error of the MBP predicted by our proposed model was 9 mmHg, which may not be helpful to clinicians in some critical situations. For example, if the actual MBP is 58 mmHg, then the MBP predicted by our model may range from 49 mmHg to 67 mmHg. Such variation could lead to two opposite management decisions: if the predicted MBP is 49 mmHg, one will explore possible causes of hypotension, whereas if it is 67 mmHg, one will simply keep observing the blood pressure and do nothing. Of course, there are other cases in which the predicted MBP is helpful. Suppose that the actual MBP is around 75 mmHg, so that the predicted MBP may range from 66 mmHg to 84 mmHg. This range is generally not harmful to most surgical patients. The Association for the Advancement of Medical Instrumentation (AAMI) established standards for the validation of automatic arterial pressure monitoring. A device is considered acceptable if the error (e.g., mean absolute error) is not greater than 5 mmHg and the standard deviation of errors is not greater than 8 mmHg for SAP and DAP [34]. The mean absolute errors of our model for SBP and MBP were 11 mmHg and 9 mmHg, respectively, which do not meet the AAMI standards. However, there is no consensus on what accuracy is clinically acceptable for blood pressure prediction, because the AAMI standards are intended for the clinical validation of new automated blood pressure devices.
One may argue that there might be better parameter settings or a better model structure. The training recipe and parameter settings used in this paper were obtained via grid search. We varied the number of RNN layers and fully connected layers, and tried various dimensions. A part of the grid search result is shown in Table 4, where the relative change of MAE is computed with respect to the best MAE of 11.056 in Table 3; the relative change is (current MAE − 11.056)/11.056 × 100, so a greater value means a worse result. The bidirectional RNNs generally appear to work better than the unidirectional RNNs. The FC dimensions represent
and
; [100, 50] means
and
are 100 and 50, respectively, and [100] implies that it uses a single FC layer with
. Using two FC layers appears to be much better than using a single FC layer, and the regularization methods (e.g., dropout, DeCov) help prevent the model from overfitting.
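The relative change reported in Table 4 follows directly from the formula given above. A minimal helper, with a hypothetical function name and the reference MAE of 11.056 taken from the text, could look like this:

```python
BEST_MAE = 11.056  # reference MAE from Table 3, as stated in the text


def relative_change(current_mae, best_mae=BEST_MAE):
    """Relative change of MAE in percent; larger means worse."""
    return (current_mae - best_mae) / best_mae * 100.0
```

A setting that reproduces the reference MAE gives 0, and any setting with a higher MAE gives a positive percentage.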
This study aims at real-time prediction of blood pressure, so one may ask ‘Does this model really work in real time?’, because our model has a fairly complex structure (e.g., a composite model of an RNN and fully connected layers). We found that the average elapsed time for predicting a batch of 100 unseen instances is about 26.56 milliseconds. As our model must give a prediction every three seconds, it is well within the budget for real-time prediction.
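The real-time argument is a comparison between the measured latency and the three-second sampling interval. The following sketch shows one way such a check could be made; the helper names are hypothetical and predict stands in for any model's batch prediction function, so it does not reproduce the authors' measurement.

```python
import time


def mean_latency_ms(predict, batch, repeats=20):
    """Average wall-clock time of predict(batch) in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        predict(batch)
    return (time.perf_counter() - start) / repeats * 1000.0


def fits_real_time(latency_ms, interval_s=3.0):
    """True if one prediction finishes before the next sample arrives."""
    return latency_ms < interval_s * 1000.0
```

With the reported 26.56 ms per batch, the three-second interval leaves a margin of roughly 113 times the prediction time (3000 / 26.56).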
Although our model shows potential as a real-time predictor of future blood pressure, there is room for improvement, especially regarding the error. For SBP prediction, the mean absolute error of 11.056 indicates that much work remains. The main reason is that our data come from only 102 operations, which is not enough to cover the diverse patterns of operations. Thus, this study is a first step that demonstrates the feasibility of real-time prediction of future blood pressure. We believe that our model will improve further as we keep collecting more data. Another minor limitation of our work is that it gives its first result only after some timesteps (e.g., 120 timesteps), which could be addressed by collecting vital signs before the induction of anesthesia.