2.1. Data Preparation
Datasets were collected from nine operating rooms at Soonchunhyang University Bucheon Hospital, Bucheon city, Republic of Korea, between 29 October 2018 and 30 September 2019. The data were collected from patients who had received total intravenous anesthesia (TIVA) and were at least 18 years old. The datasets do not contain any direct patient identifiers, and additional approval from our institutional review board was obtained for this retrospective study (No. 2019-10-024-002).
The datasets consisted of electronic medical records (EMRs) and vital signs. The EMR data were manually collected and included age, sex, height, weight, body mass index (BMI), American Society of Anesthesiologists (ASA) physical status grade, and types of coexisting diseases. Vital signs were collected automatically between the time of entering the operating room and the beginning of the operation. Vital signs were collected using a Vital Recorder [14] connected to several devices in the operating rooms: Bx50 (patient monitor), Solar 8000M (patient monitor), Datex-Ohmeda (anesthesia machine), Primus (anesthesia machine), BIS (brain monitor), and Orchestra (infusion pump). Vital signs included 35 variables, such as heart rate (HR), systolic blood pressure (SBP), tidal volume (TV), carbon dioxide (CO2), end-tidal CO2 partial pressure (ETCO2), positive end-expiratory pressure (PEEP), and bispectral index (BIS).
Figure 1 shows a sample of the two types of data; the EMR data were collected once for each operation, whereas vital signs were ideally gathered continuously every second. Details of the two types of data are summarized in Table 1, where the baseline vital values were collected only before anesthesia induction.
The datasets required preprocessing because of missing values (e.g., signal quality index (SQI) of BIS), inconsistent sampling rates, outliers (e.g., height), and incorrect time-logs of intubation. Missing values from the BIS were replaced by the mean of the surrounding values. Other missing values were replaced by the last observed values. Different vital signs had inconsistent sampling rates; for example, noninvasive blood pressure was generally recorded every minute, whereas the respiratory rate was recorded every 3 s. To handle this, because the shortest sampling interval was 1 s, we treated all vital signs as having the same sampling interval of 1 s. If the sampling interval of a particular variable was 3 s, then each recorded value was copied twice to fill the two following seconds.
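The three preprocessing rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, missing values are represented as None, and "the mean of the surrounding values" is assumed to mean the mean of the nearest observed value before and after the gap.

```python
from typing import List, Optional


def fill_with_neighbor_mean(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each None with the mean of the nearest observed values on
    either side (assumed reading of "mean of the surrounding values")."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        prev = next((values[j] for j in range(i - 1, -1, -1) if values[j] is not None), None)
        nxt = next((values[j] for j in range(i + 1, len(values)) if values[j] is not None), None)
        neighbors = [x for x in (prev, nxt) if x is not None]
        if neighbors:
            filled[i] = sum(neighbors) / len(neighbors)
    return filled


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each None with the last observed value (leading gaps stay None)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def upsample_to_1s(values: List[float], period_s: int) -> List[float]:
    """Repeat each sample so a signal recorded every period_s seconds
    is placed on a 1 s grid (a 3 s signal gets two extra copies per value)."""
    return [v for v in values for _ in range(period_s)]
```

For instance, a respiratory rate series recorded every 3 s would be passed through upsample_to_1s with period_s=3 before being aligned with the other signals.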
After the missing values and inconsistent sampling rates were addressed, it was necessary to filter some data, as depicted in Figure 2. Of the original 1931 patients collected between 29 October 2018 and 30 September 2019, both vital signs and EMR were available for only 1093. Outlier values (e.g., height of 1554.2 cm, remifentanil volume of 538,976.25 g), as determined by a predefined range for each variable, were removed. Patients without BIS, neuromuscular transmission train-of-four count (NMT_TOF_CNT), or CO2 were filtered out. Patients without any intubation annotation (i.e., “EVENT”) were also removed because our purpose was to predict post-intubation tachycardia. Furthermore, because the intubation annotation may have been wrong for several reasons, we designed Algorithm 1 to correct wrong annotations. The algorithm also identified patient data that could not be corrected, and these 271 patients were discarded. Based on the corrected annotations of intubation, we excluded the data of 64 patients in two cases: (1) intubation occurred only 10 s after anesthesia induction, and (2) the operation began within 10 min after intubation.
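As a rough illustration of the range-based outlier check and the required-signal check, consider the sketch below. The numeric ranges and the field names height_cm and weight_kg are hypothetical placeholders, since the predefined ranges are not given in the text; only the signal names BIS, NMT_TOF_CNT, CO2, and EVENT come from the description above.

```python
# Hypothetical plausibility ranges; the actual predefined ranges are not stated.
VALID_RANGES = {
    "height_cm": (100.0, 250.0),
    "weight_kg": (20.0, 300.0),
}

# Signals and annotation column that a patient record must contain.
REQUIRED_COLUMNS = {"BIS", "NMT_TOF_CNT", "CO2", "EVENT"}


def out_of_range_fields(record, valid_ranges=VALID_RANGES):
    """Return the names of fields whose values fall outside their valid range."""
    flagged = []
    for name, (low, high) in valid_ranges.items():
        value = record.get(name)
        if value is not None and not (low <= value <= high):
            flagged.append(name)
    return flagged


def has_required_signals(columns, required=REQUIRED_COLUMNS):
    """True if every required signal or annotation column is present."""
    return required.issubset(set(columns))
```

With these placeholder ranges, the height of 1554.2 cm mentioned above would be flagged as an outlier.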
The major purpose of Algorithm 1 was to correct annotations of intubation. The time-step (or point) of tracheal intubation was annotated manually by the perioperative nurse (or anesthesiologist) in the operating room. Therefore, the annotated points may have been incorrect for several reasons (e.g., a mistake or delay). For example, the EVENT column of
Figure 1b has the incorrect annotation “intubation” at 00:08:02, because the CO2 value must remain at zero during tracheal intubation. Our purpose was to predict post-intubation tachycardia, so it was critical to find the correct intubation points. Algorithm 1 describes four steps of finding the start and end points of tracheal intubation. More formally, in
Figure 3, the algorithm identifies the beginning point and the ending point of intubation.
The first step of Algorithm 1 was to find candidate points of tracheal intubation using three conditions, as shown in line 4. The first condition checked whether the bispectral index (BIS) value was lower than 70. When patients entered the operating room, TIVA using propofol and remifentanil was administered via a target-controlled infusion pump (Orchestra Base Primea with module DPS; Fresenius Kabi AG, Germany). After anesthesia induction, patients gradually lost consciousness as BIS decreased. In Figure 1b, this condition is met at 00:07:01. The second condition checked the muscle state; the neuromuscular transmission train-of-four count (NMT_TOF_CNT) is 0 or 1 when the muscle is relaxed. Rocuronium was injected intravenously, and we electrically stimulated the nerves innervating the patient’s thumb and measured the thumb movements to determine whether the muscles were relaxed. The last condition checked whether the carbon dioxide (CO2) values remained at zero for 10 s. After receiving the muscle relaxant, the patients could not breathe on their own, so the CO2 values were zero. In Figure 1b, the annotated event “intubation” violates this condition, meaning that the annotation must be incorrect. Using the above three conditions, the first step found a candidate set of beginning points of intubation. For example, the candidate set included the point 00:07:01 in Figure 1b. If no candidate was found, then the corresponding data were discarded, as described in line 12. Lines 14 to 18 kept only the first candidate point when several candidate points were consecutive; for example, given four candidate points [10, 20, 30, 50], the two points 20 and 30 were filtered out, resulting in two remaining candidate points [10, 50].
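The first step can be sketched as follows. This is an illustrative reading rather than the published Algorithm 1: the signals are assumed to be aligned lists sampled every second, the three conditions are assumed to be evaluated on non-overlapping 10 s windows (which matches the 10 s spacing in the [10, 20, 30, 50] example), and the function names are hypothetical.

```python
def find_candidates(bis, tof_cnt, co2, window=10, bis_threshold=70):
    """Return window start times (in seconds) that satisfy all three conditions."""
    n = min(len(bis), len(tof_cnt), len(co2))
    candidates = []
    for t in range(0, n - window + 1, window):
        unconscious = bis[t] < bis_threshold              # condition 1: BIS below 70
        relaxed = tof_cnt[t] in (0, 1)                    # condition 2: TOF count is 0 or 1
        apneic = all(v == 0 for v in co2[t:t + window])   # condition 3: CO2 at zero for 10 s
        if unconscious and relaxed and apneic:
            candidates.append(t)
    return candidates


def keep_first_of_runs(points, step=10):
    """Keep only the first point of each run of consecutive candidates,
    where consecutive means exactly `step` seconds apart."""
    kept = []
    previous = None
    for p in points:
        if previous is None or p - previous != step:
            kept.append(p)
        previous = p
    return kept
```

Applied to the example from the text, keep_first_of_runs([10, 20, 30, 50]) returns [10, 50]. An empty result from find_candidates corresponds to the case in which the patient's data are discarded.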
The second step of the algorithm was to find the intubation point that had been manually annotated in the operating room. Unless there was exactly one annotated intubation point, the corresponding data were discarded. In other words, if there were multiple annotated points or no points at all, then the corresponding data were regarded as incorrect. Such data were discarded even if multiple candidate points were obtained from step 1, which means that both the candidate points and the annotated point were required to find the beginning and ending points of intubation in the following steps. The third step was to find the beginning point of intubation by comparing the annotated point with the candidate points: the candidate point nearest in time to the annotated point became the beginning point. For example, in Figure 1b, assume that the candidate points and the annotated point are [00:07:01, 00:12:04] and [00:08:02], respectively. The beginning point will be 00:07:01 because it is the nearest candidate point to 00:08:02. The last step was to find the ending point of intubation. It assumes that the post-intubation CO2 emission was greater than the pre-intubation CO2 emission. For example, in Figure 1b, the pre-intubation CO2 value will be 2.4 at 00:07:00. The ending point will be 00:08:01 because it is the earliest point with a CO2 value greater than the pre-intubation value. If the ending point was not found by this step, then the corresponding data were to be discarded; however, we did not observe such a case.
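Steps two to four can be sketched in the same style. The sketch assumes that times are integer seconds from the start of the recording, that events are (time, label) pairs with the label "intubation", and that the pre-intubation CO2 value is the sample one second before the beginning point, as in the 00:07:00 example; all function names are hypothetical.

```python
def single_annotation(events):
    """Step 2: return the annotated intubation time only if exactly one exists."""
    times = [t for t, label in events if label == "intubation"]
    return times[0] if len(times) == 1 else None


def nearest_candidate(candidates, annotated):
    """Step 3: the candidate closest in time to the annotated point
    (the earlier candidate wins a tie)."""
    return min(candidates, key=lambda c: abs(c - annotated))


def find_end_point(co2, start):
    """Step 4: earliest time after `start` whose CO2 value exceeds the
    value recorded one second before `start`; None if no such time exists."""
    if start < 1:
        return None
    reference = co2[start - 1]
    for t in range(start + 1, len(co2)):
        if co2[t] > reference:
            return t
    return None
```

With the example from the text expressed in seconds (candidates at 421 s and 724 s, annotation at 482 s), nearest_candidate([421, 724], 482) selects 421, that is, 00:07:01.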
Algorithm 1: Finding the start and end points of tracheal intubation for a target patient.
2.2. Feature Selection
Our purpose was to predict post-intubation tachycardia, for which the input feature was obtained from the “input area” in Figure 3, and the output was 1 (or true) if tachycardia occurred in the “output area”; otherwise, it was 0 (or false). In this paper, tachycardia is defined as an HR greater than 100 bpm for more than 1 min. The input feature is defined using two types of data (i.e., EMR and vital signs). First, a 24-dimensional feature vector was obtained from the EMR, including age, sex, BMI, and so forth; details of the 24-dimensional feature can be found in the first two rows of Table A1 in the Appendix. For example, if the patient was female and had a cardiovascular disease (e.g., hypertension), then the elements of this vector encoding sex and cardiovascular disease were both 1. Second, a 129-dimensional feature vector was obtained from the vital signs. This vector contains mainly the min, max, mean, and standard deviation (sd) of each vital sign; details can be found in Table A1 of the Appendix. For example, if the baseline systolic blood pressure (SBP) was 130 and the minimum value of respiratory rate (RR) was 4.9, then the elements of this vector for baseline SBP and minimum RR were 130 and 4.9, respectively. The two feature vectors were merged into a 153-dimensional feature vector. Every numerical element of the merged vector was normalized between 0 and 1.
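The output label and the normalization can be illustrated with the short sketch below. Two assumptions are made that the text does not state: the HR samples are taken at 1 s intervals and "more than 1 min" is read as more than 60 consecutive samples above the threshold, and normalization is min-max scaling of each feature across patients. The function names are hypothetical.

```python
def has_tachycardia(hr, threshold=100.0, min_seconds=60):
    """Return 1 if HR stays above `threshold` for more than `min_seconds`
    consecutive 1 s samples within the output area, otherwise 0."""
    run = 0
    for value in hr:
        run = run + 1 if value > threshold else 0
        if run > min_seconds:
            return 1
    return 0


def min_max_normalize(column):
    """Scale one feature column to the range [0, 1]; a constant column maps to 0."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(v - low) / (high - low) for v in column]
```

Under these assumptions, a run of exactly 60 samples above 100 bpm is labeled 0, and a run of 61 samples is labeled 1.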
Feature selection finds a promising set of features that has the potential to improve performance. Feature selection is known to shorten training time, reduce overfitting, and improve accuracy [15]. It has produced useful results for ventricular tachycardia and arrhythmia in previous studies [12,13]. We compared feature selection with three different measurements: recursive feature elimination (RFE), Gini index (GI), and a univariate statistical test (UST) using mutual information. The RFE and GI-based feature selection were performed with a random forest (RF) classifier using scikit-learn [16]. By grid searching, RFE, GI, and UST-based feature selection resulted in feature sets of 10, 15, and 15 promising features, respectively. Details of the selected features are listed in Table 2.
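The three selection measurements map onto standard scikit-learn components, as the following sketch suggests. It runs on synthetic data from make_classification rather than the study data, fixes the set sizes at 10, 15, and 15 instead of grid searching for them, and uses hyperparameters (such as 25 trees) that are illustrative assumptions only.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif

# Synthetic stand-in for the normalized feature matrix and tachycardia labels.
X, y = make_classification(n_samples=200, n_features=30, n_informative=8, random_state=0)

rf = RandomForestClassifier(n_estimators=25, random_state=0)

# RFE: repeatedly refit the random forest and drop the weakest feature.
rfe = RFE(estimator=rf, n_features_to_select=10).fit(X, y)
rfe_idx = [i for i, keep in enumerate(rfe.support_) if keep]

# GI: rank features by the forest's Gini (impurity-based) importances.
rf.fit(X, y)
ranked = sorted(range(X.shape[1]), key=lambda i: rf.feature_importances_[i], reverse=True)
gi_idx = ranked[:15]

# UST: univariate selection scored by mutual information.
score = partial(mutual_info_classif, random_state=0)
ust = SelectKBest(score_func=score, k=15).fit(X, y)
ust_idx = list(ust.get_support(indices=True))
```

Each index list would then be used to subset the columns of the feature matrix before training a classifier.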
In addition to feature selection with the three measurements, we also prepared two additional feature sets: a p-value-based feature set and a manually designed feature set. In Table 2, the “P-based” and “Hand-crafted” feature sets indicate these two feature sets, respectively. The P-based feature set was defined by statistical clues; we conducted a t-test or Wilcoxon test for continuous variables (e.g., height, tidal volume), and a chi-squared or Fisher's exact test for categorical variables (e.g., sex, ephedrine). We observed statistically significant differences in baseline heart rate, noninvasive heart rate, and remifentanil values, as shown in Table 3. In contrast, the hand-crafted feature set was carefully designed through an exploratory data analysis process. That is, by manually examining a group of patients, we picked eight promising features: sex, HR_mean, remifentanil_CE_max, remifentanil_CP_max, remifentanil_VOL_max, remifentanil_VOL_mean, BIS_min, and MV_max.
The five feature sets were compared using various machine learning models: random forest (RF) [17], logistic regression (LR) [18], naïve Bayes classifiers (NB) [19], support vector machine (SVM) [20], extreme gradient boosting (XGB) [21], and artificial neural networks (ANN) [22]. The ANN had two hidden layers, in which the first and second layers had 15 and 20 nodes, respectively.
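A model comparison of this kind could be set up roughly as follows. The sketch is an assumption-laden illustration: it uses scikit-learn for every model, including the ANN (the text does not say which library implemented the ANN), picks Gaussian naïve Bayes as the NB variant, scores with 3-fold cross-validated accuracy on synthetic data, and leaves out XGB so that scikit-learn is the only dependency. Only the hidden layer sizes (15, 20) are taken from the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for one selected feature set and its labels.
X, y = make_classification(n_samples=200, n_features=10, n_informative=5, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "NB": GaussianNB(),
    "SVM": SVC(),
    # Two hidden layers with 15 and 20 nodes, as described in the text.
    "ANN": MLPClassifier(hidden_layer_sizes=(15, 20), max_iter=2000, random_state=0),
}

# Mean cross-validated accuracy per model (the metric is an assumption).
scores = {name: cross_val_score(model, X, y, cv=3).mean() for name, model in models.items()}
```

Repeating this loop for each of the five feature sets would give one score per model and feature set.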