We first describe the patient data and the data preprocessing used for machine learning, and then detail the various LSTM models we developed in this study.
2.1. Data Acquisition and Preprocessing
With the institutional review board’s approval (Yale HIC#1604017609), five cancer patients aged 8 to 17 years old, who had received radiation treatments at Yale-New Haven Hospital between 2013 and 2021, were identified, and their metabolic panel results were extracted from the EPIC electronic medical record (EMR) system in compliance with HIPAA regulations. Nineteen metabolic indices were measured during each metabolic panel for all the cancer patients: Glucose, BUN, Creatinine, BUN/Creatinine Ratio, Anion Gap, CO2, Chloride, Sodium, Potassium, Calcium, Total Protein, Albumin, Globulin, A/G Ratio, Aspartate Aminotransferase (AST), Alanine Aminotransferase (ALT), AST/ALT Ratio, Alkaline Phosphatase, and Total Bilirubin. A series of metabolic panel assessments was performed on each patient at multiple time points (ranging from 20 to 50 times) before, during, and after their cancer radiotherapy.
We selected the 9 biomarkers/indices in the patients’ metabolic panel data with the minimal missing-data rate as the time series dataset for establishing the time-dependent discrete dynamical system. These indices are Glucose, BUN, Creatinine, Anion Gap, CO2, Chloride, Sodium, Potassium, and Calcium. The chosen data for the first patient, whom we name the original patient, were acquired at 42 time points spanning the period from 10 May 2013 to 14 May 2014. We filled the few missing entries in the dataset by using the mean of the two nearest neighbors, i.e., if index value $x_i^j$ is missing, we assign it the value $(x_{i-1}^j + x_{i+1}^j)/2$. The missing data involve the Creatinine index on 26 June 2013 and the Calcium index on 2 January 2014, 3 January 2014, and 4 January 2014, respectively. After imputing the missing data points, we computed the mean $\mu_j$ and standard deviation $\sigma_j$ over the entire 42 data points for every index $j$. If an index value was larger than $\mu_j + 3\sigma_j$ or less than $\mu_j - 3\sigma_j$, we replaced it by $\mu_j + 3\sigma_j$ or $\mu_j - 3\sigma_j$, respectively, to rule out the so-called outlier effect.
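For concreteness, these imputation and clipping steps can be sketched in Python as follows, assuming the raw panel is stored as a (42, 9) NumPy array with NaN marking missing entries; the names `impute_and_clip`, `raw_panel`, and `k_sigma` are illustrative, not from the paper.

```python
# Sketch of the imputation and outlier-clipping steps (illustrative names;
# assumes missing entries are interior to the series, as in our dataset).
import numpy as np

def impute_and_clip(raw_panel: np.ndarray, k_sigma: float = 3.0) -> np.ndarray:
    panel = raw_panel.copy()
    for j in range(panel.shape[1]):
        col = panel[:, j]
        for i in np.flatnonzero(np.isnan(col)):
            left = col[:i][~np.isnan(col[:i])][-1]          # nearest earlier value
            right = col[i + 1:][~np.isnan(col[i + 1:])][0]  # nearest later value
            col[i] = 0.5 * (left + right)                   # mean of the two neighbors
        mu, sigma = col.mean(), col.std(ddof=1)
        # Clip values outside [mu - k*sigma, mu + k*sigma] to suppress outliers.
        panel[:, j] = np.clip(col, mu - k_sigma * sigma, mu + k_sigma * sigma)
    return panel
```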
Since the clinical data were acquired at non-uniform time intervals, to derive an approximate discrete dynamical system describing the time series using recurrent neural networks, we needed to generate data at equal time intervals that included the initial 42 nine-dimensional clinical data points. We used first-order linear interpolation to obtain 739 9-dimensional data vectors, in which adjacent data points are separated from each other by 0.5 days. The correlation coefficients of the 9 indices at the 739 data points are tabulated in
Table 1. It shows that no pair of indices in the dataset was highly correlated, i.e., with a correlation coefficient ≥80%. Hence, we built the discrete dynamical system using all 9 indices as the input variables of the dynamical system. We chose the LSTM RNN as the framework to build the discrete dynamical system from the preprocessed dataset.
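The resampling and correlation check can be sketched as follows, assuming `obs_days` holds the observation times in days since the first draw and `panel` is the imputed 42 × 9 array; both names are illustrative.

```python
# Sketch of first-order linear interpolation onto a uniform 0.5-day grid,
# followed by a pairwise correlation check (names are illustrative).
import numpy as np

def resample_half_day(obs_days: np.ndarray, panel: np.ndarray) -> np.ndarray:
    grid = np.arange(obs_days[0], obs_days[-1] + 1e-9, 0.5)  # 739 points over 369 days
    # np.interp performs piecewise-linear (first-order) interpolation per index.
    return np.stack([np.interp(grid, obs_days, panel[:, j])
                     for j in range(panel.shape[1])], axis=1)

series = resample_half_day(obs_days, panel)      # shape (739, 9)
corr = np.corrcoef(series, rowvar=False)         # 9 x 9 correlation matrix
assert np.all(np.abs(corr - np.eye(9)) < 0.8)    # no pair correlated at >= 80%
```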
2.2. One-Step Predictive LSTM Model
An LSTM model provides a versatile recurrent neural network architecture with a great deal of flexibility to overcome the gradient vanishing and explosion problems inherent in conventional recurrent neural networks, while capturing the transient dynamics underlying the time series data [19,21,22]. When designing its architecture for our applications, we paid close attention to the input and output data structure to make sure they could describe the underlying time-dependent dynamics, since the input and output of an LSTM model do not need to be in the same data format or structure as those in the original metabolic dataset. The general structure of an LSTM cell is shown in Figure 1, where $x_t$ is the input to the recurrent neural network cell, $c_t$ is the cell state in the LSTM that enhances the memory effect, and $h_t$ is the output of the cell.
A generic LSTM cell is given by the following mathematical formulas, where $\odot$ indicates the Hadamard product:
$$
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right),\\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right),\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right),\\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,\\
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
$$
In the LSTM cell, $f_t$ is called the forget gate and $i_t$ the input gate. We multiply the previous cell state $c_{t-1}$ by the forget gate to control the propagation of the previous information. Then, we add $i_t \odot \tilde{c}_t$ to update the current cell state $c_t$. In this process, we combine the previous information with the new information to obtain the final current cell state. Finally, we calculate the output gate $o_t$ and the output $h_t$ from the current cell state $c_t$, the input $x_t$, and the previous hidden state $h_{t-1}$.
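The gate equations translate directly into code; the following NumPy sketch of a single cell step assumes the weight matrices $W_\ast$, $U_\ast$ and biases $b_\ast$ are supplied in a parameter dictionary (an illustrative structure, not the paper's implementation).

```python
# Minimal NumPy sketch of one LSTM cell step; `params` holds the weight
# matrices and biases.
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, params):
    f_t = sigmoid(params["W_f"] @ x_t + params["U_f"] @ h_prev + params["b_f"])    # forget gate
    i_t = sigmoid(params["W_i"] @ x_t + params["U_i"] @ h_prev + params["b_i"])    # input gate
    o_t = sigmoid(params["W_o"] @ x_t + params["U_o"] @ h_prev + params["b_o"])    # output gate
    c_hat = np.tanh(params["W_c"] @ x_t + params["U_c"] @ h_prev + params["b_c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat   # Hadamard products combine old and new information
    h_t = o_t * np.tanh(c_t)           # cell output / hidden state
    return h_t, c_t
```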
In our design of the model, we used a stacked LSTM architecture coupled with a fully connected neural network for each cell.
Figure 2 is a schematic portrait of the stacked LSTM architecture that we adopted in our LSTM model.
The pair of LSTM cells stacked in series and connected to a fully connected output neural network aims to achieve a better memory effect. In fact, we can stack more LSTM cells intercalated with fully connected neural networks to form a more complex composite LSTM cell, in which the final output layer, taking the output $h_t$ of the last LSTM cell as its input, is a fully connected neural network.
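A simplified PyTorch sketch of this architecture is given below; it uses the library's built-in layer stacking (`num_layers=2`) rather than explicitly intercalated fully connected layers, and the hidden size is a placeholder rather than the paper's value.

```python
# Sketch of a two-cell stacked LSTM with a fully connected output layer
# (hidden size and layer count are illustrative placeholders).
import torch
import torch.nn as nn

class StackedLSTM(nn.Module):
    def __init__(self, n_indices: int = 9, hidden: int = 64):
        super().__init__()
        # Two LSTM cells stacked in series for a stronger memory effect.
        self.lstm = nn.LSTM(input_size=n_indices, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        # Fully connected output layer taking the last hidden state as input.
        self.fc = nn.Linear(hidden, n_indices)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)          # x: (batch, T, 9) -> out: (batch, T, hidden)
        return self.fc(out[:, -1, :])  # predict the next 9-dimensional vector
```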
For time series data $\{x_1, x_2, \ldots, x_N\}$, where $x_i$ represents the $i$-th 9-dimensional data vector from the original dataset, and a given time step $T$, we concatenate $T$ consecutive 9-dimensional metabolic index data points into a large vector $X_i = (x_i, x_{i+1}, \ldots, x_{i+T-1})$ to define our input to the stacked LSTM cell and define the output of the LSTM cell as $Y_i = x_{i+T}$. The data structure is depicted in Figure 3. The total number $P$ of new input–output data pairs in the new data structure is given as follows:
$$P = N - T.$$
We designed the LSTM with time step $T$, input vector $X_i$, and output $Y_i$. By applying the LSTM model to the concatenated dataset as an RNN, we predict the next data point using the previous $T$ data vectors from the original dataset. We thus name this the one-step predictive LSTM model.
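Constructing the input–output pairs amounts to a sliding window over the resampled series; a short sketch follows, with `series` as the 739 × 9 array and T = 5 purely as an example value.

```python
# Sketch of building the (X_i, Y_i) pairs via a sliding window of length T.
import numpy as np

def make_pairs(series: np.ndarray, T: int):
    N = series.shape[0]
    X = np.stack([series[i:i + T] for i in range(N - T)])  # (N-T, T, 9) inputs
    Y = series[T:]                                         # (N-T, 9) next-point targets
    return X, Y

X, Y = make_pairs(series, T=5)                  # T = 5 is only an example value
assert len(X) == len(Y) == series.shape[0] - 5  # P = N - T pairs
```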
The loss function in the model is defined by
$$L = \frac{1}{M}\sum_{i=1}^{M} \left\| \hat{Y}_i - Y_i \right\|_2^2,$$
where $\hat{Y}_i$ is the model's prediction for $Y_i$ and $M$ is the number of output data vectors in the batch of data.
For the 739 9-dimensional data points in the original time series, we first divided them sequentially into a training set and a test set in a 9:1 ratio. With this division, the number of data points in the training set and the test set was 665 and 74, respectively. For these data sets, we carried out a zero-mean standardization in each index. Namely, for the $j$-th index, we computed the mean $\mu_j$ and unbiased standard deviation $\sigma_j$ of the 665 data points in the training set. Then, we standardized the training data and test data as follows:
$$\tilde{x}_i^j = \frac{x_i^j - \mu_j}{\sigma_j},$$
where $i$ represents the $i$-th data point and $j$ represents the vector's $j$-th entry. Then, we used the standardized training data for the model. The number of input and output pairs of the LSTM model is $739 - T$, and the data pairs are given by $(X_i, Y_i),\ i = 1, 2, \ldots, 739 - T$. The number of training data pairs is $665 - T$, given by $(X_1, Y_1)$ to $(X_{665-T}, Y_{665-T})$. The number of data pairs used for the test is 74, ranging from $(X_{666-T}, Y_{666-T})$ to $(X_{739-T}, Y_{739-T})$.
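A minimal sketch of this split-and-standardize step, assuming `series` is the 739 × 9 resampled array and all variable names are illustrative:

```python
# Sketch of the sequential 9:1 split and the zero-mean standardization using
# training-set statistics only.
train, test = series[:665], series[665:]  # 665 training and 74 test points
mu = train.mean(axis=0)                   # per-index mean over the training set
sigma = train.std(axis=0, ddof=1)         # unbiased (ddof=1) standard deviation
train_std = (train - mu) / sigma
test_std = (test - mu) / sigma            # test data reuse the training statistics
```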
First, we set a loss tolerance $\varepsilon$ in the training of the neural network. If the training loss fell below $\varepsilon$, we saved the current model parameters and applied the model to the test data. After a prescribed number of epochs, we chose the model that gave the best outcome over the test data as our final model. Through extensive experimentation, we adopted the hyperparameters for the LSTM model tabulated in
Table 2.
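The tolerance-based checkpointing can be sketched as follows; the optimizer, learning rate, epoch count, and tolerance value are illustrative assumptions, `StackedLSTM` is the sketch class from above, and `X_train`, `Y_train`, `X_test`, `Y_test` are assumed tensor versions of the standardized data pairs. For simplicity, the sketch trains on the full batch.

```python
# Sketch of training with a loss tolerance: whenever the training loss drops
# below eps, evaluate on the test set and keep the best-performing parameters.
import copy
import torch

model = StackedLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer/lr
loss_fn = torch.nn.MSELoss()
eps, best_test = 1e-3, float("inf")                        # illustrative tolerance

for epoch in range(2000):                                  # a prescribed number of epochs
    optimizer.zero_grad()
    loss = loss_fn(model(X_train), Y_train)
    loss.backward()
    optimizer.step()
    if loss.item() < eps:                                  # training loss below tolerance
        with torch.no_grad():
            test_loss = loss_fn(model(X_test), Y_test).item()
        if test_loss < best_test:                          # best outcome on the test data
            best_test = test_loss
            best_state = copy.deepcopy(model.state_dict())
```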
The time step is one of the key hyperparameters in the LSTM model. We trained the model with respect to the time step $T$, and the final value of the time step was determined as the one that gave the best performance. To evaluate the performance of the model, we define the following metrics:
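As one illustration of such metrics, a per-index relative $\ell_2$ error over the test set can be computed as below; this particular form is an assumption for the sketch and may differ from the exact definition used.

```python
# Illustrative performance metric: relative L2 error per metabolic index
# (an assumed form, not necessarily the paper's exact metric).
import numpy as np

def relative_l2_error(pred: np.ndarray, true: np.ndarray) -> np.ndarray:
    # pred, true: (n_test, 9); returns one relative error per index
    return np.linalg.norm(pred - true, axis=0) / np.linalg.norm(true, axis=0)
```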
Note that the LSTM is a versatile recurrent neural network (RNN); there is a great deal of flexibility in its design, especially in its input and output data structures. We showcase some other LSTM designs here and compare their performance with that of the one-step predictive model.