Figure 1.
High-level view of the content of this article. We synthesized time series with certain delay lengths, frequencies, and sequence lengths to create datasets where only one of these characteristics varies. For each dataset, we trained a long short-term memory (LSTM) network, a convolutional neural network (CNN), and a transformer architecture and evaluated their performance with respect to the varied characteristic.
Figure 2.
Conceptual comparison of different deep learning paradigms for modeling time dependencies across input samples: LSTM (left), CNNs (middle), and attention (right). Figure adapted from [21].
Figure 3.
Encoder-only transformer architecture.
Figure 4.
Sample from synthetic dataset with , , , , , and .
Figure 5.
Distribution of the amplitudes in training and test datasets.
Figure 6.
Training times during hyperparameter optimization in Experiment 1.
Figure 7.
Single-batch forward pass averaged over five test iterations.
Figure 8.
Validation loss curve of the best runs for each model architecture in Experiment 1.
Figure 9.
Validation MAE of each training session as a boxplot. The dashed line indicates the performance when taking the mean of the input as the output prediction.
Figure 10.
Confidence intervals for different delay lengths. Delay length 96, marked by a brown rectangle, was not present during training or validation.
Figure 11.
Boxplot of distributions of MAEs of single samples, separated by their delay length. Delay length 96, marked by a brown rectangle, was not present during training or validation.
Figure 12.
Boxplot of MAEs of samples with − −, separated by their values for .
Figure 13.
Heatmap of average MAEs of each model separated by amplitude.
Figure 14.
Critical difference of the models when separating the dataset by their delay lengths.
Figure 15.
Training times during hyperparameter optimization in Experiment 2.
Figure 16.
Validation loss of all models. A dashed horizontal line indicates the performance of the mean estimator.
Figure 17.
Validation loss curve of the best run for each model architecture in Experiment 2.
Figure 18.
Inference time of a single forward pass with a batch size of 64 averaged over five runs on the complete test dataset.
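The timing protocol summarized in Figures 7, 18, and 27 can be reproduced with a short measurement loop. The sketch below is a minimal PyTorch example of such a measurement, assuming a generic `model` and `test_loader`; the names and the CUDA synchronization details are illustrative and not taken from the paper's code.

```python
import time
import torch

def average_forward_time(model, test_loader, device="cuda", runs=5):
    """Average wall-clock time of a single forward pass, in the spirit of Figure 18."""
    model = model.to(device).eval()
    times = []
    with torch.no_grad():
        for _ in range(runs):                      # several passes over the test set
            for batch, _ in test_loader:           # batches of size 64, as in Figure 18
                batch = batch.to(device)
                if device == "cuda":
                    torch.cuda.synchronize()       # finish pending kernels before timing
                start = time.perf_counter()
                model(batch)
                if device == "cuda":
                    torch.cuda.synchronize()       # wait for the forward pass to complete
                times.append(time.perf_counter() - start)
    return sum(times) / len(times)
```

Synchronizing before and after the forward pass matters on GPU, since kernel launches are asynchronous and unsynchronized timings would mostly measure launch overhead.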
Figure 19.
Boxplot of the impact of the noise level on the prediction performance.
Figure 20.
MAEs, separated by frequencies. The frequency marked by a brown rectangle was not present during training or validation.
Figure 21.
Confidence interval of the median for various frequencies. The frequency marked by a brown rectangle was not present during training or validation.
Figure 22.
Mean MAEs of all three models as a function of amplitude, computed only on data with the frequency that did not appear during training.
Figure 23.
Critical differences of the architectures on the test dataset.
Figure 24.
Training time during hyperparameter optimization in Experiment 3.
Figure 25.
Best validation loss of all models.
Figure 26.
Validation loss curve of the best-performing models for each type in Experiment 3.
Figure 27.
Inference time of a single forward pass with a batch size of four, averaged over five runs over the entire test dataset.
Figure 28.
MAEs for different sample sequence lengths. Sequence length 512, marked by a brown rectangle, was not present during training or validation.
Figure 29.
Best predicted sequence from a sample with a sequence length unseen during training.
Figure 30.
Critical difference of the architectures on all different datasets.
Table 1.
Optuna hyperparameters and the ranges over which we optimized. Log-uniform means that we sampled the values uniformly in the logarithmic domain.
Hyperparameter | LSTM | CNN | Transformer | Range | Sampling Type |
---|---|---|---|---|---|
Model dimensionality | No | Yes | No | | Uniform integer |
 | Yes | No | No | | Uniform integer |
 | No | No | Yes | | Uniform |
Heads | No | No | Yes | | Uniform |
Architecture depth N | No | Yes | No | | Uniform integer |
 | Yes | No | Yes | | Uniform integer |
Feed-forward layer dimensionality | No | No | Yes | | Log-uniform integer |
Kernel size | No | Yes | No | | Uniform |
Learning rate | No | Yes | No | | Log-uniform |
 | Yes | No | No | | Log-uniform |
Optimization factor | No | No | Yes | | Log-uniform |
Optimization warmup steps | No | No | Yes | | Uniform integer |
Dropout | Yes | Yes | Yes | | Uniform float |
Weight decay | Yes | Yes | Yes | | Uniform float |
∑ | 6 | 5 | 8 | | |
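For reference, search spaces of this kind map directly onto Optuna's suggestion API. The following is a minimal sketch of such an objective; the parameter bounds and the `train_and_evaluate` routine are placeholders, not the exact ranges or code used in the experiments.

```python
import optuna

def train_and_evaluate(params):
    """Hypothetical stand-in for training a model and returning its validation MAE."""
    return params["dropout"] + params["learning_rate"]  # placeholder objective value

def objective(trial):
    # Search space in the style of Table 1; the bounds shown here are illustrative.
    params = {
        "depth": trial.suggest_int("depth", 1, 6),                                    # uniform integer
        "kernel_size": trial.suggest_categorical("kernel_size", [3, 5, 7]),           # uniform over a set
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True),  # log-uniform
        "dropout": trial.suggest_float("dropout", 0.0, 0.5),                          # uniform float
        "weight_decay": trial.suggest_float("weight_decay", 1e-6, 1e-3),              # uniform float
    }
    return train_and_evaluate(params)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
```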
Table 2.
Best LSTM hyperparameter settings.
LSTM | N | | Dropout | Learning Rate | Weight Decay |
---|---|---|---|---|---|
Experiment 1 | 4 | 38 | 0.36886 | 0.00170 | 0.00002 |
Experiment 2 | 5 | 47 | 0.35929 | 0.00172 | 0.00001 |
Experiment 3 | 4 | 47 | 0.25551 | 0.00331 | 0.00001 |
Table 3.
Best CNN hyperparameter settings.
CNN | N | | Dropout | Learning Rate | Weight Decay | Kernel Size |
---|---|---|---|---|---|---|
Experiment 1 | 2 | 45 | 0.31065 | 0.00016 | 0.00028 | 5 |
Experiment 2 | 2 | 53 | 0.24150 | 0.00011 | 0.00016 | 3 |
Experiment 3 | 1 | 47 | 0.28798 | 0.00029 | 0.00001 | 3 |
Table 4.
Best Transformer hyperparameter settings.
Transformer | N | | Dropout | Opt Factor | Warmup | Weight Decay | h | |
---|---|---|---|---|---|---|---|---|
Experiment 1 | 4 | 64 | 0.36731 | 0.62271 | 430 | 0.00002 | 8 | 489 |
Experiment 2 | 3 | 32 | 0.33472 | 1.92401 | 797 | 0.00002 | 8 | 42 |
Experiment 3 | 5 | 64 | 0.26745 | 0.42063 | 453 | 0.00002 | 8 | 171 |
Table 5.
Description of the dataset sampling for the first experiment. The left-out delay length is highlighted in bold font.
Characteristic | Dataset | Possible Values | Sampling Type |
---|---|---|---|
 | All | 512 | |
d | Train, validation | 0, 16, 32, 48, 64, 80, 112, 128, 144, 160, 176 | Uniform |
 | Test | 0, 16, 32, 48, 64, 80, **96**, 112, 128, 144, 160, 176 | |
f | All | 1/32 | |
 | All | 2 | |
 | Train | | |
 | Validation, test | | |
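As an illustration of the sampling in Table 5, the sketch below draws a training sample with the listed sequence length, frequency, and delay values. The amplitude range, the noise level, and the construction of the target as the input shifted by the delay are assumptions made for illustration only; the exact generation procedure is described in the main text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Values taken from Table 5 (Experiment 1).
SEQ_LEN = 512
FREQ = 1 / 32
TRAIN_DELAYS = [0, 16, 32, 48, 64, 80, 112, 128, 144, 160, 176]

def sample_training_sequence():
    d = int(rng.choice(TRAIN_DELAYS))              # delay length, sampled uniformly
    amplitude = rng.uniform(0.5, 1.5)              # placeholder amplitude range
    t = np.arange(SEQ_LEN)
    x = amplitude * np.sin(2 * np.pi * FREQ * t)   # input sinusoid
    x = x + rng.normal(0.0, 0.05, SEQ_LEN)         # placeholder noise level
    y = np.roll(x, d)                              # assumed target: input delayed by d steps
    return x, y, d
```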
Table 6.
Description of the dataset sampling for the second experiment. The left-out frequency is highlighted in bold font.
Characteristic | Dataset | Possible Values | Sampling Type |
---|---|---|---|
 | All | 512 | |
d | All | 0, 64, 128 | Uniform |
f | Train, validation | | Uniform |
 | Test | | |
 | All | 2, 3, 4, 5, 6, 7, 8, 9 | Uniform |
 | Train | | |
 | Validation, test | | |
Table 7.
Description of the dataset sampling for the third experiment. The left-out sequence length is highlighted in bold font.
Characteristic | Dataset | Possible Values | Sampling Type |
---|---|---|---|
 | Train, validation | 128, 256, 1024, 2048 | Uniform |
 | Test | 128, 256, **512**, 1024, 2048 | |
d | Test | 0, 32 | Uniform |
f | All | | Uniform |
 | All | 2, 5 | Uniform |
 | Train | | |
 | Validation, test | | |
Table 8.
Average errors of the test dataset for the first experiment. Lowest error values are highlighted in bold font.
Model | Average MAE | Average RMSE | Average MAPE |
---|---|---|---|
Transformer | **0.040** | **0.066** | **9.925** |
CNN | 0.054 | 0.074 | 14.209 |
LSTM | 0.043 | 0.069 | 11.317 |
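The error measures reported in Tables 8, 9, and 10 follow their standard definitions; a minimal NumPy sketch is given below. The epsilon guard in the MAPE is an assumption to avoid division by zero and is not necessarily part of the original evaluation.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error; eps guards against division by zero."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / (np.abs(y_true) + eps)))
```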
Table 9.
Average errors of the test dataset for the second experiment. The lowest error values are highlighted in bold font.
Model | Average MAE | Average RMSE | Average MAPE |
---|---|---|---|
Transformer | 0.073 | 0.102 | 16.731 |
CNN | 0.071 | 0.097 | 15.734 |
LSTM | **0.054** | **0.087** | **12.825** |
Table 10.
Average errors of the test dataset for the third experiment. The lowest error values are highlighted in bold font.
Model | Average MAE | Average RMSE | Average MAPE |
---|---|---|---|
Transformer | 0.049 | 0.079 | 11.23 |
CNN | 0.044 | 0.067 | 11.615 |
LSTM | **0.034** | **0.059** | **8.161** |