Article

Determination of Optimal Batch Size of Deep Learning Models with Time Series Data

1 Transportation Research Institute, Ajou University, Suwon 16499, Republic of Korea
2 Department of Transportation System Engineering, Ajou University, Suwon 16499, Republic of Korea
3 Department of DNA+ Convergence, Ajou University, Suwon 16499, Republic of Korea
* Author to whom correspondence should be addressed.
Sustainability 2024, 16(14), 5936; https://doi.org/10.3390/su16145936
Submission received: 24 April 2024 / Revised: 4 July 2024 / Accepted: 9 July 2024 / Published: 12 July 2024

Abstract

This paper presents a new method to determine the optimal batch size for applying deep learning models to time series data. A set of candidate batch sizes is determined by considering the length of the repetition pattern of the data, identified using the Fast Fourier Transform (FFT). A comparative analysis is then conducted to identify the impact of varying batch sizes on the prediction errors of three deep learning models. The results show that the RNN model has an optimal batch size that produces the minimum prediction error. In the DNN and CNN models, the optimal batch size is not correlated with the repetition pattern of the time series data. Therefore, applying CNN and DNN models to time series data is not recommended; however, if they are used, a small batch size can be selected to reduce training time. In addition, the range of prediction error across batch sizes is significantly larger for the RNN model than for the DNN and CNN models.

1. Introduction

Deep learning is a prominent technique in machine learning used for analyzing and predicting outcomes in many fields by examining features and correlations present in large-scale data. Numerous studies are underway to enhance the performance of deep learning models. For example, a deep feature synthesis algorithm was proposed to generate the final features by applying mathematical functions [1]; the number of layers, weights between layers, and the quantity of neurons in each layer were optimized by considering the accuracy and learning time [2]; batch size and learning rate were considered to enhance the model’s generalization ability and to prevent overfitting problems [3,4]; and a hybrid model was proposed to cope with time series data containing both linear and nonlinear temporal patterns [5].
Batch size refers to the number of sample data used to update the weights of a deep learning model at once. It is considered one of the important hyper-parameters that control learning speed and model performance [6]. Several studies have investigated batch size characteristics in terms of generalization performance, learning stability, and training time [4,7,8]. Most studies advocate a small batch size in the range of 2 to 64, although one study advises against small batches. In practice, researchers currently select a batch size between 2 and 128 empirically, by trial and error. However, it is uncertain whether these recommended values optimize the predictive performance of deep learning models. In addition, there is no clear way to verify whether the value selected by the analyst is the appropriate batch size for the models.
Time series data are observed sequentially over time and exhibit patterns that repeat at regular intervals. Such data are widely used in the field of transportation modeling and prediction. It is therefore expected that the batch size may affect the performance of deep learning models trained on time series data. However, there is currently no systematic way to determine the appropriate batch size for time series data during the model building process, so modeling is conducted heuristically.
The purpose of this study is to propose a new method to determine the optimal batch size for applying deep learning models to time series data. To achieve this objective, traffic volume and speed data are collected from three distinct links of an urban arterial exhibiting varying traffic patterns. Next, an analysis is conducted to assess the variation in prediction accuracy of three deep learning models across a predetermined set of batch sizes. The batch sizes for evaluation are determined by considering the length of the repetition pattern identified using the Fast Fourier Transform (FFT) technique. The Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) are utilized as evaluation indicators to compare the prediction errors of each model.

2. Literature Review

Deep learning models are broadly categorized into Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs) based on their learning methods and structural characteristics [6]. Han et al. classified deep learning models used for time series prediction from the perspective of their probabilistic characteristics. Six models, including CNN, RNN, and hybrid models, were evaluated using actual data, and the hybrid model produced more stable results than the others [9]. Wang et al. also introduced the DNN, CNN, and RNN as deep learning models used in Intelligent Transportation Systems (ITS). All three models are currently used in time series prediction, but the use of CNN and RNN models was recommended [10].
Kang et al. showed that the short-term flow prediction accuracy of an RNN model could be improved by incorporating additional input variables [11]. Prediction accuracy improved with the inclusion of occupancy and speed data, and improved further when upstream and downstream traffic variables were included. Lee presented a queue length estimation model for urban roadways based on a DNN [12]. Kim developed a deep learning model for short-term speed prediction, which exhibited smaller prediction errors than statistical models [13]. Both Lee and Kim attempted to boost model performance by incorporating a wider range of variables based on correlation and feature analysis. Ma et al. developed a CNN-based model by adjusting the depth of the hidden layers and the number of nodes [14]. The proposed model outperformed other algorithms with an average accuracy improvement of 42.91% and a reasonable training time. Gnana and Deepa demonstrated that the min–max normalization technique could improve the accuracy of numeric computation, because higher-valued data tend to suppress the influence of smaller variables during training [15]. Kim and Chung proposed a method to optimize the learning rate parameter by utilizing a Gamma distribution of the training performance in image discrimination [16].
Many fusion models have also been investigated to enhance model performance. Liu et al. compared the performance of a combined DNN and RNN model with other models such as linear regression, autoregressive integrated moving average (ARIMA), and DNN [17]. Several insights were discussed to optimize the structure of a combined model. Fouladgar et al. proposed a decentralized deep learning-based method to predict traffic congestion [18]. The fusion model involved identifying features with a CNN and making predictions with an RNN.
In general, deep learning models are trained using the mini-batch method, which divides the data into small batches and updates the model incrementally. The mini-batch method is most often used because it converges stably depending on the batch size, and it has the advantage that the batch size can be selected according to the characteristics of the data [6].
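For illustration, the per-batch weight update can be sketched as follows. This is a minimal NumPy sketch; the function and parameter names are illustrative and not taken from any cited study:

```python
import numpy as np

def minibatch_gradient_descent(X, y, w, grad_fn, batch_size=32, lr=0.01, epochs=10):
    """Illustrative mini-batch training loop: the data are shuffled each epoch,
    split into batches of `batch_size` samples, and the weights are updated
    once per batch with the gradient of the loss on that batch."""
    n = len(X)
    for _ in range(epochs):
        idx = np.random.permutation(n)                   # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            w = w - lr * grad_fn(X[batch], y[batch], w)  # one update per mini-batch
    return w
```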
Most research on batch size has focused on enhancing training performance by investigating learning speed and generalization performance as functions of the batch size. Keskar et al. demonstrated that a large batch size increased the probability of converging to sharp minima of the training function, which impaired generalization performance; they also explained that a small batch size increased gradient noise, leading to convergence to a flat minimizer [8]. Iwana noted the need to normalize data length when training on multiple time series together in mini-batches, because the time series may have different lengths [19]. Sutskever et al. [20] improved the handling of word-order dependencies in sentence prediction using an RNN-based encoder and decoder structure. The Bilingual Evaluation Understudy (BLEU) score increased from 25.9 to 30.6, indicating improved learning performance on time series data.
Masters and Luschi demonstrated that the advantages of small batch size were better generalization performance, better training stability, and the ability to choose a much larger range of learning rates [7]. It was recommended to use small batches of 2 to 32 to build the models. Kandel and Castelli investigated the accuracy of a CNN model in image classification with many hyper-parameters [21]. It was recommended to use small batch sizes of 32 to 64 with a low learning rate to improve the performance of the network. Ghosh et al. analyzed the elapsed training time in relation to batch size in a Generative Adversarial Network [22]. They observed a reduction in training time for batch sizes from 8 to 16, and the training time became longer as the batch size increased. Wu and Johnson proposed that BatchNorm can be used to stabilize learning, enhance learning speed, and prevent overfitting [4]. They also recommended a batch size of 32 to 128 and claimed that a small batch size might cause problems.
Summarizing these studies, DNN, CNN, and RNN models have been used to predict time series data, and additional variables and hybrid models have been introduced to improve their performance. Studies have also examined learning speed and performance as functions of batch size and have recommended batch size ranges. Previous studies indicate that batch size has a significant impact on model performance in terms of learning speed and training time. Although batch size is recognized as an important hyper-parameter, it has typically been set empirically by researchers to a value between 2 and 128. It remains uncertain whether these recommended values produce the best performance in deep learning models for time series data.

3. Study Methodology

3.1. Selection of a Study Site

To assess the predictive performance of deep learning models, real-time field data were collected. In general, traffic volume and speed were considered to represent the time series characteristics well. The selection of the study site was guided by the following three criteria:
The site is located in an urban area where a vehicle detection system (VDS) is installed to collect traffic volume and speed.
The site has no oversaturated traffic conditions during peak hours. When oversaturation occurs, it is difficult to evaluate the characteristics of the study section.
The site shows the time series characteristics indicating changes in speed and traffic volume throughout the day.
Considering these characteristics, three links of the urban arterials were selected as the data collection sites in Dongtan District in Hwa-Seong City, Republic of Korea. A diagram of the study site is shown in Figure 1. The site was part of signalized arterials that passed through Dongtan District from north to south. A freeway leading to Seoul City was located to the north and east of the study site. In addition, industrial complexes were located to the west and south of the study site. Therefore, a significant commuter traffic volume was found with the directional peak flow patterns at the site.

3.2. Data Collection

Traffic speed was collected from the VDS installed in the middle of the link. Traffic volume was obtained using a closed-circuit television (CCTV) detector installed at the intersection. The traffic volume was counted separately for each movement such as through, left turn, and right turn for each link. The collected average speed and traffic volumes are summarized in Table 1.
The site has morning and afternoon peaks: the morning peak occurs at 08:00–09:00 and the afternoon peak at 18:00–19:00. The eastbound direction of link 3 provided good progression with high traffic speeds, whereas the southbound direction was characterized by low traffic speed and high left-turn traffic volume.
The data used in the analysis were aggregated from real-time data collected at 5 min intervals. Since one hour contains 12 data points, one day contains 288 data points. A total of 3 months of data were collected for each link in this study.

3.3. Methodology

To determine the optimal batch size, a set of candidate batch sizes should be determined first. Since time series data were used in this study, the performance of deep learning models might be correlated with the repeated cyclic pattern of the data. Therefore, candidate batch sizes were investigated using the FFT to identify the periodic pattern of the data.
The FFT is a computational algorithm that efficiently computes the Discrete Fourier Transform (DFT). The DFT transforms a signal from the time domain to the frequency domain, allowing the amplitude and phase of each frequency component to be analyzed. The DFT converts a time-domain signal x(n) to a frequency-domain signal X(k) as follows.
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \quad 0 \le k < N$$
where $N$ is the number of samples, $x(n)$ is the $n$-th sample of the time-domain signal, $X(k)$ is the $k$-th component of the frequency-domain signal, and $j$ is the imaginary unit.
The FFT reduces the time complexity of computing the DFT above by splitting the signal into odd and even samples and computing the DFT recursively [23]. The FFT analyzes the spectral density by decomposing the data into frequency components. A high spectral density indicates that a frequency component is consistently prominent; the period corresponding to that frequency is therefore the length of the recurring pattern. By applying the FFT to traffic time series data, a frequency with a high amplitude can be identified as a highly repetitive pattern cycle [24]. One study concluded that the FFT technique identified trends and seasonality in time series data better than the Autocorrelation Function (ACF) [25].
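As a concrete illustration of this step, the following minimal Python sketch (assuming NumPy; the function name is ours, not from the paper) extracts the most prominent period lengths from the spectral density of a series:

```python
import numpy as np

def dominant_periods(x, top_k=2):
    """Return the `top_k` period lengths (in samples) with the highest
    spectral density, mirroring the FFT pattern analysis described above."""
    x = np.asarray(x, dtype=float)
    density = np.abs(np.fft.rfft(x - x.mean())) ** 2  # spectral density, mean removed
    freqs = np.fft.rfftfreq(x.size)                   # frequencies in cycles/sample
    order = np.argsort(density[1:])[::-1] + 1         # rank bins, skipping the DC term
    return [round(1.0 / freqs[k]) for k in order[:top_k]]

# For 5 min traffic data, the two leading periods are expected near
# 288 samples (24 h) and 144 samples (12 h), as reported in Section 4.
```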
Following the FFT analysis, the predictive errors of three deep learning models (DNN, CNN, and RNN) were evaluated using the set of candidate batch sizes. The results were then compared with a baseline model, a simple statistical prediction model that forecasts by moving average with the window length set equal to the batch size.
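The exact formulation of the baseline is not spelled out in the text; one plausible reading, sketched below under that assumption, forecasts the next hour as the moving average of the most recent window, with the window length equal to the batch size:

```python
import numpy as np

def baseline_forecast(series, window, horizon=12):
    """Assumed baseline: predict the next `horizon` points (one hour at
    5 min resolution) as the mean of the last `window` observations,
    where `window` is set equal to the batch size under evaluation."""
    avg = float(np.mean(series[-window:]))
    return [avg] * horizon
```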
The RNN used Long Short-Term Memory (LSTM), which was developed to address the vanishing gradient problem commonly found in basic RNNs. Input variables for the three models were speed and traffic volume by movement. The three models produced 12 data points at a time, representing one hour ahead, using a single-shot model instead of single-step prediction.
The deep learning models were implemented using the Keras library. To analyze the impact of batch size on the performance of each deep learning technique, the parameters of the models were set to be as identical as possible. In general, more layers and nodes improve model performance but also increase training time; the complexity of the models was therefore minimized to focus on the impact of batch size.
For all deep learning models, the layer parameter was set to 64 and the node parameter to 10, and the activation function was the Rectified Linear Unit (ReLU). The loss function was MAE, and the weights were updated with mini-batch gradient descent. In addition, early stopping with a limit of 50 epochs was applied to prevent overfitting; this limit was never reached, and training stopped between 30 and 40 epochs in all cases. The training data were separated from the test data to check the prediction performance, with the input data divided into 70% training, 20% validation, and 10% testing.
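A minimal Keras sketch consistent with these settings is shown below for the RNN (LSTM) case. The input window length, unit count, and early-stopping patience are our assumptions, since the paper reports only the layer/node values, ReLU activation, MAE loss, and mini-batch gradient descent; this is not the authors' code.

```python
import tensorflow as tf

WINDOW = 288      # assumed input window (one day of 5 min data)
N_FEATURES = 5    # assumed: speed plus through/left/right/total volumes
HORIZON = 12      # single-shot output: one hour ahead at 5 min resolution

model = tf.keras.Sequential([
    tf.keras.Input(shape=(WINDOW, N_FEATURES)),
    tf.keras.layers.LSTM(64, activation="relu"),   # LSTM variant of the RNN
    tf.keras.layers.Dense(HORIZON),                # all 12 future points at once
])
# SGD corresponds to the stated mini-batch gradient descent; loss is MAE
model.compile(optimizer=tf.keras.optimizers.SGD(), loss="mae")

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)
# history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
#                     epochs=50, batch_size=288, callbacks=[early_stop])
```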
As for the evaluation metrics, MAE and MAPE were selected. The two indicators are widely used in traffic flow prediction [6] and are calculated as follows.
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the total number of data points.
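In code, the two indicators reduce to a few lines (a straightforward NumPy sketch):

```python
import numpy as np

def mae(y, y_hat):
    """Mean Absolute Error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat):
    """Mean Absolute Percentage Error (%); assumes no zero actual values."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return 100.0 * np.mean(np.abs((y - y_hat) / y))
```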

4. Data Analysis

4.1. FFT Analysis and Batch Size

Using the speed data collected for each link, the FFT analysis was performed to determine the spectral density over frequency. A graphical representation of the FFT result for link 1 is shown in Figure 2, where the x-axis represents the period and the y-axis represents the spectral density. The spectral density fluctuated considerably with the period: the highest density value was found at a period of 288, and the second-highest at 144.
Similarly, the FFT analysis was completed for link 2 and link 3. The spectral density values were also checked, and the two periods with the highest spectral density values were identified. These results are summarized in Table 2. The FFT analysis results indicated that a period length of 288 had the highest density value for the three links tested. A period length of 144 showed the second-highest density value for the three links tested. A period length of 288 represented 24 h, and a period length of 144 corresponded to 12 h, as listed in Table 2. From these results, it was found that there was a period length that maximized the density value in time series data.
In this study, cycle length 1 (CL1) denoted a period length of the highest density, and cycle length 2 (CL2) indicated a period length of the second-highest density. Intuitively, a batch size for deep learning models could be determined by applying either CL1 or CL2.
To analyze the performance of the three models, a set of appropriate batch sizes was determined. Referring to the values in Table 2, several batch sizes could be selected proportionally. The batch sizes applied in this evaluation were 36, 72, 144, 288, 864, 1440, 2016, and 2592. These are 1/8, 1/4, 1/2, 1, 3, 5, 7, and 9 times the length of the repeating pattern.
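The candidate set follows directly from the dominant period; for example:

```python
# Candidate batch sizes as multiples of the dominant period CL1 = 288
CL1 = 288
multipliers = [1/8, 1/4, 1/2, 1, 3, 5, 7, 9]
batch_sizes = [int(CL1 * m) for m in multipliers]
print(batch_sizes)  # [36, 72, 144, 288, 864, 1440, 2016, 2592]
```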

4.2. Comparison Results by Models

The evaluation results for link 1 are summarized in Table 3. The baseline model had the smallest prediction error, with an MAE of 7.30 at a batch size of 72. The DNN and CNN had the lowest error at a batch size of 1440, while the RNN had the smallest error at a batch size of 288. Thus, the RNN achieved its lowest error at CL1 for link 1, whereas the DNN and CNN achieved their smallest errors at batch sizes other than CL1. The MAPE indicator confirmed similar results.
The range of MAE values was largest for the baseline at 3.04, followed by the RNN at 1.54; the CNN and DNN had very small ranges of 0.27 and 0.18, respectively. These findings show that the RNN exhibits significant fluctuations in error magnitude across batch sizes, while the CNN and DNN fluctuate very little.
The evaluation results of link 2 are presented in Table 4. The baseline had the smallest prediction error with an MAE of 3.94 at a batch size of 36. The DNN had the smallest error with an MAE of 3.62 at a batch size of 144, while the CNN had the smallest error with an MAE of 3.53 at a batch size of 72. The RNN produced a minimum MAE of 3.31 with a batch size of 288. The CL1 value at link 2 was 288 from the FFT analysis. Therefore, the RNN produced the lowest error in CL1 at link 2, but the DNN and CNN could not obtain the smallest error values in CL1. In addition, the error of the baseline model increased as the batch size increased.
The range of MAE values was 0.37 for the RNN, and 0.14 and 0.11 for the CNN and DNN, respectively. These results were similar to those for link 1: the RNN exhibited significant fluctuation in error across batch sizes, while both the CNN and DNN produced stable error values regardless of batch size.
The analysis results of link 3 are listed in Table 5. The baseline model had a prediction error of MAE 3.93 with a batch size of 36. The DNN had an MAE of 3.91 with a batch size of 864, while the CNN showed an MAE of 3.82 with a batch size of 2016. The RNN produced an MAE of 3.67 with a batch size of 288. Like the results of link 1 and link 2, the RNN showed the lowest error in CL1, whereas the DNN and CNN could not obtain the smallest error values in CL1.
The range of MAE values was 0.70 for the RNN, and 0.09 and 0.06 for the CNN and DNN, respectively. The implications of these results were the same as for link 1 and link 2.
The MAE values for CL1 and CL2 at the three links are summarized in Table 6. The baseline model showed a pattern of increasing prediction error as the batch size increased. The DNN showed a smaller error at CL1 for link 1 and link 3, but a smaller error at CL2 for link 2. The CNN showed a smaller error at CL1 for link 1 and link 3, with equal values at link 2. Therefore, it is judged that a batch size derived from the time series data does not affect the magnitude of the errors of the DNN and CNN. The RNN, however, showed the smallest error at CL1 for all links, indicating that a batch size derived from the time series data had a significant impact on its error. This is probably because the RNN model captures time dependence and is influenced by data from previous points in time.
Based on the above analysis, it was confirmed that the effect of batch size on prediction error differs by model. It is very difficult to find an optimal batch size for the DNN and CNN because their prediction errors appear to be independent of the batch size. However, an optimal batch size exists for the RNN, and it can be found by identifying the period length with the highest spectral density in the time series data.

5. Conclusions

This paper evaluates the impact of batch size on the prediction error of deep learning models using time series data. For this purpose, traffic volume and speed data were collected from three links of urban arterials. Then, the prediction errors of three deep learning models were compared using predetermined batch sizes. Several conclusions can be drawn from the analysis results.
Firstly, the RNN model had an optimal batch size for time series data, and it could be easily found by applying FFT. The prediction error was minimal at the optimal batch size for all links tested. Therefore, the optimal batch size must be identified and applied when analyzing time series data with an RNN model.
Secondly, an optimal batch size could not be easily found for the DNN and CNN models. Therefore, applying the CNN and DNN models to time series data is not recommended. However, if they are used, a small batch size is recommended to reduce training time, as indicated in previous studies [7,21,22]. For the baseline model, the prediction error decreased as the batch size decreased.
Lastly, the RNN model showed a large range value of prediction error depending on the batch size. However, the DNN and CNN models showed very small fluctuations in prediction error regardless of batch size.
The batch size is determined using a period length calculated from the field data in this study. The period length of the data collected in this study is the same at all links. However, the value may vary depending on the traffic characteristics and patterns of the study sites. It is recommended that the results be confirmed with additional data collected from other locations. In addition, this study was conducted based on transportation-related time series data, but additional research is needed based on data from other fields.

Author Contributions

Methodology, J.-S.H.; Validation, S.-S.L. and C.-K.L.; Formal analysis, J.-W.G.; Resources, J.-S.H.; Data curation, J.-S.H. and J.-W.G.; Writing—original draft, J.-S.H. and J.-W.G.; Writing—review & editing, S.-S.L.; Supervision, S.-S.L.; Project administration, C.-K.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Korea Institute of Police Technology (KIPoT) Grant funded by the Korean government (KNPA) (No. 092021C28S01000).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We would like to thank the officials in the Advanced Transportation Division of Hwa-Seong City Hall for providing the data.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Kanter, J.M.; Veeramachaneni, K. Deep Feature Synthesis: Towards Automating Data Science Endeavors. In Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA), Paris, France, 19–21 October 2015.
2. Karsoliya, S. Approximating Number of Hidden Layer Neurons in Multiple Hidden Layer BPNN Architecture. Int. J. Eng. Trends Technol. 2012, 3, 714–717.
3. He, F.; Liu, T.; Tao, D. Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence. In Proceedings of Advances in Neural Information Processing Systems 32, Vancouver, BC, Canada, 8–14 December 2019.
4. Wu, Y.; Johnson, J. Rethinking "Batch" in BatchNorm. arXiv 2021, arXiv:2105.07576.
5. Hossain, M.A.; Karim, R.; Thulasiram, R.; Bruce, N.D.B.; Wang, Y. Hybrid Deep Learning Model for Stock Price Prediction. In Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence, Bangalore, India, 18–21 November 2018.
6. Geron, A. Machine Learning at a Glance. In Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow; Authorized Korean Translation of the English Edition; Park, H.S., Ed.; Hanbit Media Inc.: Seoul, Republic of Korea, 2020; pp. 44–46, 405–406.
7. Masters, D.; Luschi, C. Revisiting Small Batch Training for Deep Neural Networks. arXiv 2018, arXiv:1804.07612.
8. Keskar, N.S.; Mudigere, D.; Nocedal, J.; Smelyanskiy, M.; Tang, P.T. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
9. Han, Z.; Zhao, J.; Leung, H.; Ma, K.F.; Wang, W. A Review of Deep Learning Models for Time Series Prediction. IEEE Sens. J. 2021, 21, 7833–7848.
10. Wang, Y.; Zhang, D.; Liu, Y.; Dai, B.; Lee, L. Enhancing transportation systems via deep learning: A survey. Transp. Res. Part C Emerg. Technol. 2019, 99, 144–163.
11. Kang, D.; Lu, Y.; Chen, Y.Y. Short-Term Traffic Flow Prediction with LSTM Recurrent Neural Network. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems, Yokohama, Japan, 16–19 October 2017.
12. Lee, Y.J. Development of an Estimation Model of Vehicle Queue Length Using Link Travel Time Based on Deep Learning. Doctoral Thesis, Ajou University, Suwon, Republic of Korea, February 2018.
13. Kim, H.Y. Development of a Speed Prediction Model for Signalized Intersections Based on Gated Recurrent Unit. Master's Thesis, Ajou University, Suwon, Republic of Korea, February 2023.
14. Ma, X.; Dai, Z.; He, Z.; Ma, J.; Wang, Y.; Wang, Y. Learning Traffic as Images: A Deep Convolutional Neural Network for Large-Scale Transportation Network Speed Prediction. Sensors 2017, 17, 818.
15. Gnana, S.K.; Deepa, S.N. Review on Methods to Fix Number of Hidden Neurons in Neural Networks. Math. Probl. Eng. 2013, 2013, 425740.
16. Kim, Y.H.; Chung, M.D. An Approach to Hyperparameter Optimization for the Objective Function in Machine Learning. Electronics 2019, 8, 1267.
17. Liu, Y.; Wang, Y.; Yang, X.; Zhang, L. Short-term travel time prediction by deep learning: A comparison of different LSTM-DNN models. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems, Yokohama, Japan, 16–19 October 2017.
18. Fouladgar, M.; Parchami, M.; Elmasri, R.; Ghaderi, A. Scalable deep traffic flow neural networks for urban traffic congestion prediction. In Proceedings of the International Joint Conference on Neural Networks, Anchorage, AK, USA, 14–19 May 2017; pp. 2251–2258.
19. Iwana, B.K. On Mini-Batch Training with Varying Length Time Series. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Singapore, 23–27 May 2022.
20. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS'14, Montreal, QC, Canada, 8–13 December 2014.
21. Kandel, I.; Castelli, M. The effect of batch size on the generalizability of the convolutional neural networks on a histopathology dataset. ICT Express 2020, 6, 312–315.
22. Ghosh, B.; Dutta, I.K.; Carlson, A.; Totaro, M.; Bayoumi, M. An Empirical Analysis of Generative Adversarial Network Training Times with Varying Batch Sizes. In Proceedings of the IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 28–31 October 2020.
23. Park, M.S.; Kim, J.Y. Acoustic Characteristic Analysis of the Accident for Automatic Traffic Accident Detection and Intersection. J. Korea Acad.-Ind. Coop. Soc. 2006, 7, 1142–1148.
24. Ng, M.; Waller, S.T. A Computationally Efficient Methodology to Characterize Travel Time Reliability Using the Fast Fourier Transform. Transp. Res. Part B Methodol. 2010, 44, 1149–1290.
25. Musbah, H.; El-Hawary, M.; Aly, H. Identifying Seasonality in Time Series by Applying Fast Fourier Transform. In Proceedings of the IEEE Electrical Power and Energy Conference (EPEC), Montreal, QC, Canada, 16–18 October 2019.
Figure 1. Layout of the study site.
Figure 2. FFT outputs of link 1.
Table 1. Average speed and traffic volume data collected.

| Link ID | Peak Time | Speed (km/h) | Through (vph) | Left Turn (vph) | Right Turn (vph) | Total (vph) |
|---------|-----------|--------------|---------------|-----------------|------------------|-------------|
| Link 1  | Morning   | 19.8         | 857           | 53              | 89               | 999         |
| Link 1  | Afternoon | 16.0         | 754           | 61              | 110              | 925         |
| Link 2  | Morning   | 13.4         | 576           | 541             | 34               | 1151        |
| Link 2  | Afternoon | 12.6         | 508           | 272             | 71               | 851         |
| Link 3  | Morning   | 33.2         | 821           | 23              | 187              | 1031        |
| Link 3  | Afternoon | 39.4         | 582           | 31              | 153              | 766         |
Table 2. Summary of FFT analysis results by link.

| FFT Rank | Measure | Link 1 | Link 2 | Link 3 |
|----------|---------|--------|--------|--------|
| CL1      | Period  | 288    | 288    | 288    |
| CL1      | Hour    | 24     | 24     | 24     |
| CL2      | Period  | 144    | 144    | 144    |
| CL2      | Hour    | 12     | 12     | 12     |
Table 3. Analysis result by batch size at link 1.

| Batch Size | Baseline MAE | Baseline MAPE (%) | DNN MAE | DNN MAPE (%) | CNN MAE | CNN MAPE (%) | RNN MAE | RNN MAPE (%) |
|------------|--------------|-------------------|---------|--------------|---------|--------------|---------|--------------|
| 36         | 7.31         | 37.1              | 7.06    | 37.3         | 7.06    | 36.7         | 6.37    | 33.7         |
| 72         | 7.30         | 36.8              | 7.09    | 37.6         | 6.95    | 35.9         | 6.85    | 36.1         |
| 144        | 7.33         | 37.3              | 7.03    | 37.4         | 6.87    | 35.9         | 6.41    | 33.8         |
| 288        | 7.41         | 39.3              | 6.93    | 35.8         | 6.82    | 35.5         | 6.15    | 32.4         |
| 864        | 8.80         | 44.2              | 7.01    | 35.7         | 6.82    | 34.6         | 7.29    | 36.5         |
| 1440       | 9.54         | 49.7              | 6.91    | 36.0         | 6.79    | 34.2         | 7.20    | 37.7         |
| 2016       | 9.97         | 50.4              | 6.98    | 36.7         | 6.86    | 35.5         | 6.92    | 36.0         |
| 2592       | 10.34        | 53.2              | 6.98    | 36.7         | 6.86    | 35.6         | 7.69    | 40.4         |
| Range      | 3.04         | 16.4              | 0.18    | 1.9          | 0.27    | 2.5          | 1.54    | 8.0          |
Table 4. Analysis result by batch size at link 2.

| Batch Size | Baseline MAE | Baseline MAPE (%) | DNN MAE | DNN MAPE (%) | CNN MAE | CNN MAPE (%) | RNN MAE | RNN MAPE (%) |
|------------|--------------|-------------------|---------|--------------|---------|--------------|---------|--------------|
| 36         | 3.94         | 24.4              | 3.68    | 21.6         | 3.54    | 21.3         | 3.42    | 20.6         |
| 72         | 3.95         | 22.9              | 3.63    | 22.0         | 3.53    | 21.6         | 3.46    | 20.9         |
| 144        | 3.97         | 23.1              | 3.62    | 22.1         | 3.54    | 21.9         | 3.44    | 20.6         |
| 288        | 4.04         | 23.5              | 3.65    | 22.6         | 3.54    | 21.6         | 3.31    | 20.0         |
| 864        | 4.94         | 29.3              | 3.65    | 22.3         | 3.55    | 21.6         | 3.41    | 20.4         |
| 1440       | 5.47         | 32.7              | 3.69    | 22.5         | 3.55    | 21.9         | 3.55    | 20.9         |
| 2016       | 5.75         | 35.9              | 3.64    | 22.1         | 3.57    | 21.1         | 3.62    | 21.5         |
| 2592       | 5.87         | 35.7              | 3.73    | 22.7         | 3.67    | 21.9         | 3.68    | 22.7         |
| Range      | 1.93         | 13.0              | 0.11    | 1.1          | 0.14    | 0.8          | 0.37    | 2.7          |
Table 5. Analysis result by batch size at link 3.

| Batch Size | Baseline MAE | Baseline MAPE (%) | DNN MAE | DNN MAPE (%) | CNN MAE | CNN MAPE (%) | RNN MAE | RNN MAPE (%) |
|------------|--------------|-------------------|---------|--------------|---------|--------------|---------|--------------|
| 36         | 3.93         | 9.8               | 3.92    | 9.8          | 3.91    | 10.1         | 3.83    | 9.8          |
| 72         | 3.95         | 9.5               | 3.97    | 10.1         | 3.88    | 9.6          | 3.94    | 9.9          |
| 144        | 3.95         | 10.0              | 3.97    | 10.2         | 3.87    | 9.7          | 3.87    | 9.6          |
| 288        | 4.05         | 10.2              | 3.94    | 9.7          | 3.85    | 9.6          | 3.67    | 8.9          |
| 864        | 5.50         | 13.4              | 3.91    | 9.6          | 3.86    | 9.9          | 4.09    | 10.2         |
| 1440       | 6.23         | 15.0              | 3.92    | 9.7          | 3.86    | 9.3          | 4.04    | 10.1         |
| 2016       | 4.06         | 10.4              | 3.95    | 10.1         | 3.82    | 9.4          | 3.82    | 9.2          |
| 2592       | 6.86         | 16.6              | 3.93    | 9.7          | 3.83    | 9.7          | 4.37    | 10.9         |
| Range      | 2.93         | 7.1               | 0.06    | 0.6          | 0.09    | 0.8          | 0.70    | 2.0          |
Table 6. Comparison of MAE values for CL1 and CL2 by models.

| Model    | Link 1 CL1 (288) | Link 1 CL2 (144) | Link 2 CL1 (288) | Link 2 CL2 (144) | Link 3 CL1 (288) | Link 3 CL2 (144) |
|----------|------------------|------------------|------------------|------------------|------------------|------------------|
| Baseline | 7.41             | 7.33             | 4.04             | 3.97             | 4.05             | 3.95             |
| DNN      | 6.93             | 7.03             | 3.65             | 3.62             | 3.94             | 3.97             |
| CNN      | 6.82             | 6.87             | 3.54             | 3.54             | 3.85             | 3.87             |
| RNN      | 6.15             | 6.41             | 3.31             | 3.44             | 3.67             | 3.87             |

