Back to Basics: The Power of the Multilayer Perceptron in Financial Time Series Forecasting
Abstract
1. Introduction
2. Artificial Neural Networks for Time Series Forecasting
2.1. Transformer Networks
2.2. Long Short-Term Memory (LSTM) Networks
2.3. Bidirectional LSTM (BiLSTM) Networks
2.4. Multilayer Perceptron (MLP) Networks
3. Experimentation
3.1. Datasets
3.2. Error Metrics
- Root mean squared error (RMSE): $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$
- Mean squared error (MSE): $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$
- Mean absolute percentage error (MAPE): $\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$
- R-squared (R2): $R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$

where $y_i$ denotes the observed value, $\hat{y}_i$ the forecast, and $\bar{y}$ the mean of the observed values. A short sketch computing these four metrics is given below.
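For reference, the following is a minimal sketch of the four metrics, assuming NumPy and the standard definitions above; `forecast_metrics` is an illustrative helper, not code from the paper.

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """Compute RMSE, MSE, MAPE (%), and R2 for a forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    # MAPE diverges when the series contains zeros, which is why the
    # OMIE results in the table below report "inf" for this metric.
    with np.errstate(divide="ignore", invalid="ignore"):
        mape = 100.0 * np.mean(np.abs(errors / y_true))
    r2 = 1.0 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"RMSE": rmse, "MSE": mse, "MAPE": mape, "R2": r2}
```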
3.3. Parameters
- Learning rate: it controls the size of each parameter update and is closely related to the number of epochs, since a value that is too low makes it necessary to increase the number of epochs and, as a consequence, results in slower training.
- Batch size: it specifies the number of samples to be processed before the model parameters are updated.
- Epochs: the number of times the complete dataset is passed through the model during training.
- Optimization algorithm: this parameter can have a significant impact on the model, since it governs how the parameters are updated according to the learning rate. In the experiments carried out in this research, Adam was used, an algorithm that combines the benefits of RMSProp and of gradient descent with momentum [46].
- num_heads: the number of attention heads in the attention layer of the Transformer network. A multi-head attention layer allows the network to focus on different parts of the input sequence simultaneously, while the number of heads controls how many different characteristics the network considers when processing the information.
- ff_dim: the dimension of the feedforward layer within the Transformer network structure, a dense layer applied after the attention layer. The selection of this parameter affects the network's ability to learn more complex patterns. A sketch showing where each of these hyperparameters enters a model follows this list.
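As an illustration only, the following minimal Keras-style sketch shows where each hyperparameter enters a Transformer forecaster. It assumes TensorFlow/Keras; `head_size` and the single-feature input shape are assumptions made for the example, while num_heads = 6, ff_dim = 75, the 150-step sequence length, and the Adam settings come from the configuration tables further below.

```python
import tensorflow as tf
from tensorflow.keras import layers

def transformer_block(inputs, num_heads=6, ff_dim=75, head_size=64, dropout=0.1):
    # Multi-head self-attention: num_heads sets how many parts of the
    # input sequence are attended to simultaneously.
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=head_size,
                                     dropout=dropout)(inputs, inputs)
    x = layers.LayerNormalization()(layers.Add()([inputs, attn]))
    # Feedforward sub-layer applied after attention; ff_dim sets its width.
    ff = layers.Dense(ff_dim, activation="relu")(x)
    ff = layers.Dense(inputs.shape[-1])(ff)
    return layers.LayerNormalization()(layers.Add()([x, ff]))

inputs = tf.keras.Input(shape=(150, 1))  # sequence length 150, one feature
pooled = layers.GlobalAveragePooling1D()(transformer_block(inputs))
model = tf.keras.Model(inputs, layers.Dense(1)(pooled))

# The learning rate enters through the optimizer (Adam, as in the
# experiments); batch size and epochs enter through fit().
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")
# model.fit(X_train, y_train, batch_size=150, epochs=150)
```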
3.4. Experimentation Results
4. Conclusions and Future Lines of Investigation
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Borghi, P.H.; Zakordonets, O.; Teixeira, J.P. A COVID-19 time series forecasting model based on MLP ANN. Procedia Comput. Sci. 2021, 181, 940–947. [Google Scholar] [CrossRef] [PubMed]
- Chen, S.A.; Li, C.L.; Yoder, N.; Arik, S.O.; Pfister, T. TSMixer: An All-MLP Architecture for Time Series Forecasting. arXiv 2023, arXiv:2303.06053. [Google Scholar]
- Voyant, C.; Nivet, M.L.; Paoli, C.; Muselli, M.; Notton, G. Meteorological time series forecasting based on MLP modelling using heterogeneous transfer functions. J. Phys. Conf. Ser. 2015, 574, 012064. [Google Scholar] [CrossRef]
- Shiblee, M.; Kalra, P.K.; Chandra, B. Time Series Prediction with Multilayer Perceptron (MLP): A New Generalized Error Based Approach. In Advances in Neuro-Information Processing; Köppen, M., Kasabov, N., Coghill, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; pp. 37–44. [Google Scholar]
- Kamijo, K.; Tanigawa, T. Stock price pattern recognition: A recurrent neural network approach. In Proceedings of the 1990 IJCNN International Joint Conference on Neural Networks, San Diego, CA, USA, 17–21 June 1990; Volume 1, pp. 215–221. Available online: https://ieeexplore.ieee.org/abstract/document/5726532 (accessed on 18 March 2024).
- Chakraborty, K.; Mehrotra, K.; Mohan, C.K.; Ranka, S. Forecasting the behavior of multivariate time series using neural networks. Neural Netw. 1992, 5, 961–970. [Google Scholar] [CrossRef]
- de Rojas, A.L.; Jaramillo-Morán, M.A.; Sandubete, J.E. EMDFormer model for time series forecasting. AIMS Math. 2024, 9, 9419–9434. [Google Scholar] [CrossRef]
- Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015; pp. 88–126. ISBN 978-111-867-4925. [Google Scholar]
- Zerveas, G.; Jayaraman, S.; Patel, D.; Bhamidipaty, A.; Eickhoff, C. A Transformer-based Framework for Multivariate Time Series Representation Learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, New York, NY, USA, 14–18 August 2021. [Google Scholar]
- Wen, Q.; Zhou, T.; Zhang, C.; Chen, W.; Ma, Z.; Yan, J.; Sun, L. Transformers in Time Series: A Survey. arXiv 2023, arXiv:2202.07125. [Google Scholar]
- Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are Transformers Effective for Time Series Forecasting? Proc. AAAI Conf. Artif. Intell. 2023, 37, 11121–11128. [Google Scholar] [CrossRef]
- Ahmed, S.; Nielsen, I.E.; Tripathi, A.; Siddiqui, S.; Ramachandran, R.P.; Rasool, G. Transformers in Time-Series Analysis: A Tutorial. Circuits Syst. Signal Process. 2023, 42, 7433–7466. [Google Scholar] [CrossRef]
- Sezer, O.B.; Gudelek, M.U.; Ozbayoglu, A.M. Financial time series forecasting with deep learning: A systematic literature review: 2005–2019. Appl. Soft Comput. 2020, 90, 106181. [Google Scholar] [CrossRef]
- Krollner, B.; Vanstone, B.; Finnie, G. Financial time series forecasting with machine learning techniques: A survey. Comput. Intell. 2010, 8, 25–30. [Google Scholar]
- Zhang, G.; Qi, M. Neural network forecasting for seasonal and trend time series. Eur. J. Oper. Res. 2005, 160, 501–514. [Google Scholar] [CrossRef]
- Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366. [Google Scholar] [CrossRef]
- Hill, T.; O’Connor, M.; Remus, W. Neural Network Models for Time Series Forecasts. Manag. Sci. 1996, 42, 1082–1092. [Google Scholar] [CrossRef]
- Khashei, M.; Bijari, M. An artificial neural network (p,d,q) model for timeseries forecasting. Expert Syst. Appl. 2010, 37, 479–489. [Google Scholar] [CrossRef]
- Bhardwaj, S.; Chandrasekhar, E.; Padiyar, P.; Gadre, V.M. A comparative study of wavelet-based ANN and classical techniques for geophysical time-series forecasting. Comput. Geosci. 2020, 138, 104461. [Google Scholar] [CrossRef]
- Di Piazza, A.; Di Piazza, M.C.; La Tona, G.; Luna, M. An artificial neural network-based forecasting model of energy-related time series for electrical grid management. Math Comput. Simul. 2021, 184, 294–305. [Google Scholar] [CrossRef]
- Kumar, B.; Sunil Yadav, N. A novel hybrid model combining βSARMA and LSTM for time series forecasting. Appl. Soft Comput. 2023, 134, 110019. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. Available online: https://arxiv.org/abs/1810.04805 (accessed on 7 October 2020).
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929. Available online: http://arxiv.org/abs/2010.11929 (accessed on 18 March 2024).
- Cholakov, R.; Kolev, T. Transformers predicting the future. Applying attention in next-frame and time series forecasting. arXiv 2021, arXiv:2108.08224. Available online: http://arxiv.org/abs/2108.08224 (accessed on 19 March 2024).
- Lim, B.; Arık, S.; Loeff, N.; Pfister, T. Temporal Fusion Transformers for interpretable multi-horizon time series forecasting. Int. J. Forecast. 2021, 37, 1748–1764. [Google Scholar] [CrossRef]
- Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2021; pp. 22419–22430. Available online: https://proceedings.neurips.cc/paper/2021/hash/bcc0d400288793e8bdcd7c19a8ac0c2b-Abstract.html (accessed on 19 March 2024).
- Zeyer, A.; Bahar, P.; Irie, K.; Schlüter, R.; Ney, H. A Comparison of Transformer and LSTM Encoder Decoder Models for ASR. In Proceedings of the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Singapore, 14–18 December 2019; pp. 8–15. Available online: https://ieeexplore.ieee.org/abstract/document/9004025 (accessed on 19 March 2024).
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All you Need. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2017; Available online: https://proceedings.neurips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html (accessed on 18 March 2024).
- Li, C.; Qian, G. Stock Price Prediction Using a Frequency Decomposition Based GRU Transformer Neural Network. Appl. Sci. 2023, 13, 222. [Google Scholar] [CrossRef]
- Salinas, D.; Flunkert, V.; Gasthaus, J.; Januschowski, T. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. Int. J. Forecast. 2020, 36, 1181–1191. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Sagheer, A.; Kotb, M. Time series forecasting of petroleum production using deep LSTM recurrent networks. Neurocomputing 2018, 323, 203–213. [Google Scholar] [CrossRef]
- Yamak, P.T.; Yujian, L.; Gadosey, P.K. A Comparison between ARIMA, LSTM, and GRU for Time Series Forecasting. In Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence, Sanya, China, 20–22 December 2019. [Google Scholar]
- Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The Performance of LSTM and BiLSTM in Forecasting Time Series. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 3285–3292. [Google Scholar] [CrossRef]
- Malhotra, P.; Ramakrishnan, A.; Anand, G.; Vig, L.; Agarwal, P.; Shroff, G. LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection. arXiv 2016, arXiv:1607.00148. Available online: http://arxiv.org/abs/1607.00148 (accessed on 19 March 2024).
- Laptev, N.; Yu, J.; Rajagopal, R. Applied Time Series Transfer Learning. 2018. Available online: https://openreview.net/forum?id=BklhkI1wz (accessed on 19 March 2024).
- Kim, J.; Moon, N. BiLSTM model based on multivariate time series data in multiple field for forecasting trading area. J. Ambient. Intell. Humaniz. Comput. 2019, 1–10. [Google Scholar] [CrossRef]
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
- Zhang, T.; Zhang, Y.; Cao, W.; Bian, J.; Yi, X.; Zheng, S.; Li, J. Less Is More: Fast Multivariate Time Series Forecasting with Light Sampling-oriented MLP Structures. arXiv 2022, arXiv:2207.01186. Available online: http://arxiv.org/abs/2207.01186 (accessed on 4 March 2024).
- Yi, K.; Zhang, Q.; Fan, W.; Wang, S.; Wang, P.; He, H.; Lian, D.; An, N.; Cao, L.; Niu, Z. Frequency-domain MLPs are More Effective Learners in Time Series Forecasting. Adv. Neural Inf. Process Syst. 2023, 36, 76656–76679. [Google Scholar]
- Madhusudhanan, K.; Jawed, S.; Schmidt-Thieme, L. Hyperparameter Tuning MLPs for Probabilistic Time Series Forecasting. arXiv 2024, arXiv:2403.04477. Available online: http://arxiv.org/abs/2403.04477 (accessed on 19 March 2024).
- Shen, F.; Chao, J.; Zhao, J. Forecasting exchange rate using deep belief networks and conjugate gradient method. Neurocomputing 2015, 167, 243–253. [Google Scholar] [CrossRef]
- Frechtling, D. Forecasting Tourism Demand; Routledge: London, UK, 2001; 279p. [Google Scholar]
- Olawoyin, A.; Chen, Y. Predicting the Future with Artificial Neural Network. Procedia Comput. Sci. 2018, 140, 383–392. [Google Scholar] [CrossRef]
- Pierce, D.A. R² Measures for Time Series. J. Am. Stat. Assoc. 1979, 74, 901–910. [Google Scholar] [CrossRef]
- Sun, R. Optimization for deep learning: Theory and algorithms. arXiv 2019, arXiv:1912.08957. Available online: http://arxiv.org/abs/1912.08957 (accessed on 24 March 2024).
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- Beck, J.V.; Arnold, K.J. Parameter Estimation in Engineering and Science; Wiley: New York, NY, USA, 1977; 540p. [Google Scholar]
- Smith, L.N. A disciplined approach to neural network hyper-parameters: Part 1-learning rate, batch size, momentum, and weight decay. arXiv 2018, arXiv:1803.09820. Available online: http://arxiv.org/abs/1803.09820 (accessed on 24 March 2024).
- Pirani, M.; Thakkar, P.; Jivrani, P.; Bohara, M.H.; Garg, D. A Comparative Analysis of ARIMA, GRU, LSTM and BiLSTM on Financial Time Series Forecasting. In Proceedings of the 2022 IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE), Ballari, India, 23–24 April 2022; pp. 1–6. [Google Scholar]
- Yang, M.; Wang, J. Adaptability of Financial Time Series Prediction Based on BiLSTM. Procedia Comput. Sci. 2022, 199, 18–25. [Google Scholar] [CrossRef]
- Xiong, T.; Bao, Y.; Hu, Z. Beyond one-step-ahead forecasting: Evaluation of alternative multi-step-ahead forecasting models for crude oil prices. Energy Econ. 2013, 40, 405–415. [Google Scholar] [CrossRef]
- Fan, L.; Pan, S.; Li, Z.; Li, H. An ICA-based support vector regression scheme for forecasting crude oil prices. Technol. Forecast. Soc. Chang. 2016, 112, 245–253. [Google Scholar] [CrossRef]
- Aldabagh, H.; Zheng, X.; Mukkamala, R. A Hybrid Deep Learning Approach for Crude Oil Price Prediction. J. Risk Financ. Manag. 2023, 16, 503. [Google Scholar] [CrossRef]
- Prediction of Gold Price with ARIMA and SVM. J. Phys. Conf. Ser. 2021, 1767, 012022. Available online: https://iopscience.iop.org/article/10.1088/1742-6596/1767/1/012022/meta (accessed on 12 May 2024).
- Wang, J.; Lei, C.; Guo, M. Daily natural gas price forecasting by a weighted hybrid data-driven model. J. Pet. Sci. Eng. 2020, 192, 107240. [Google Scholar] [CrossRef]
- Fildes, R.; Petropoulos, F. Simple versus complex selection rules for forecasting many time series. J. Bus. Res. 2015, 68, 1692–1701. [Google Scholar] [CrossRef]
| Model | Learning Rate | Batch Size | Epochs | Hidden Layers | Optimizer | ff_dim | num_heads |
|---|---|---|---|---|---|---|---|
| Transformer | 0.001 | 150 | 150 | 1 | Adam | 75 | 6 |
| BiLSTM | 0.001 | 25 | 100 | 2 | Adam | - | - |
| LSTM | 0.001 | 25 | 100 | 2 | Adam | - | - |
| MLP | 0.001 | 25 | 75 | 2 | Adam | - | - |
| Model | Neurons/Layer | Input Window | Loss | Activation Function |
|---|---|---|---|---|
| BiLSTM | 25/25 | 25 | MSE | Tanh |
| LSTM | 25/25 | 25 | MSE | Tanh |
| MLP | 100/50 | - | MSE | ReLU |
| Model | Sequence Length | Dropout | Loss | Activation Function |
|---|---|---|---|---|
| Transformer | 150 | 0.1 | MSE | ReLU |
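Tying the configuration tables together, here is a minimal sketch of how the MLP they describe could be assembled, assuming TensorFlow/Keras. The input window is not listed for the MLP, so `window = 25` is an assumption borrowed from the LSTM/BiLSTM rows; the rest (100/50 neurons, ReLU, MSE loss, Adam at 0.001, batch size 25, 75 epochs) follows the tables above.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

window = 25  # assumed input window; not reported for the MLP in the tables
model = tf.keras.Sequential([
    layers.Input(shape=(window,)),
    layers.Dense(100, activation="relu"),  # first hidden layer: 100 neurons
    layers.Dense(50, activation="relu"),   # second hidden layer: 50 neurons
    layers.Dense(1),                       # one-step-ahead forecast
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")

# Sliding-window training pairs from a univariate series `series`:
# X = np.stack([series[i:i + window] for i in range(len(series) - window)])
# y = series[window:]
# model.fit(X, y, batch_size=25, epochs=75)
```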
| Time Series | ANN | RMSE | MSE | MAPE | R2 | Time |
|---|---|---|---|---|---|---|
| Brent | Transformer | 7.528 | 51.677 | 54.337 | 0.881 | 0.585 |
| Brent | LSTM | 4.741 | 22.479 | 31.260 | 0.964 | 0.436 |
| Brent | MLP | 1.767 | 3.122 | 1.860 | 0.992 | 0.255 |
| Brent | BiLSTM | 4.033 | 16.263 | 29.413 | 0.974 | 1.86 |
| Gold | Transformer | 0.188 | 0.035 | 22.308 | 0.884 | 1.453 |
| Gold | LSTM | 0.260 | 0.067 | 33.943 | 0.808 | 1.686 |
| Gold | MLP | 0.090 | 0.009 | 2.303 | 0.969 | 0.211 |
| Gold | BiLSTM | 0.240 | 0.058 | 23.628 | 0.836 | 1.999 |
| GN | Transformer | 1.056 | 1.115 | 18.492 | 0.736 | 0.509 |
| GN | LSTM | 0.879 | 0.772 | 14.183 | 0.737 | 0.993 |
| GN | MLP | 0.729 | 0.532 | 4.550 | 0.853 | 0.308 |
| GN | BiLSTM | 0.867 | 0.752 | 14.857 | 0.743 | 1.541 |
| OMIE | Transformer | 21.273 | 452.531 | inf | 0.683 | 1.500 |
| OMIE | LSTM | 23.521 | 553.234 | inf | 0.733 | 2.177 |
| OMIE | MLP | 12.940 | 167.530 | inf | 0.882 | 0.733 |
| OMIE | BiLSTM | 24.198 | 585.547 | inf | 0.717 | 3.809 |