An Adaptive Learning Time Series Forecasting Model Based on Decoder Framework
Abstract
1. Introduction
- The ALD model employs a decoder-only architecture. In LSTF tasks, the causal masking in the decoder-only architecture converts the attention matrix from a non-full-rank matrix into a full-rank (lower-triangular) one, which in theory gives the model greater expressive capacity [7]. Compared with other architectures, the decoder-only structure extracts, learns, and represents useful information more effectively from the same amount of data, easing deep modeling in LSTF tasks and improving data utilization efficiency. Owing to this decoder-only design, the ALD model shows clear advantages in forecasting accuracy (a minimal numerical illustration of the masking argument follows this list).
- An adaptive statistical forecasting layer is designed. This layer dynamically models the evolving trends in the raw time series: the model learns and predicts the key statistical features of the series independently through this layer, and when concept drift occurs it adaptively adjusts its outputs according to the learned trends. This mechanism captures changes in the data distribution and corrects the forecasts promptly when abrupt shifts arise, substantially improving the model’s adaptability and robustness in complex dynamic environments and preserving predictive performance even under concept drift (an illustrative sketch of such a layer is given after this list).
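The rank claim in the first contribution can be checked numerically. Below is a minimal NumPy sketch (an illustration of standard causal self-attention, not the ALD implementation): after the causal mask and the row-wise softmax, the attention matrix is lower triangular with a strictly positive diagonal, so its determinant is nonzero and it has full rank.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 8, 16                                   # toy sequence length and head dimension
Q, K = rng.normal(size=(L, d)), rng.normal(size=(L, d))

scores = Q @ K.T / np.sqrt(d)                  # raw attention scores, shape (L, L)
scores[np.triu_indices(L, k=1)] = -np.inf      # causal mask: position i attends only to j <= i

A = np.exp(scores - scores.max(axis=-1, keepdims=True))
A /= A.sum(axis=-1, keepdims=True)             # row-wise softmax; masked entries become 0

# A is lower triangular with a positive diagonal, hence rank(A) = L.
print(np.linalg.matrix_rank(A))                # prints 8
```

The second contribution is not spelled out at code level in this section, so the following is only a hedged sketch of one plausible design for an adaptive statistical forecasting layer: window statistics are removed from the input, small linear heads forecast how those statistics evolve, and the backbone's normalized prediction is rescaled with the forecast statistics. All class, method, and parameter names are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class AdaptiveStatHead(nn.Module):
    """Hypothetical sketch of an adaptive statistical forecasting layer.

    It normalizes the input window, forecasts the drift of the per-series mean
    and scale with small linear heads, and denormalizes the backbone output.
    """
    def __init__(self, lookback: int):
        super().__init__()
        self.mean_head = nn.Linear(lookback, 1)   # predicts the shift of the mean
        self.scale_head = nn.Linear(lookback, 1)  # predicts the log-change of the scale

    def normalize(self, x):                       # x: (batch, lookback, variables)
        self.mu = x.mean(dim=1, keepdim=True)
        self.sigma = x.std(dim=1, keepdim=True) + 1e-5
        return (x - self.mu) / self.sigma

    def denormalize(self, y_norm, x):             # y_norm: (batch, horizon, variables)
        x_t = ((x - self.mu) / self.sigma).transpose(1, 2)        # (batch, variables, lookback)
        mu_hat = self.mu + self.mean_head(x_t).transpose(1, 2)    # forecast mean
        sigma_hat = self.sigma * torch.exp(self.scale_head(x_t)).transpose(1, 2)
        return y_norm * sigma_hat + mu_hat
```

In such a setup, `normalize` would be applied to the model input and `denormalize` to the backbone output, so the backbone sees a locally stationary series while the lightweight heads track the drifting statistics, which is the behavior the contribution describes.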
2. Related Work
3. Model Construction
- Compute the query, key, and value projections Q = XW_Q, K = XW_K, and V = XW_V, with a complexity of O(L·d^2).
- Compute the masked attention score matrix QK^T/√d, with a complexity of O(L^2·d).
- Compute the attention weights softmax(QK^T/√d) and the attention output softmax(QK^T/√d)·V, with a complexity of O(L^2·d), where L is the sequence length and d is the model dimension (see the sketch after this list).
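As a sanity check on these costs, the short sketch below (generic masked scaled dot-product attention in NumPy, not the ALD code) performs the steps above on toy sizes and counts the dominant multiply-accumulate operations; the projections scale as O(L·d^2) and the score/output products as O(L^2·d).

```python
import numpy as np

L, d = 96, 64                                  # toy sequence length and model dimension
rng = np.random.default_rng(0)
X = rng.normal(size=(L, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

# Step 1: query/key/value projections, ~3 * L * d^2 multiply-accumulates.
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Step 2: masked attention scores, ~L^2 * d multiply-accumulates.
scores = Q @ K.T / np.sqrt(d)
scores[np.triu_indices(L, k=1)] = -np.inf      # causal mask

# Step 3: softmax weights and attention output, ~L^2 * d multiply-accumulates.
A = np.exp(scores - scores.max(axis=-1, keepdims=True))
A /= A.sum(axis=-1, keepdims=True)
out = A @ V

print(out.shape)                               # (96, 64)
print("projection MACs:", 3 * L * d**2, "attention MACs:", 2 * L**2 * d)
```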
4. Experimental Setup and Analysis of Results
4.1. Datasets
4.2. Benchmark Models and Experimental Setup
4.3. Model Parameters
4.4. Results and Analysis
4.5. Ablation Study
4.6. Hyperparameter Sensitivity
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Sun, J.; Cao, Z.; Li, H.; Qian, S.; Wang, X.; Yan, L.; Xue, W. Application of artificial intelligence technology to numerical weather forecasting. J. Appl. Meteorol. Sci. 2021, 32, 1–11. [Google Scholar]
- Ding, F.; Jiang, M.Y. Housing price forecasting based on improved lion swarm algorithm and BP neural network model. J. Shandong Univ. (Eng. Sci.) 2021, 51, 8–16. [Google Scholar]
- Zhao, W.Z.; Yuan, G.; Zhang, Y.M.; Qiao, S.; Wang, S.; Zhang, L. Multi-view Fused Spatial-temporal Dynamic GCN for Urban Traffic Flow Forecasting. J. Softw. 2024, 35, 1751–1773. [Google Scholar]
- Wang, C.; Wang, Y.; Zheng, T.; Dai, Z.M.; Zhang, K.F. Multi-Energy Load Forecasting in Integrated Energy System Based on ResNet-LSTM Network and Attention Mechanism. Trans. China Electrotech. Soc. 2022, 37, 1789–1799. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Tang, S.Q.; Feng, C.; Gao, K. Recurrent concept drift data stream classification based on online transfer learning. J. Comput. Res. Dev. 2016, 53, 1781–1791. [Google Scholar]
- Dong, Y.H.; Cordonnier, J.B.; Loukas, A. Attention is not all you need: Pure attention loses rank doubly exponentially with depth. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 21–27 July 2021; pp. 2793–2803. [Google Scholar]
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; Volume 35, pp. 11106–11115. [Google Scholar]
- Liu, S.; Yu, H.; Liao, C.; Li, J.; Lin, W.; Liu, A.X.; Dustdar, S. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. In Proceedings of the International Conference on Learning Representations, Virtual Conference, 3–7 May 2021. [Google Scholar]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186. [Google Scholar]
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. Preprint. 2018. Available online: https://scholar.google.com/citations?view_op=view_citation&hl=zh-CN&user=dOad5HoAAAAJ&citation_for_view=dOad5HoAAAAJ:W7OEmFMy1HYC (accessed on 10 December 2024).
- Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. PaLM: Scaling language modeling with pathways. J. Mach. Learn. Res. 2023, 24, 1–113. [Google Scholar]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar]
- Dai, Z.; Yang, Z.; Yang, Y.; Carbonell, J.; Le, Q.V.; Salakhutdinov, R. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019. [Google Scholar]
- Su, J.; Lu, Y.; Pan, S.; Murtadha, A.; Wen, B.; Liu, Y. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing 2024, 568, 127063. [Google Scholar] [CrossRef]
- Kim, T.; Kim, J.; Tae, Y.; Park, C.; Choi, J.H.; Choo, J. Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022. [Google Scholar]
- Li, W.; Yang, X.; Liu, W.; Xia, Y.; Bian, J. DDG-DA: Data distribution generation for predictable concept drift adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, Arlington, VA, USA, 17–19 November 2022; Volume 36, pp. 4092–4100. [Google Scholar]
- Bai, G.J.; Ling, C.; Zhao, L. Temporal Domain Generalization with Drift-Aware Dynamic Neural Network. arXiv 2022, arXiv:2205.10664. [Google Scholar]
- Liu, Z.; Cheng, M.; Li, Z.; Huang, Z.; Liu, Q.; Xie, Y.; Chen, E. Adaptive normalization for non-stationary time series forecasting: A temporal slice perspective. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023. [Google Scholar]
- Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted transformers are effective for time series forecasting. arXiv 2023, arXiv:2310.06625. [Google Scholar]
- Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Zhang, Y.; Yan, J. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Das, A.; Kong, W.; Leach, A.; Mathur, S.; Sen, R.; Yu, R. Long-term forecasting with TiDE: Time-series dense encoder. arXiv 2023, arXiv:2304.08424. [Google Scholar]
- Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-variation modeling for general time series analysis. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; pp. 11121–11128. [Google Scholar]
- Liu, M.; Zeng, A.; Chen, M.; Xu, Z.; Lai, Q.; Ma, L.; Xu, Q. SCINet: Time series modeling and forecasting with sample convolution and interaction. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar]
- Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286. [Google Scholar]
- Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323. [Google Scholar]
Dataset | Variables | Forecasting Length | Dataset Size (Train, Validation, Test) | Frequency | Information
---|---|---|---|---|---
ETTh1, ETTh2 | 7 | {96, 192, 336, 720} | (8545, 2881, 2881) | Hourly | Electricity |
ETTm1 | 7 | {96, 192, 336, 720} | (34,465, 11,521, 11,521) | 15 min | Electricity |
Exchange | 8 | {96, 192, 336, 720} | (5120, 665, 1422) | Daily | Economy |
ECL | 321 | {96, 192, 336, 720} | (18,317, 2633, 5261) | Hourly | Electricity |
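For context, these benchmarks are typically consumed as sliding windows: each sample pairs a lookback segment with the next pred_len steps, and pred_len is swept over {96, 192, 336, 720} as in the table. The sketch below is a generic illustration of that windowing; the lookback of 96 and the function name are assumptions for illustration, not the exact configuration used in the experiments (see Section 4.3).

```python
import numpy as np

def sliding_windows(series: np.ndarray, lookback: int = 96, pred_len: int = 96):
    """Yield (input, target) pairs from a (time, variables) array.

    Illustrative only: the real train/validation/test boundaries follow the
    split sizes listed in the table above.
    """
    n_samples = len(series) - lookback - pred_len + 1
    for start in range(n_samples):
        x = series[start : start + lookback]                          # model input
        y = series[start + lookback : start + lookback + pred_len]    # forecast target
        yield x, y

# Toy multivariate series with 7 variables, matching ETTh1/ETTh2/ETTm1.
toy = np.random.default_rng(0).normal(size=(1000, 7))
x, y = next(sliding_windows(toy, lookback=96, pred_len=192))
print(x.shape, y.shape)   # (96, 7) (192, 7)
```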
Dataset | Forecasting Length | ALD MSE | ALD MAE | iTransformer MSE | iTransformer MAE | PatchTST MSE | PatchTST MAE | Crossformer MSE | Crossformer MAE | TiDE MSE | TiDE MAE | TimesNet MSE | TimesNet MAE | DLinear MSE | DLinear MAE | SCINet MSE | SCINet MAE | FEDformer MSE | FEDformer MAE
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
ETTh1 | 96 | 0.387 | 0.397 | 0.386 | 0.405 | 0.414 | 0.419 | 0.423 | 0.448 | 0.479 | 0.464 | 0.384 | 0.402 | 0.386 | 0.400 | 0.654 | 0.599 | 0.376 | 0.419
ETTh1 | 192 | 0.452 | 0.437 | 0.441 | 0.436 | 0.460 | 0.445 | 0.471 | 0.474 | 0.525 | 0.492 | 0.436 | 0.429 | 0.437 | 0.432 | 0.719 | 0.631 | 0.420 | 0.448
ETTh1 | 336 | 0.509 | 0.469 | 0.487 | 0.458 | 0.501 | 0.446 | 0.570 | 0.546 | 0.565 | 0.515 | 0.491 | 0.469 | 0.481 | 0.459 | 0.778 | 0.659 | 0.459 | 0.465
ETTh1 | 720 | 0.565 | 0.522 | 0.503 | 0.491 | 0.500 | 0.488 | 0.653 | 0.621 | 0.594 | 0.558 | 0.521 | 0.500 | 0.519 | 0.516 | 0.836 | 0.699 | 0.506 | 0.507
ETTh2 | 96 | 0.304 | 0.358 | 0.297 | 0.349 | 0.302 | 0.348 | 0.745 | 0.584 | 0.400 | 0.440 | 0.340 | 0.374 | 0.333 | 0.387 | 0.707 | 0.621 | 0.358 | 0.397
ETTh2 | 192 | 0.377 | 0.403 | 0.380 | 0.400 | 0.388 | 0.400 | 0.877 | 0.656 | 0.528 | 0.509 | 0.402 | 0.414 | 0.477 | 0.476 | 0.860 | 0.689 | 0.429 | 0.439
ETTh2 | 336 | 0.418 | 0.437 | 0.428 | 0.432 | 0.426 | 0.433 | 1.043 | 0.731 | 0.643 | 0.571 | 0.452 | 0.452 | 0.594 | 0.541 | 1.000 | 0.744 | 0.496 | 0.487
ETTh2 | 720 | 0.434 | 0.456 | 0.427 | 0.445 | 0.431 | 0.446 | 1.104 | 0.763 | 0.874 | 0.679 | 0.462 | 0.468 | 0.831 | 0.657 | 1.249 | 0.838 | 0.463 | 0.474
ETTm1 | 96 | 0.184 | 0.277 | 0.334 | 0.368 | 0.329 | 0.367 | 0.404 | 0.426 | 0.364 | 0.387 | 0.338 | 0.375 | 0.345 | 0.372 | 0.418 | 0.438 | 0.379 | 0.419
ETTm1 | 192 | 0.250 | 0.322 | 0.377 | 0.391 | 0.367 | 0.385 | 0.450 | 0.451 | 0.398 | 0.404 | 0.374 | 0.387 | 0.380 | 0.389 | 0.439 | 0.450 | 0.426 | 0.411
ETTm1 | 336 | 0.314 | 0.364 | 0.426 | 0.420 | 0.399 | 0.410 | 0.532 | 0.515 | 0.428 | 0.425 | 0.410 | 0.411 | 0.413 | 0.413 | 0.490 | 0.485 | 0.445 | 0.459
ETTm1 | 720 | 0.415 | 0.425 | 0.491 | 0.459 | 0.454 | 0.439 | 0.666 | 0.589 | 0.487 | 0.461 | 0.478 | 0.450 | 0.474 | 0.453 | 0.595 | 0.550 | 0.543 | 0.490
Exchange | 96 | 0.077 | 0.198 | 0.086 | 0.206 | 0.088 | 0.205 | 0.256 | 0.367 | 0.094 | 0.218 | 0.107 | 0.234 | 0.088 | 0.218 | 0.267 | 0.396 | 0.148 | 0.278
Exchange | 192 | 0.164 | 0.290 | 0.177 | 0.299 | 0.176 | 0.299 | 0.470 | 0.509 | 0.184 | 0.307 | 0.266 | 0.344 | 0.176 | 0.315 | 0.351 | 0.459 | 0.271 | 0.315
Exchange | 336 | 0.317 | 0.409 | 0.331 | 0.417 | 0.301 | 0.397 | 1.268 | 0.883 | 0.349 | 0.431 | 0.367 | 0.448 | 0.313 | 0.427 | 1.324 | 0.853 | 0.460 | 0.427
Exchange | 720 | 0.843 | 0.688 | 0.847 | 0.691 | 0.901 | 0.714 | 1.767 | 1.068 | 0.852 | 0.698 | 0.964 | 0.746 | 0.839 | 0.695 | 1.058 | 0.797 | 1.195 | 0.695
ECL | 96 | 0.176 | 0.261 | 0.148 | 0.240 | 0.195 | 0.285 | 0.219 | 0.314 | 0.237 | 0.329 | 0.168 | 0.272 | 0.197 | 0.282 | 0.247 | 0.345 | 0.193 | 0.308
ECL | 192 | 0.175 | 0.262 | 0.162 | 0.253 | 0.199 | 0.289 | 0.231 | 0.322 | 0.236 | 0.330 | 0.184 | 0.289 | 0.196 | 0.285 | 0.257 | 0.355 | 0.201 | 0.315
ECL | 336 | 0.185 | 0.270 | 0.178 | 0.269 | 0.215 | 0.305 | 0.246 | 0.337 | 0.249 | 0.344 | 0.198 | 0.300 | 0.209 | 0.301 | 0.269 | 0.369 | 0.214 | 0.329
ECL | 720 | 0.218 | 0.294 | 0.225 | 0.317 | 0.256 | 0.337 | 0.280 | 0.363 | 0.284 | 0.373 | 0.220 | 0.320 | 0.245 | 0.333 | 0.299 | 0.390 | 0.246 | 0.355
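The MSE and MAE columns above are the standard point-forecast errors averaged over all prediction steps and variables; a minimal sketch of their computation is given below (textbook definitions, not code from the paper).

```python
import numpy as np

def mse(pred: np.ndarray, true: np.ndarray) -> float:
    """Mean squared error over all forecast steps and variables."""
    return float(np.mean((pred - true) ** 2))

def mae(pred: np.ndarray, true: np.ndarray) -> float:
    """Mean absolute error over all forecast steps and variables."""
    return float(np.mean(np.abs(pred - true)))

# Example on a toy forecast of shape (horizon, variables).
rng = np.random.default_rng(0)
true = rng.normal(size=(96, 7))
pred = true + 0.1 * rng.normal(size=(96, 7))
print(round(mse(pred, true), 3), round(mae(pred, true), 3))
```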
Dataset | Forecasting Length | Transformer Decoder Layer + Statistical Forecasting Layer MSE | Transformer Decoder Layer + Statistical Forecasting Layer MAE | Transformer Decoder Layer MSE | Transformer Decoder Layer MAE | FEDformer MSE | FEDformer MAE
---|---|---|---|---|---|---|---
Exchange | 96 | 0.077 | 0.198 | 0.084 | 0.203 | 0.148 | 0.278
Exchange | 192 | 0.164 | 0.290 | 0.184 | 0.305 | 0.271 | 0.315
Exchange | 336 | 0.317 | 0.409 | 0.334 | 0.418 | 0.460 | 0.427
Exchange | 720 | 0.843 | 0.688 | 0.873 | 0.702 | 1.195 | 0.695