WindFormer: Learning Generic Representations for Short-Term Wind Speed Prediction
Abstract
1. Introduction
- 1. Innovative model architecture: WindFormer uses a transformer-based architecture adapted for multivariate time series prediction. The model processes and integrates multiple meteorological data streams, such as temperature, humidity, and power, to capture their complex spatiotemporal dynamics with wind speed.
- 2. Robust training strategy: Our approach combines unsupervised pre-training with multitask fine-tuning. WindFormer first learns general feature representations from extensive unlabeled time series data, which substantially improves subsequent fine-tuning on labeled wind speed data.
- 3. Exceptional predictive performance: Comparative assessments on multiple public datasets show that WindFormer significantly outperforms existing statistical and deep learning models in short-term wind speed prediction.
2. Related Work
3. Methods
3.1. Problem Definition
3.2. Model Architecture
3.2.1. Temporal Encoder
3.2.2. Temporal and Spatial Embeddings
3.2.3. Transformer Encoder
3.3. Neural Tokenizer Training
- Neural tokenizer. This tokenizer transforms wind speed data into discrete symbols. It uses a neural codebook with $K$ discrete symbols of dimension $D$, represented as $\mathcal{V} = \{v_1, \ldots, v_K\} \subset \mathbb{R}^D$. For each data segment $x$, the tokenizer encodes it into chunk representations $\{h_i\}_{i=1}^{N}$, where $N$ is the number of chunks. Each $h_i$ is then mapped to the nearest vector in the codebook via $z_i = \arg\min_{j} \lVert h_i - v_j \rVert_2$ (a code sketch follows this list).
- Wind speed prediction. Rather than employing traditional Fourier transforms, the method predicts future wind speeds directly from high-resolution temporal data chunks.
- Neural decoder training. The decoder, implemented within a transformer architecture optimized for temporal sequences, is trained to predict the quantized features from the processed data chunks. This improves the model's ability to generalize across different temporal dynamics and thereby its prediction accuracy.
- Training objective. The neural tokenizer and prediction model are trained by minimizing a mean squared error (MSE) loss between the predicted values $\hat{y}_i$ and the ground truth $y_i$: $\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$.
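To make the nearest-neighbor codebook lookup concrete, here is a minimal NumPy sketch of the quantization step described above; the array names (`h`, `V`) and the toy shapes are our own illustration, not the authors' implementation.

```python
import numpy as np

def quantize(chunks, codebook):
    """Map each chunk representation h_i to the index z_i of its nearest
    codebook vector v_j under the L2 norm: z_i = argmin_j ||h_i - v_j||_2."""
    # Pairwise squared L2 distances between chunks and codebook entries, shape (N, K).
    d2 = ((chunks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

# Toy shapes: N = 8 chunks of dimension D = 4, codebook of K = 16 symbols.
rng = np.random.default_rng(0)
h = rng.normal(size=(8, 4))    # chunk representations from the encoder
V = rng.normal(size=(16, 4))   # neural codebook
z = quantize(h, V)             # one discrete symbol index per chunk
print(z.shape)                 # (8,)
```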
3.4. Pre-Training Module
4. Experiments
4.1. Baseline Models for Time Series Forecasting
4.1.1. FEDformer (Frequency-Enhanced Decomposed Transformer)
4.1.2. Autoformer
4.1.3. Informer
4.1.4. Pyraformer
4.2. Performance Metrics
4.3. Datasets
4.3.1. ERA5 Reanalysis Data
4.3.2. NOAA’s Integrated Surface Data (ISD)
4.3.3. Wind Integration National Dataset Toolkit (WIND Toolkit)
4.4. Main Results
4.5. Efficiency Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Vaninsky, A. Efficiency of electric power generation in the United States: Analysis and forecast based on data envelopment analysis. Energy Econ. 2006, 28, 326–338.
- Li, Z.F.; Li, J.H.; Wu, J.Z.; Chong, K.L.; Wang, B.F.; Zhou, Q.; Liu, Y.L. Numerical simulation of flow instability induced by a fixed cylinder placed near a plane wall in oscillating flow. Ocean Eng. 2023, 288, 116115.
- Li, J.; Wang, B.; Qiu, X.; Wu, J.; Zhou, Q.; Fu, S.; Liu, Y. Three-dimensional vortex dynamics and transitional flow induced by a circular cylinder placed near a plane wall with small gap ratios. J. Fluid Mech. 2022, 953, A2.
- Meng, W.S.; Zhao, C.B.; Wu, J.Z.; Wang, B.F.; Zhou, Q.; Chong, K.L. Simulation of flow and debris migration in extreme ultraviolet source vessel. Phys. Fluids 2024, 36, 023322.
- Masini, R.P.; Medeiros, M.C.; Mendes, E.F. Machine learning advances for time series forecasting. J. Econ. Surv. 2023, 37, 76–111.
- Torres, J.F.; Hadjout, D.; Sebaa, A.; Martínez-Álvarez, F.; Troncoso, A. Deep learning for time series forecasting: A survey. Big Data 2021, 9, 3–21.
- Li, J.H.; Wang, B.F.; Qiu, X.; Zhou, Q.; Fu, S.X.; Liu, Y.L. Vortex dynamics and boundary layer transition in flow around a rectangular cylinder with different aspect ratios at medium Reynolds number. J. Fluid Mech. 2024, 982, A5.
- Zhou, Q.; Lu, H.; Liu, B.; Zhong, B. Measurements of heat transport by turbulent Rayleigh–Bénard convection in rectangular cells of widely varying aspect ratios. Sci. China Phys. Mech. Astron. 2013, 56, 989–994.
- Shen, Z.; Zhang, Y.; Lu, J.; Xu, J.; Xiao, G. A novel time series forecasting model with deep learning. Neurocomputing 2020, 396, 302–313.
- Challu, C.; Olivares, K.G.; Oreshkin, B.N.; Ramirez, F.G.; Canseco, M.M.; Dubrawski, A. NHITS: Neural hierarchical interpolation for time series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 6989–6997.
- Stankeviciute, K.; Alaa, A.M.; van der Schaar, M. Conformal time-series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 6216–6228.
- Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; Zhang, C. Connecting the dots: Multivariate time series forecasting with graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, CA, USA, 23–27 August 2020; pp. 753–763.
- Livieris, I.E.; Pintelas, E.; Pintelas, P. A CNN–LSTM model for gold price time-series forecasting. Neural Comput. Appl. 2020, 32, 17351–17360.
- Gasparin, A.; Lukovic, S.; Alippi, C. Deep learning for time series forecasting: The electric load case. CAAI Trans. Intell. Technol. 2022, 7, 1–25.
- Du, S.; Li, T.; Yang, Y.; Horng, S.J. Multivariate time series forecasting via attention-based encoder–decoder framework. Neurocomputing 2020, 388, 269–279.
- Fan, C.; Zhang, Y.; Pan, Y.; Li, X.; Zhang, C.; Yuan, R.; Wu, D.; Wang, W.; Pei, J.; Huang, H. Multi-horizon time series forecasting with temporal attention learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2527–2535.
- Elsworth, S.; Güttel, S. Time series forecasting using LSTM networks: A symbolic approach. arXiv 2020, arXiv:2003.05672.
- Le Guen, V.; Thome, N. Shape and time distortion loss for training deep time series forecasting models. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019.
- Lara-Benítez, P.; Carranza-García, M.; Luna-Romera, J.M.; Riquelme, J.C. Temporal convolutional networks applied to energy-related time series forecasting. Appl. Sci. 2020, 10, 2322.
- Cirstea, R.G.; Yang, B.; Guo, C.; Kieu, T.; Pan, S. Towards spatio-temporal aware traffic time series forecasting. In Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE), Kuala Lumpur, Malaysia, 9–12 May 2022; pp. 2900–2913.
- Bose, M.; Mali, K. Designing fuzzy time series forecasting models: A survey. Int. J. Approx. Reason. 2019, 111, 78–99.
- Sahoo, B.B.; Jha, R.; Singh, A.; Kumar, D. Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting. Acta Geophys. 2019, 67, 1471–1481.
- Kurle, R.; Rangapuram, S.S.; de Bézenac, E.; Günnemann, S.; Gasthaus, J. Deep Rao-Blackwellised particle filters for time series forecasting. Adv. Neural Inf. Process. Syst. 2020, 33, 15371–15382.
- Hajirahimi, Z.; Khashei, M. Hybrid structures in time series modeling and forecasting: A review. Eng. Appl. Artif. Intell. 2019, 86, 83–106.
- Godahewa, R.; Bandara, K.; Webb, G.I.; Smyl, S.; Bergmeir, C. Ensembles of localised models for time series forecasting. Knowl.-Based Syst. 2021, 233, 107518.
- Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286.
- Sirisha, U.M.; Belavagi, M.C.; Attigeri, G. Profit prediction using ARIMA, SARIMA and LSTM models in time series forecasting: A comparison. IEEE Access 2022, 10, 124715–124727.
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in vision: A survey. ACM Comput. Surv. (CSUR) 2022, 54, 1–41.
- Li, S.; Jin, X.; Xuan, Y.; Zhou, X.; Chen, W.; Wang, Y.X.; Yan, X. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. Adv. Neural Inf. Process. Syst. 2019, 32, 5243–5253.
- Cao, D.; Wang, Y.; Duan, J.; Zhang, C.; Zhu, X.; Huang, C.; Tong, Y.; Xu, B.; Bai, J.; Tong, J.; et al. Spectral temporal graph neural network for multivariate time-series forecasting. Adv. Neural Inf. Process. Syst. 2020, 33, 17766–17778.
- Zhao, C.B.; Wu, J.Z.; Wang, B.F.; Chang, T.; Zhou, Q.; Chong, K.L. Human body heat shapes the pattern of indoor disease transmission. Phys. Fluids 2024, 36, 035149.
- Kumar, A.; Raghunathan, A.; Jones, R.; Ma, T.; Liang, P. Fine-tuning can distort pretrained features and underperform out-of-distribution. arXiv 2022, arXiv:2202.10054.
- Eldele, E.; Ragab, M.; Chen, Z.; Wu, M.; Kwoh, C.K.; Li, X.; Guan, C. Time-series representation learning via temporal and contextual contrasting. arXiv 2021, arXiv:2106.14112.
- Kim, T.; Kim, J.; Tae, Y.; Park, C.; Choi, J.H.; Choo, J. Reversible instance normalization for accurate time-series forecasting against distribution shift. In Proceedings of the International Conference on Learning Representations, Virtual Event, Austria, 3–7 May 2021.
- Lim, B.; Zohren, S. Time-series forecasting with deep learning: A survey. Philos. Trans. R. Soc. A 2021, 379, 20200209.
- Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 11121–11128.
- Zerveas, G.; Jayaraman, S.; Patel, D.; Bhamidipaty, A.; Eickhoff, C. A transformer-based framework for multivariate time series representation learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Singapore, 14–18 August 2021; pp. 2114–2124.
- Wu, H.; He, Z.; Zhang, W.; Hu, Y.; Wu, Y.; Yue, Y. Multi-class text classification model based on weighted word vector and BiLSTM-attention optimization. In Proceedings of the Intelligent Computing Theories and Application: 17th International Conference, ICIC 2021, Shenzhen, China, 12–15 August 2021; pp. 393–400.
- Wu, H.; Xiong, W.; Xu, F.; Luo, X.; Chen, C.; Hua, X.S.; Wang, H. PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction. arXiv 2023, arXiv:2305.11421.
- Xu, F.; Wang, N.; Wu, H.; Wen, X.; Zhao, X. Revisiting Graph-based Fraud Detection in Sight of Heterophily and Spectrum. arXiv 2023, arXiv:2312.06441.
- Xu, F.; Wang, N.; Wen, X.; Gao, M.; Guo, C.; Zhao, X. Few-shot Message-Enhanced Contrastive Learning for Graph Anomaly Detection. arXiv 2023, arXiv:2311.10370.
- Xu, F.; Wang, N.; Zhao, X. Exploring Global and Local Information for Anomaly Detection with Normal Samples. arXiv 2023, arXiv:2306.02025.
- Wang, H.; Wu, H.; Sun, J.; Zhang, S.; Chen, C.; Hua, X.S.; Luo, X. IDEA: An Invariant Perspective for Efficient Domain Adaptive Image Retrieval. In Proceedings of the Thirty-Seventh Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023.
| Hyperparameter | Description | Value |
|---|---|---|
| Learning rate | Step size for weight updates | 0.001 |
| Batch size | Number of samples per training batch | 64 |
| Epochs | Number of complete passes through the data | 50 |
| Dropout rate | Probability of dropping neurons | 0.1 |
| Transformer layers | Number of transformer layers | 4 |
| Heads | Number of attention heads | 8 |
| Embedding dimension | Dimension of feature embeddings | 256 |
| Weight initialization | Method for initializing weights | Xavier |
| Optimizer | Algorithm for adjusting model weights | Adam |
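As a rough illustration of how these settings compose, the following PyTorch sketch instantiates a transformer encoder with the values above; the module choice (`nn.TransformerEncoder`) is an assumed stand-in, since the paper's exact implementation is not reproduced here.

```python
import torch
import torch.nn as nn

# Assumed stand-in: a plain PyTorch TransformerEncoder configured with the
# values from the table above (embedding dim 256, 8 heads, 4 layers, 0.1 dropout).
layer = nn.TransformerEncoderLayer(
    d_model=256,       # embedding dimension
    nhead=8,           # attention heads
    dropout=0.1,       # dropout rate
    batch_first=True,  # inputs shaped (batch, time, features)
)
model = nn.TransformerEncoder(layer, num_layers=4)  # 4 transformer layers

# Xavier initialization for all weight matrices, as listed in the table.
for p in model.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)

# Adam optimizer with learning rate 0.001; training would then iterate
# for 50 epochs over batches of 64 samples.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```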
| Dataset | Horizon | WindFormer MSE | WindFormer MAE | FEDformer MSE | FEDformer MAE | Autoformer MSE | Autoformer MAE | Informer MSE | Informer MAE | Pyraformer MSE | Pyraformer MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ERA5 | 96 | 0.126 | 0.186 | 0.157 | 0.209 | 0.254 | 0.306 | 0.366 | 0.431 | 0.442 | 0.502 |
| ERA5 | 192 | 0.144 | 0.247 | 0.181 | 0.231 | 0.278 | 0.335 | 0.387 | 0.438 | 0.488 | 0.532 |
| ERA5 | 336 | 0.167 | 0.285 | 0.218 | 0.298 | 0.317 | 0.364 | 0.413 | 0.458 | 0.520 | 0.561 |
| ERA5 | 720 | 0.183 | 0.337 | 0.244 | 0.381 | 0.351 | 0.401 | 0.444 | 0.495 | 0.548 | 0.600 |
| ISD | 96 | 0.202 | 0.318 | 0.267 | 0.333 | 0.386 | 0.459 | 0.524 | 0.592 | 0.671 | 0.735 |
| ISD | 192 | 0.239 | 0.350 | 0.296 | 0.367 | 0.412 | 0.484 | 0.557 | 0.624 | 0.702 | 0.769 |
| ISD | 336 | 0.261 | 0.378 | 0.324 | 0.391 | 0.445 | 0.514 | 0.588 | 0.652 | 0.725 | 0.793 |
| ISD | 720 | 0.278 | 0.413 | 0.361 | 0.409 | 0.475 | 0.543 | 0.620 | 0.681 | 0.749 | 0.821 |
| WIND | 96 | 0.278 | 0.324 | 0.318 | 0.389 | 0.457 | 0.522 | 0.591 | 0.665 | 0.734 | 0.802 |
| WIND | 192 | 0.345 | 0.363 | 0.326 | 0.419 | 0.489 | 0.555 | 0.625 | 0.696 | 0.759 | 0.836 |
| WIND | 336 | 0.330 | 0.409 | 0.375 | 0.450 | 0.510 | 0.583 | 0.655 | 0.728 | 0.795 | 0.868 |
| WIND | 720 | 0.357 | 0.434 | 0.408 | 0.475 | 0.547 | 0.616 | 0.685 | 0.752 | 0.821 | 0.896 |
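For reference, the MSE and MAE columns above are the standard pointwise error metrics. A minimal NumPy sketch of how they are computed follows; the toy data here is purely illustrative, not the paper's evaluation code.

```python
import numpy as np

def mse(pred, true):
    """Mean squared error over all forecast points."""
    return float(np.mean((pred - true) ** 2))

def mae(pred, true):
    """Mean absolute error over all forecast points."""
    return float(np.mean(np.abs(pred - true)))

# Example: score a 96-step wind speed forecast against observations.
rng = np.random.default_rng(42)
y_true = rng.normal(size=96)
y_pred = y_true + rng.normal(scale=0.1, size=96)  # near-perfect toy forecast
print(f"MSE={mse(y_pred, y_true):.3f}, MAE={mae(y_pred, y_true):.3f}")
```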
| Method | Training Time (h) | Parameters (Millions) | FLOPs (Billions) |
|---|---|---|---|
| WindFormer | 4.5 | 12.3 | 34.2 |
| NHITS | 5.2 | 13.1 | 36.5 |
| GNN | 6.0 | 14.7 | 40.8 |