WindFormer: Learning Generic Representations for Short-Term Wind Speed Prediction
Abstract
1. Introduction
- 1. Innovative model architecture: WindFormer uses a transformer-based architecture adapted for multivariate time series prediction. The model processes and integrates multiple meteorological data streams, such as temperature, humidity, and power, to capture their complex spatiotemporal dynamics with wind speed.
- 2. Robust training strategy: Our approach combines unsupervised pre-training with multitask fine-tuning. WindFormer first learns general feature representations from extensive unlabeled time series data, which substantially improves subsequent fine-tuning on labeled wind speed data.
- 3. Exceptional predictive performance: Comparative assessments on multiple public datasets show that WindFormer significantly outperforms existing statistical and deep learning models in short-term wind speed prediction.
2. Related Work
3. Methods
3.1. Problem Definition
3.2. Model Architecture
3.2.1. Temporal Encoder
3.2.2. Temporal and Spatial Embeddings
3.2.3. Transformer Encoder
3.3. Neural Tokenizer Training
- Neural tokenizer. This tokenizer transforms wind speed data into discrete symbols. It uses a neural codebook with $K$ discrete symbols of dimension $D$, represented as $\mathcal{V} = \{v_1, \ldots, v_K\} \subset \mathbb{R}^D$. For each data segment $x$, the tokenizer encodes it into chunk representations $\{h_i\}_{i=1}^{N}$, where $N$ is the number of chunks. Each $h_i$ is then mapped to the nearest vector in the codebook via $z_i = \arg\min_{j} \lVert h_i - v_j \rVert_2$ (a code sketch follows this list).
- Wind speed prediction. Rather than employing traditional Fourier transforms, the method predicts future wind speeds directly from high-resolution temporal data chunks.
- Neural decoder training. The decoder, implemented within a transformer architecture optimized for temporal sequences, is trained to predict the quantized features from the processed data chunks. This improves the model's ability to generalize across different temporal dynamics and thereby its prediction accuracy.
- Training objective. The neural tokenizer and prediction model are trained by minimizing a mean squared error (MSE) loss between the predicted values $\hat{y}_i$ and the ground truth $y_i$: $\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$.
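To make the nearest-neighbor codebook lookup concrete, here is a minimal NumPy sketch of the quantization step described above; the array names (`h`, `V`) and the toy shapes are our own illustration, not the authors' implementation.

```python
import numpy as np

def quantize(chunks, codebook):
    """Map each chunk representation h_i to the index z_i of its nearest
    codebook vector v_j under the L2 norm: z_i = argmin_j ||h_i - v_j||_2."""
    # Pairwise squared L2 distances between chunks and codebook entries, shape (N, K).
    d2 = ((chunks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

# Toy shapes: N = 8 chunks of dimension D = 4, codebook of K = 16 symbols.
rng = np.random.default_rng(0)
h = rng.normal(size=(8, 4))    # chunk representations from the encoder
V = rng.normal(size=(16, 4))   # neural codebook
z = quantize(h, V)             # one discrete symbol index per chunk
print(z.shape)                 # (8,)
```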
3.4. Pre-Training Module
4. Experiments
4.1. Baseline Models for Time Series Forecasting
4.1.1. FEDformer (Frequency-Enhanced Decomposed Transformer)
4.1.2. Autoformer
4.1.3. Informer
4.1.4. Pyraformer
4.2. Performance Metrics
4.3. Datasets
4.3.1. ERA5 Reanalysis Data
4.3.2. NOAA’s Integrated Surface Data (ISD)
4.3.3. Wind Integration National Dataset Toolkit (WIND Toolkit)
4.4. Main Results
4.5. Efficiency Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Vaninsky, A. Efficiency of electric power generation in the United States: Analysis and forecast based on data envelopment analysis. Energy Econ. 2006, 28, 326–338.
- Li, Z.F.; Li, J.H.; Wu, J.Z.; Chong, K.L.; Wang, B.F.; Zhou, Q.; Liu, Y.L. Numerical simulation of flow instability induced by a fixed cylinder placed near a plane wall in oscillating flow. Ocean Eng. 2023, 288, 116115.
- Li, J.; Wang, B.; Qiu, X.; Wu, J.; Zhou, Q.; Fu, S.; Liu, Y. Three-dimensional vortex dynamics and transitional flow induced by a circular cylinder placed near a plane wall with small gap ratios. J. Fluid Mech. 2022, 953, A2.
- Meng, W.S.; Zhao, C.B.; Wu, J.Z.; Wang, B.F.; Zhou, Q.; Chong, K.L. Simulation of flow and debris migration in extreme ultraviolet source vessel. Phys. Fluids 2024, 36, 023322.
- Masini, R.P.; Medeiros, M.C.; Mendes, E.F. Machine learning advances for time series forecasting. J. Econ. Surv. 2023, 37, 76–111.
- Torres, J.F.; Hadjout, D.; Sebaa, A.; Martínez-Álvarez, F.; Troncoso, A. Deep learning for time series forecasting: A survey. Big Data 2021, 9, 3–21.
- Li, J.H.; Wang, B.F.; Qiu, X.; Zhou, Q.; Fu, S.X.; Liu, Y.L. Vortex dynamics and boundary layer transition in flow around a rectangular cylinder with different aspect ratios at medium Reynolds number. J. Fluid Mech. 2024, 982, A5.
- Zhou, Q.; Lu, H.; Liu, B.; Zhong, B. Measurements of heat transport by turbulent Rayleigh–Bénard convection in rectangular cells of widely varying aspect ratios. Sci. China Phys. Mech. Astron. 2013, 56, 989–994.
- Shen, Z.; Zhang, Y.; Lu, J.; Xu, J.; Xiao, G. A novel time series forecasting model with deep learning. Neurocomputing 2020, 396, 302–313.
- Challu, C.; Olivares, K.G.; Oreshkin, B.N.; Ramirez, F.G.; Canseco, M.M.; Dubrawski, A. NHITS: Neural hierarchical interpolation for time series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 6989–6997.
- Stankeviciute, K.; Alaa, A.M.; van der Schaar, M. Conformal time-series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 6216–6228.
- Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; Zhang, C. Connecting the dots: Multivariate time series forecasting with graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, CA, USA, 23–27 August 2020; pp. 753–763.
- Livieris, I.E.; Pintelas, E.; Pintelas, P. A CNN–LSTM model for gold price time-series forecasting. Neural Comput. Appl. 2020, 32, 17351–17360.
- Gasparin, A.; Lukovic, S.; Alippi, C. Deep learning for time series forecasting: The electric load case. CAAI Trans. Intell. Technol. 2022, 7, 1–25.
- Du, S.; Li, T.; Yang, Y.; Horng, S.J. Multivariate time series forecasting via attention-based encoder–decoder framework. Neurocomputing 2020, 388, 269–279.
- Fan, C.; Zhang, Y.; Pan, Y.; Li, X.; Zhang, C.; Yuan, R.; Wu, D.; Wang, W.; Pei, J.; Huang, H. Multi-horizon time series forecasting with temporal attention learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2527–2535.
- Elsworth, S.; Güttel, S. Time series forecasting using LSTM networks: A symbolic approach. arXiv 2020, arXiv:2003.05672.
- Le Guen, V.; Thome, N. Shape and time distortion loss for training deep time series forecasting models. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019.
- Lara-Benítez, P.; Carranza-García, M.; Luna-Romera, J.M.; Riquelme, J.C. Temporal convolutional networks applied to energy-related time series forecasting. Appl. Sci. 2020, 10, 2322.
- Cirstea, R.G.; Yang, B.; Guo, C.; Kieu, T.; Pan, S. Towards spatio-temporal aware traffic time series forecasting. In Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE), Kuala Lumpur, Malaysia, 9–12 May 2022; pp. 2900–2913.
- Bose, M.; Mali, K. Designing fuzzy time series forecasting models: A survey. Int. J. Approx. Reason. 2019, 111, 78–99.
- Sahoo, B.B.; Jha, R.; Singh, A.; Kumar, D. Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting. Acta Geophys. 2019, 67, 1471–1481.
- Kurle, R.; Rangapuram, S.S.; de Bézenac, E.; Günnemann, S.; Gasthaus, J. Deep Rao-Blackwellised particle filters for time series forecasting. Adv. Neural Inf. Process. Syst. 2020, 33, 15371–15382.
- Hajirahimi, Z.; Khashei, M. Hybrid structures in time series modeling and forecasting: A review. Eng. Appl. Artif. Intell. 2019, 86, 83–106.
- Godahewa, R.; Bandara, K.; Webb, G.I.; Smyl, S.; Bergmeir, C. Ensembles of localised models for time series forecasting. Knowl.-Based Syst. 2021, 233, 107518.
- Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286.
- Sirisha, U.M.; Belavagi, M.C.; Attigeri, G. Profit prediction using ARIMA, SARIMA and LSTM models in time series forecasting: A comparison. IEEE Access 2022, 10, 124715–124727.
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in vision: A survey. ACM Comput. Surv. (CSUR) 2022, 54, 1–41.
- Li, S.; Jin, X.; Xuan, Y.; Zhou, X.; Chen, W.; Wang, Y.X.; Yan, X. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. Adv. Neural Inf. Process. Syst. 2019, 32, 5243–5253.
- Cao, D.; Wang, Y.; Duan, J.; Zhang, C.; Zhu, X.; Huang, C.; Tong, Y.; Xu, B.; Bai, J.; Tong, J.; et al. Spectral temporal graph neural network for multivariate time-series forecasting. Adv. Neural Inf. Process. Syst. 2020, 33, 17766–17778.
- Zhao, C.B.; Wu, J.Z.; Wang, B.F.; Chang, T.; Zhou, Q.; Chong, K.L. Human body heat shapes the pattern of indoor disease transmission. Phys. Fluids 2024, 36, 035149.
- Kumar, A.; Raghunathan, A.; Jones, R.; Ma, T.; Liang, P. Fine-tuning can distort pretrained features and underperform out-of-distribution. arXiv 2022, arXiv:2202.10054.
- Eldele, E.; Ragab, M.; Chen, Z.; Wu, M.; Kwoh, C.K.; Li, X.; Guan, C. Time-series representation learning via temporal and contextual contrasting. arXiv 2021, arXiv:2106.14112.
- Kim, T.; Kim, J.; Tae, Y.; Park, C.; Choi, J.H.; Choo, J. Reversible instance normalization for accurate time-series forecasting against distribution shift. In Proceedings of the International Conference on Learning Representations, Virtual Event, Austria, 3–7 May 2021.
- Lim, B.; Zohren, S. Time-series forecasting with deep learning: A survey. Philos. Trans. R. Soc. A 2021, 379, 20200209.
- Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 11121–11128.
- Zerveas, G.; Jayaraman, S.; Patel, D.; Bhamidipaty, A.; Eickhoff, C. A transformer-based framework for multivariate time series representation learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Singapore, 14–18 August 2021; pp. 2114–2124.
- Wu, H.; He, Z.; Zhang, W.; Hu, Y.; Wu, Y.; Yue, Y. Multi-class text classification model based on weighted word vector and BiLSTM-attention optimization. In Proceedings of the Intelligent Computing Theories and Application: 17th International Conference, ICIC 2021, Shenzhen, China, 12–15 August 2021; pp. 393–400.
- Wu, H.; Xiong, W.; Xu, F.; Luo, X.; Chen, C.; Hua, X.S.; Wang, H. PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction. arXiv 2023, arXiv:2305.11421.
- Xu, F.; Wang, N.; Wu, H.; Wen, X.; Zhao, X. Revisiting Graph-based Fraud Detection in Sight of Heterophily and Spectrum. arXiv 2023, arXiv:2312.06441.
- Xu, F.; Wang, N.; Wen, X.; Gao, M.; Guo, C.; Zhao, X. Few-shot Message-Enhanced Contrastive Learning for Graph Anomaly Detection. arXiv 2023, arXiv:2311.10370.
- Xu, F.; Wang, N.; Zhao, X. Exploring Global and Local Information for Anomaly Detection with Normal Samples. arXiv 2023, arXiv:2306.02025.
- Wang, H.; Wu, H.; Sun, J.; Zhang, S.; Chen, C.; Hua, X.S.; Luo, X. IDEA: An Invariant Perspective for Efficient Domain Adaptive Image Retrieval. In Proceedings of the Thirty-Seventh Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023.
| Hyperparameter | Description | Value |
|---|---|---|
| Learning rate | Step size for weight updates | 0.001 |
| Batch size | Number of samples per training batch | 64 |
| Epochs | Number of complete passes through the data | 50 |
| Dropout rate | Probability of dropping neurons | 0.1 |
| Transformer layers | Number of transformer layers | 4 |
| Heads | Number of attention heads | 8 |
| Embedding dimension | Dimension of feature embeddings | 256 |
| Weight initialization | Method for initializing weights | Xavier |
| Optimizer | Algorithm for adjusting model weights | Adam |
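As a rough illustration of how these settings compose, the following PyTorch sketch instantiates a transformer encoder with the values above; the module choice (`nn.TransformerEncoder`) is an assumed stand-in, since the paper's exact implementation is not reproduced here.

```python
import torch
import torch.nn as nn

# Assumed stand-in: a plain PyTorch TransformerEncoder configured with the
# values from the table above (embedding dim 256, 8 heads, 4 layers, 0.1 dropout).
layer = nn.TransformerEncoderLayer(
    d_model=256,       # embedding dimension
    nhead=8,           # attention heads
    dropout=0.1,       # dropout rate
    batch_first=True,  # inputs shaped (batch, time, features)
)
model = nn.TransformerEncoder(layer, num_layers=4)  # 4 transformer layers

# Xavier initialization for all weight matrices, as listed in the table.
for p in model.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)

# Adam optimizer with learning rate 0.001; training would then iterate
# for 50 epochs over batches of 64 samples.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```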
| Dataset | Horizon | WindFormer MSE | WindFormer MAE | FEDformer MSE | FEDformer MAE | Autoformer MSE | Autoformer MAE | Informer MSE | Informer MAE | Pyraformer MSE | Pyraformer MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ERA5 | 96 | 0.126 | 0.186 | 0.157 | 0.209 | 0.254 | 0.306 | 0.366 | 0.431 | 0.442 | 0.502 |
| ERA5 | 192 | 0.144 | 0.247 | 0.181 | 0.231 | 0.278 | 0.335 | 0.387 | 0.438 | 0.488 | 0.532 |
| ERA5 | 336 | 0.167 | 0.285 | 0.218 | 0.298 | 0.317 | 0.364 | 0.413 | 0.458 | 0.520 | 0.561 |
| ERA5 | 720 | 0.183 | 0.337 | 0.244 | 0.381 | 0.351 | 0.401 | 0.444 | 0.495 | 0.548 | 0.600 |
| ISD | 96 | 0.202 | 0.318 | 0.267 | 0.333 | 0.386 | 0.459 | 0.524 | 0.592 | 0.671 | 0.735 |
| ISD | 192 | 0.239 | 0.350 | 0.296 | 0.367 | 0.412 | 0.484 | 0.557 | 0.624 | 0.702 | 0.769 |
| ISD | 336 | 0.261 | 0.378 | 0.324 | 0.391 | 0.445 | 0.514 | 0.588 | 0.652 | 0.725 | 0.793 |
| ISD | 720 | 0.278 | 0.413 | 0.361 | 0.409 | 0.475 | 0.543 | 0.620 | 0.681 | 0.749 | 0.821 |
| WIND | 96 | 0.278 | 0.324 | 0.318 | 0.389 | 0.457 | 0.522 | 0.591 | 0.665 | 0.734 | 0.802 |
| WIND | 192 | 0.345 | 0.363 | 0.326 | 0.419 | 0.489 | 0.555 | 0.625 | 0.696 | 0.759 | 0.836 |
| WIND | 336 | 0.330 | 0.409 | 0.375 | 0.450 | 0.510 | 0.583 | 0.655 | 0.728 | 0.795 | 0.868 |
| WIND | 720 | 0.357 | 0.434 | 0.408 | 0.475 | 0.547 | 0.616 | 0.685 | 0.752 | 0.821 | 0.896 |
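For reference, the MSE and MAE columns above are the standard pointwise error metrics. A minimal NumPy sketch of how they are computed follows; the toy data here is purely illustrative, not the paper's evaluation code.

```python
import numpy as np

def mse(pred, true):
    """Mean squared error over all forecast points."""
    return float(np.mean((pred - true) ** 2))

def mae(pred, true):
    """Mean absolute error over all forecast points."""
    return float(np.mean(np.abs(pred - true)))

# Example: score a 96-step wind speed forecast against observations.
rng = np.random.default_rng(42)
y_true = rng.normal(size=96)
y_pred = y_true + rng.normal(scale=0.1, size=96)  # near-perfect toy forecast
print(f"MSE={mse(y_pred, y_true):.3f}, MAE={mae(y_pred, y_true):.3f}")
```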
| Method | Training Time (h) | Parameters (Millions) | FLOPs (Billions) |
|---|---|---|---|
| WindFormer | 4.5 | 12.3 | 34.2 |
| NHITS | 5.2 | 13.1 | 36.5 |
| GNN | 6.0 | 14.7 | 40.8 |