Article

Wind Speed Forecasting with Differentially Evolved Minimum-Bandwidth Filters and Gated Recurrent Units

by
Khathutshelo Steven Sivhugwana
* and
Edmore Ranganai
Department of Statistics, University of South Africa, Florida Campus, Johannesburg 1709, South Africa
*
Author to whom correspondence should be addressed.
Forecasting 2025, 7(2), 27; https://doi.org/10.3390/forecast7020027
Submission received: 14 April 2025 / Revised: 26 May 2025 / Accepted: 3 June 2025 / Published: 10 June 2025
(This article belongs to the Special Issue Feature Papers of Forecasting 2025)

Abstract

Wind data are often cyclostationary due to cyclic variations, non-constant variance resulting from fluctuating weather conditions, and structural breaks caused by transient behaviour (wind gusts and turbulence), resulting in unreliable wind power supply. In wavelet hybrid forecasting, wind prediction accuracy depends heavily on the decomposition level (L) and the wavelet filter selected. Hence, we examined the efficacy of wind predictions as a function of L and the wavelet filter. In the proposed hybrid approach, differential evolution (DE) optimises the decomposition level of various wavelet filters (i.e., least asymmetric (LA), Daubechies (DB), and Morris minimum-bandwidth (MB)) used with the maximal overlap discrete wavelet transform (MODWT), allowing for the decomposition of wind data into more statistically sound sub-signals. These sub-signals are used as inputs into gated recurrent units (GRUs) to accurately capture wind speed. The final predicted values are obtained by reconciling the sub-signal predictions using multiresolution analysis (MRA) to form the wavelet-MODWT-GRUs. Using wind data from three Wind Atlas South Africa (WASA) locations, Alexander Bay, Humansdorp, and Jozini, the root mean square error, mean absolute error, coefficient of determination, probability integral transform, pinball loss, and Dawid–Sebastiani scores showed that the MB-MODWT-GRU at L = 3 was best across the three locations.

1. Introduction

1.1. Research Motivation

Wind power is clean and environmentally friendly. Furthermore, wind power has multiple economic and societal advantages [1,2,3,4]. For instance, wind power is economical, sustainable, and inexhaustible [1,2,3,4,5]. In fact, wind power resources are abundant and can be captured day and night (when solar energy is unavailable). Consequently, there has been a rapid increase in the volume of wind power penetrating existing electric power grids. Given that the primary and most significant input to wind power, namely wind speed, is highly complex and unpredictable, the integration of substantial amounts of wind power into the power grid frequently compromises wind power management strategies [5].
The complex nature of wind speed originates from the fact that wind as a physical quantity is dependent on a variety of complex factors, such as atmospheric pressure fluctuations, topography changes, seasonal variations, elevation above ground level, weather patterns, and land formations, such that it is irregular and variable in both location and time scale [3,4]. Furthermore, wind data display cyclostationarity due to unsteady cyclic variations, non-constant variance from changing weather phenomena, and structural breaks reflecting transient wind behaviour [6], which could be handled (or extracted) through the proper utilisation of denoising approaches [7,8] such as wavelet transforms (WTs).
Theoretically, wind power increases eightfold when wind speed doubles. However, in practice, wind turbine output is constrained beyond certain speed limits (i.e., cut-out speed) [4]. As a result, errors in estimating wind speed can lead to significant fluctuations in pricing and missed investment opportunities in the energy markets [8,9]. Therefore, studying the availability of wind and the resulting generation of wind power is essential for informed management and investment decisions in electricity markets. Overall, handling wind speed data, a multi-dimensional (encompassing nonlinear and linear behaviour) and ever-changing physical phenomenon, using a single model often leads to inaccurate and unreliable predictions.

1.2. Literature Highlights

It is pivotal to use models that predict wind speed with high accuracy to improve and properly regulate wind power output. Although physical models can effectively predict atmospheric dynamics, they require large amounts of numerical weather data and hence longer training times to correlate various datasets [9,10]. On the other hand, statistical models (e.g., the autoregressive integrated moving average (ARIMA)) are generally reserved for capturing short-term and linear data characteristics, as they are insufficient for capturing nonlinear ones and longer forecast horizons [10]. Advances in technology in recent years have seen machine learning methods (which are mostly data-hungry and black-box in nature) dominate both statistical and physical methods in tuning speed, scalability, and accuracy when handling highly variant, nonlinear, stochastic, and non-stationary wind speed sequences, as well as in robustness and efficiency, thereby attracting considerable interest.
Hybrid models combine more than one forecasting method to form a new one, thereby addressing the deficiencies of each single model by harnessing each model's strengths [6,11]. The high precision, accuracy, and, to some extent, processing power of hybrid techniques have garnered significant attention from researchers in recent years. Additionally, combined approaches are excellent at handling and overcoming frequent statistical, computational, and representational drawbacks in the forecasting arena [9,11]. In short-term wind speed forecasting, the authors of [12] combined artificial neural networks (ANNs) with WT to develop WT-ANN. The proposed approach using the Daubechies 4 (DB4) filter demonstrated superior wind forecasting power compared to the model without wavelet transformation. In a similar approach to that of [12], the authors of [13] used the discrete wavelet transform-ANN (DWT-ANN) (DB4) to predict wind speed on a short-term horizon with high accuracy. In the work of [14], the authors proposed WD-NILA-WRF, a combination of wavelet decomposition (WD) and weighted random forest (RF) (WRF), based on the niche immune lion algorithm (NILA), for ultra-short-term wind speed forecasting. Based on 10 min interval wind speed data, the work of [15] combined WT (DB3), ARIMA, and machine learning algorithms (support vector regression (SVR) and RF), which showed promising wind speed forecasting results. According to the authors of [10], the repeated WT-ARIMA (RWT-ARIMA) approach based on DB2 improved very short-term wind speed forecasting significantly. In the work of [4], similar results were obtained through the application of the least asymmetric 8 (LA8) maximal overlap discrete wavelet transform (MODWT), ARIMA, gradient-boosting decision trees (GBDTs), and SVR for short-term wind speed forecasting. The authors of [16] also proposed an effective hybrid strategy that combines (DB4) WT, particle swarm optimisation (PSO), and an adaptive neuro-fuzzy inference system (ANFIS) to significantly and effectively reduce wind power prediction errors on a short-term forecast scale. The works of [17] (DB4) and [18] (DB7) combined DWT with long short-term memory (LSTM) networks for short-term wind speed forecasting. Despite the lengthy processing time for LSTM, the results showed that WT preprocessing significantly improved forecast accuracy. In a similar study, the work of [19] enhanced short-term wind speed predictions using a hybrid approach that combines LA8, MODWT, sample entropy, neural network autoregression (NNAR), LSTM, and a gradient boosting machine (GBM). Although highly accurate and robust, LSTMs were found to be computationally expensive.
None of the studies reviewed above provided an adequate rationale for selecting the specific type of wavelet filter and decomposition level. Additionally, most studies used a single wavelet filter (especially DB) in the data pre-processing phase, with very scant usage of the Morris minimum-bandwidth (MB) filter. Aside from [19], the examined research papers hardly explored MODWT or discussed in detail the strategy used to determine the level of decomposition. Rather than using a reproducible mathematical approach, the level of decomposition is often determined by trial and error (see, e.g., [10,12,13]).
In wind speed forecasting, ref. [20] applied wavelet filters (such as DB and LA) to various forecasting models (i.e., persistence, neural networks (NNs), fuzzy logic, and regression) across different decomposition levels. Additionally, the study discarded the high-variant detailed frequency components, forecasting the original signal from the smooth approximation sub-signal only. It was found that the decomposition level (optimised using a genetic algorithm (GA)) significantly influenced the results, more so than the choice of wavelet filter. In a similar work, the authors of [21] proposed a GA-optimised wavelet neural network (WNN) for day-ahead wind speed forecasting, which outperformed traditional wavelet-based forecasting techniques in precision. The aforementioned studies used simple statistical models that are prone to nonlinearity, or conventional ANNs that are susceptible to vanishing gradients and overfitting (see, e.g., [12,13,20,21]).
The application of differential evolution (DE) as a metaheuristic search algorithm for the optimisation of wavelet decomposition levels in wind speed forecasting is seldom (if ever) addressed in the existing literature compared to GA. Furthermore, most of the studies (see, e.g., [12,13]) used the conventional DWT, which (though effective) is shift-variant, such that miniature changes in the signal affect the wavelet coefficients, compromising its ability to effectively undertake time series tasks compared with the more effective and time-invariant MODWT (see [19]). In general, there is scant literature on the choice of wavelet methods and decomposition levels used in wind forecasting, compromising the comparability and reproducibility of results.
Hence, this study proposes a hybrid wavelet-MODWT-GRU approach to effectively assess the influence of the wavelet filter and decomposition level on the accuracy and reliability of wind predictions. This approach leverages the strengths of MODWT, which effectively denoises highly variable signals and exposes transient wind behaviour, along with DE, recognised for its strong global search capabilities, fast convergence, and simplicity. Thus, our proposed approach uses DE to optimise the decomposition levels of the DB, LA, and MB wavelet filters. This enables the efficient decomposition of wind speed signals into frequency components with improved statistical characteristics, facilitating the detection of cyclic changes in the mean and variance using MODWT. Each sub-signal from the MODWT decomposition is individually forecasted using the GRU model to effectively capture the complex behaviour of wind speed. The final predicted value is derived by reconstructing the original signal using the inverse MODWT method based on a multiresolution analysis (MRA) technique that incorporates each sub-signal forecast as input. We compared the performance of three wavelet-MODWT-GRU models, LA-MODWT-GRU, DB-MODWT-GRU, and MB-MODWT-GRU, using 10 min averaged high-resolution wind speed data from Alexander Bay, Humansdorp, and Jozini stations from the Wind Atlas South Africa (WASA) (https://wasadata.csir.co.za/wasa1/WASAData) (accessed on 20 September 2022/5 June 2023). We further evaluated the efficacy of the best wavelet-MODWT-GRU among the three models against the GRU model and the benchmark naïve model, using deterministic and probabilistic predictions.

1.3. Innovations

The novelty and originality of the proposed ensemble method are postulated on the basis that wind data are characterised by cyclostationarity due to cyclic variations, non-constant variance resulting from fluctuating weather conditions, and structural breaks that reflect transient wind behaviour [6,7]. As such, they cannot be effectively and efficiently characterised by a single class of model.
Unlike Fourier transforms (FTs), a proper selection of wavelets can accurately identify structural breaks, thereby enhancing the accuracy and reliability of wind speed forecasting [7]. However, prior applications of wavelet-based classes of hybrid models focused on prediction accuracy, disregarding significant elements of wavelets, specifically the selection of the decomposition level and the type of wavelet filter [8], which ultimately and holistically influence the accuracy and reliability of the wind predictive model [12,20,21].
Deterministic prediction metrics (i.e., root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE)) have been the top priority in the wind forecasting literature [10,14,15,16,20,21], and are constantly being refined to make them more accurate. However, in extreme and uncertain conditions, these forecasts cannot completely capture the chaotic nature (i.e., wind gusts and turbulence) of wind speed. Consequently, the sporadic gap between observed and predicted values can pose risks to wind power market scenarios. Thus, our motivations are summarised as follows:
  • We chose wavelets because they efficiently manage complex wind data by decomposing signals into different frequency components, allowing for the detection of complex patterns. In essence, we used wavelets to accurately identify structural breaks, thereby enhancing the accuracy and reliability of the wind speed forecasting model.
  • Instead of using the time-variant DWT, we opted for the MODWT, which is time-invariant such that sub-signals have the same coefficients even if the signal is shifted [3,8,19]. Furthermore, the MODWT maintains the full resolution of the original signal, as it does not apply decimation, thereby enhancing the modelling and forecasting of sub-signals. We therefore used the MODWT technique to decompose the original wind speed signal into detailed and approximate frequency sub-signals with reduced complexity compared to the original signal.
  • The selection of the most suitable wavelet filter depends on the specific problem being addressed. In contrast with conventional DB4 and LA8, we also applied MB filters exhibiting excellent frequency localisation, narrow bandwidths reducing spectral leakage, and the effective isolation and filtering of noise outside a particular frequency band [22].
  • GA results are reliable and consistent, but these algorithms require significant computational resources and are typically characterised by a slow convergence rate. DE differs from GA in that it can solve complex optimisation problems and optimise non-differentiable and nonlinear continuous functions with a high convergence speed, making it well suited to forecasting wind speeds [23,24,25,26]. We utilised the latter stochastic algorithm, which is easy to understand and converges quickly, to find the optimal level of decomposition for each wavelet filter applied.
  • Often, wind speed (as a physical quantity) presents itself at particular locations in both linear and nonlinear forms, which compromises the performance of linear models in the forecasting arena. GRUs have the ability to capture varying patterns of variance over time, which is typical of wind data. By leveraging the GRU's simplicity, accuracy, and (to some extent) computational efficiency, we modelled and predicted each sub-signal with a high level of accuracy and precision.
  • Also, most reviewed studies focused on short-term forecasts (see e.g., [10,12,13,14,15,16,17,18,19,20]), ignoring the medium-to-long-term forecasts that are critical to wind turbine maintenance and wind farm construction. We evaluated probabilistic or distributional forecast measures in both medium and long-term wind speed forecasting using pinball loss (PL), Dawid–Sebastiani (DS), and probability integral transform (PIT).
  • The practicability and efficacy of the proposed forecasting model were confirmed empirically via prediction metrics. Also, this work was conducted in a way that is reliable and easy to replicate.

1.4. Structure of the Study

The rest of the paper is structured as follows. The Materials and Methods are presented in Section 2. The Results and their discussion are presented in Section 3, and the Conclusions are provided in Section 4.

2. Materials and Methods

2.1. Fundamentals of Wavelet Analysis

A wavelet is a compact wave that can adjust its amplitude and width within a fixed time frame [3,27,28,29,30,31]. Unlike FTs, which focus primarily on sine waves, wavelet analysis utilises a broader range of functions. Its distinctive properties make it well suited to various problems, allowing for the customisation of parameters based on specific requirements [3,30,31]. In fact, wavelet analysis is capable of localised analysis and MRA, enabling the identification of trends, structural breaks, and discontinuities that may be overlooked by other methods such as the FT and the short-time Fourier transform (STFT) [3,27,31]. Furthermore, wavelets' support for MRA provides a flexibility advantage over the STFT and FT. In addition, wavelets can analyse frequency and time simultaneously, unlike FTs, which can only analyse either time or frequency [3,18,21]. In essence, wavelets are real-valued mathematical functions defined in real space that satisfy two main fundamental properties [3], namely
$$\int_{-\infty}^{\infty} \psi(t)\, dt = 0,$$
$$\int_{-\infty}^{\infty} \psi(t)^{2}\, dt = 1.$$
Wavelets are widely employed in the discrete wavelet transform (DWT), but also in the continuous WT (CWT). The CWT and DWT are, respectively, given by the following equations:
$$CWT(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} y(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt,$$
$$DWT(j, k) = \frac{1}{\sqrt{2^{j}}} \int_{-\infty}^{\infty} y(t)\, \psi^{*}\!\left(\frac{t - k 2^{j}}{2^{j}}\right) dt,$$
where $\psi^{*}_{a,b}$ denotes the complex conjugate of $\psi_{a,b} = \frac{1}{\sqrt{|a|}}\, \psi\!\left(\frac{t - b}{a}\right)$, with $a > 0$ a scaling factor and $b$ a time-shifting parameter. The current study focuses on the application of the MODWT in the context of wind speed forecasting.

2.1.1. Wavelet Filters

DB wavelets are orthogonal; however, they are not symmetrical [28,29,30,31,32]. The scaling functions of DB wavelets have regularity properties, such that they have compact support and can accurately decompose and recompose signal functions [28,31]. Symlets (e.g., least asymmetric) are also orthogonal and compactly supported wavelets, proposed by [28] as modifications to the conventional DB. This filter offers symmetry and regularity beyond that of traditional DB wavelets, which facilitates signal reconstruction with minimal distortion [28,31]. In fact, their high regularity and symmetry enable precise feature extraction while minimising noise [28,29,30,31]. The MB wavelets minimise frequency bandwidth while retaining their localisation properties, making them suitable for signal denoising and data extraction. Furthermore, this wavelet filter is less susceptible to spectral leakage in frequency-domain applications [22] (also see Table 1).
Overall, the DB filter is ideal for signals with sharp transitions, but might be less effective for high-frequency analysis. The LA filter offers a balanced option for mixed signals, while the MB filter excels in high-frequency analysis and minimises spectral leakage, making it particularly useful for wind speed forecasting.
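As an illustration of how the three candidate filters can be inspected in practice, the short R sketch below (a minimal example, assuming the waveslim package and its filter names "d4", "la8", and "mb8") retrieves and compares the filter coefficients used later in the MODWT:
```r
# Minimal sketch: retrieve and compare the DB4, LA8, and MB8 filter coefficients
# via waveslim::wave.filter; filter names are those used later in the MODWT.
library(waveslim)

for (wf in c("d4", "la8", "mb8")) {
  filt <- wave.filter(wf)      # low-pass (lpf) and high-pass (hpf) coefficients
  cat(sprintf("%-4s length = %2d, sum(lpf) = %.4f, sum(hpf^2) = %.4f\n",
              wf, filt$length, sum(filt$lpf), sum(filt$hpf^2)))
}
```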

2.1.2. Multiresolution Analysis

In MRA, a signal is broken down into components with better statistical properties, which are then combined to reconstruct the original signal. The advantage of MRA is that features can be localised in both time and frequency. In wind speed predictions, this helps identify transient wind behaviour such as wind gusts, thereby improving overall model accuracy and precision. The MRA equation for the wind speed signal { y t } is given by [28,31,35]
$$y_t = \sum_{n \in \mathbb{Z}} A_{J,n}\, \phi_{J,n}(t) + \sum_{j=1}^{J} \sum_{n \in \mathbb{Z}} d_{j,n}\, \psi_{j,n}(t),$$
where $A_{J,n}$ denotes the approximate (scaling) coefficients at the coarsest resolution level $J$, and $d_{j,n}$ denotes the detailed coefficients at each level $j$ and position $n$. As a result of this decomposition, a structured view of the signal is gained, which allows for efficient compression, denoising, and feature extraction [28,31]. An essential component of the MRA technique is the ability to efficiently decompose and reconstruct the original signal.
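For concreteness, the following R sketch (a minimal example with simulated data standing in for a wind speed series; the waveslim::mra function is assumed) produces an additive MODWT-based MRA at level 3 and checks that the components sum back to the original signal:
```r
# Minimal sketch: MODWT-based multiresolution analysis of a placeholder series.
library(waveslim)

set.seed(1)
y <- as.numeric(arima.sim(list(ar = 0.8), n = 512)) + 8   # stand-in for wind speed data

mra_y <- mra(y, wf = "mb8", J = 3, method = "modwt", boundary = "periodic")
names(mra_y)                        # detail components D1-D3 and smooth S3

recon <- Reduce(`+`, mra_y)         # MRA components are additive
max(abs(recon - y))                 # ~0, i.e., perfect reconstruction up to rounding
```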

2.1.3. Maximal Overlap Discrete Wavelet Transform

MODWT is a modified version of the DWT [7,19,33,34,35]. Similar to the conventional DWT, it utilises both MRA and pyramid algorithms to decompose signals into approximate and detailed sub-signals of varying scales with more exposed trends and patterns [33,34,35]. While the DWT allows for perfect signal reconstruction, it decimates (downscales) the signal, resulting in sub-signals with fewer coefficients than the original signal [33]. Furthermore, the DWT is shift-variant in the sense that miniature changes in the signal affect the wavelet coefficients, making it (to some extent) unfit for complex time series exercises such as wind speed forecasting. The MODWT is time-invariant and maintains the original signal's full resolution because it does not use decimation [7,33,34,35]. Different from the conventional DWT, MODWT improves wavelet strength by altering the filter, not the signal, making it more resistant to boundary effects [33,34,35]. As such, MODWT has better statistical stability than the DWT and can handle quasi-stationary signals such as wind speed data. Consider the time series signal $\{y_t\}$ decomposed into several detailed signals $\{d_{j,n}\}$ and an approximate signal $\{A_{j,n}\}$. Then, the MRA representation for the MODWT is given by (see [35] for details)
$$d_{j,n} = \sum_{l=0}^{L_j - 1} \tilde{h}_l\, A_{j-1,\,(n - 2^{j-1} l) \bmod N},$$
$$A_{j,n} = \sum_{l=0}^{L_j - 1} \tilde{g}_l\, A_{j-1,\,(n - 2^{j-1} l) \bmod N},$$
where $l \in \mathbb{Z}^{+}$; $L$ is the length of the wavelet filter; $n = 0, 1, \ldots, N-1$; and $\tilde{g}_l = g_l/\sqrt{2}$ and $\tilde{h}_l = h_l/\sqrt{2}$, respectively, denote the scaling and wavelet filter coefficients of a band-pass filter (critical for energy conservation). These filters must satisfy the following conditions (see [35] for details):
$$\sum_{l=0}^{L-1} \tilde{g}_l = 1, \qquad \sum_{l=0}^{L-1} \tilde{h}_l = 0,$$
$$\sum_{l=0}^{L-1} \tilde{g}_l^{2} = \frac{1}{2}, \qquad \sum_{l=0}^{L-1} \tilde{h}_l^{2} = \frac{1}{2},$$
$$\sum_{l=-\infty}^{+\infty} \tilde{g}_l\, \tilde{g}_{l+2r} = 0, \qquad \sum_{l=-\infty}^{+\infty} \tilde{h}_l\, \tilde{h}_{l+2r} = 0.$$
This is a pivotal property for the perfect reconstruction of the signal. Using the inverse pyramid algorithm, the original signal can be reconstructed through the following mathematical expression [7,35]:
$$y_n = \sum_{l=0}^{L-1} \tilde{h}_l\, d_{j,\,(n + 2^{j-1} l) \bmod N} + \sum_{l=0}^{L-1} \tilde{g}_l\, A_{j,\,(n + 2^{j-1} l) \bmod N}.$$
Thus, the wavelet and scaling coefficients are calculated by cascading convolutions with the filters $h_{j,l}$ and $g_{j,l}$ [35]. It should be noted that $E(d_{j,n}) = 0$ at each resolution level $j$ (see [35] for details).
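The decomposition and reconstruction steps above can be illustrated with the waveslim implementation used in this study; the sketch below is a minimal example with simulated data standing in for a wind speed series:
```r
# Minimal sketch: MODWT decomposition at level L = 3 with the "mb8" filter and
# inverse-MODWT (pyramid algorithm) reconstruction of a placeholder series.
library(waveslim)

set.seed(42)
wind <- as.numeric(arima.sim(list(ar = c(0.7, 0.2)), n = 1024)) + 8  # placeholder data

dec <- modwt(wind, wf = "mb8", n.levels = 3, boundary = "periodic")
str(dec, max.level = 1)       # sub-signals d1, d2, d3 and smooth s3, each of length 1024

wind_hat <- imodwt(dec)       # inverse MODWT
max(abs(wind_hat - wind))     # ~0: perfect reconstruction up to floating-point error
```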

2.2. Differential Evolution

DE is a stochastic algorithm that can be applied to complex optimisation problems without explicit adaptation to each particular problem [23]. Furthermore, DE is effective and efficient at minimising non-differentiable and possibly nonlinear continuous functions [23,24,25]. In this way, DE is more reliable, robust, efficient, accurate, and scalable than artificial bee colony (ABC), simulated annealing (SA), GA, and PSO (also see [23,24,25]). The four main steps involved in the DE algorithm are detailed below.
Step 1: Initialisation
Consider an objective function $F: y \in \mathbb{R}^{D} \rightarrow \mathbb{R}$. Ideally, we seek a solution $y^{*} = (y_{1}^{*}, y_{2}^{*}, \ldots, y_{D}^{*})$ such that $F(y^{*}) \leq F(y)$ for all $y$ satisfying the boundary constraints $L \leq y \leq U$. DE uses a population $V$, which consists of potential solutions, to explore the solution space. DE starts by initialising the population such that the initial population $R_{i,j} \in V \subset \mathbb{R}^{D \times NP}$, with random initialisation $L \leq R_{i,j} \leq U$, is given by [23,24]:
$$R_{i,j} = L + (U - L)\, \mathrm{rand}_{ij}(0, 1], \quad j = 1, 2, \ldots, D; \; i = 1, 2, \ldots, NP,$$
where L and U , respectively, denote the lower and upper limit, and r a n d i j ( 0,1 ] is a uniform distribution. In essence, each element or individual in the population is uniformly distributed within a multidimensional search space. By substituting superior individuals in the present population for every generation, the algorithm builds a new population. To reach the global minimum, the population will have to go through mutation, crossover, and selection processes [23,24,25,26].
Step 2: Mutation
Mutation is a biologically inspired operation that enables the algorithm to effect random modifications on the population by developing a so-called mutant vector for each individual in the population [23,24,25,26]. To form a mutant or donor vector ($\Gamma_i$), three distinct individuals are randomly selected from the population. Thereafter, the scaled difference (accounting for differential fluctuations) between two of the three vectors is added to the third one, such that [24,25]
$$\Gamma_{i,g} = y_{r_1,g} + \lambda\, (y_{r_3,g} - y_{r_2,g}),$$
where $\Gamma_{i,g}$ represents the mutant vector for the $i$th individual in the next generation $g$; $y_{r_1,g}$, $y_{r_2,g}$, and $y_{r_3,g}$ are distinct individuals (i.e., $r_1 \neq r_2 \neq r_3 \in \{1, \ldots, NP\}$) randomly and independently selected from the population at generation $g$; and $\lambda$ denotes the scaling factor. In the DE/rand/1 strategy presented in Equation (13), each mutant step blends parameter sets between successful population individuals. As shown in Equation (14), most DE strategies apply pairs of difference vectors (see [24,25] for details):
$$u_i = y_{r_1} + \lambda_1\, (y_{r_2} - y_{r_3}) + \lambda_2\, (y_{r_4} - y_{r_5}) + \cdots,$$
where $\lambda = \lambda_1 = \cdots = \lambda_k$, such that substituting the best vector $y_{best}$ from the population yields the DE strategy given by the equation below:
$$\Gamma_{i,g} = y_{best,g} + \lambda\, (y_{r_1,g} - y_{r_2,g}).$$
Using scaled difference vectors and a weighted average of the best and arbitrary vectors, DE mutant strategies can be generally represented by the following equation [23,24,25]:
$$\Gamma_{i,g} = \rho\, y_{best,g} + (1 - \rho)\, y_{i,g} + \sum_{k=1}^{K} \lambda_k \left( y_{r_{2k-1},g} - y_{r_{2k},g} \right).$$
Step 3: Crossover
The crossover operator creates offspring by mixing components of the current element and those generated by mutation [24,25,26]. The elements of the mutant vector are transferred to the trial vector with a certain probability ($P_{cr}$). In essence, binomial crossover mixes the characteristics of the mutant vector ($\Gamma_i$) and the target vector ($y_i$) to develop a trial vector ($U_i$) such that [23,24,25,26,36]
$$U_{i,j,g} = \begin{cases} \Gamma_{i,j,g}, & \text{if } r_{ij} \leq P_{cr} \text{ or } j = j_{rand}, \\ y_{i,j,g}, & \text{otherwise}, \end{cases}$$
where $i \in \{1, \ldots, NP\}$; $r_{ij} = \mathrm{rand}_{ij}(0, 1]$ are uniformly distributed random numbers generated for each $j \in \{1, \ldots, D\}$; $j_{rand} \in \{1, \ldots, D\}$; and $P_{cr} \in (0, 1)$ denotes the crossover probability initialised at the start. The parameter $j_{rand}$ is vital, as it ensures that the trial vector differs from the target vector in at least one component. A high $P_{cr}$ encourages exploration of a broader search space, whereas a low $P_{cr}$ emphasises refinement. In addition to helping the algorithm escape local optima, crossover ensures that it is able to make bigger steps across the problem landscape [26,36,37].
Step 4: Selection
In the final step, the trial vector U i , g is compared with the target vector y i , g , and the vector with the optimal function value is propagated to the next generation [26,36,37] such that
$$y_{i,g+1} = \begin{cases} U_{i,g}, & \text{if } F(U_{i,g}) \leq F(y_{i,g}), \\ y_{i,g}, & \text{otherwise}, \end{cases}$$
where $F$ is the objective function. Thus, if the objective value of the trial vector, $F(U_{i,g})$, is equal to or smaller than that of the target vector, the trial vector becomes the new target; otherwise, the target vector stays the same, thereby ensuring that the population remains the same or improves [23,24,25,26,36,37]. Mutation, crossover, and selection carry on until some termination criterion is attained.
In the “DEoptim” package of the R program (version 4.4.1), we applied the DE/rand/1/bin strategy (discussed above) to optimise the decomposition level (also see [38]).
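As an illustration of this setup, the R sketch below uses DEoptim's DE/rand/1/bin strategy to search for the decomposition level. The objective function is a hypothetical stand-in (the MSE between the original series and a soft-thresholded MODWT reconstruction); the paper's exact objective (Algorithm 1, step 10) may differ in detail, and the data are simulated:
```r
# Sketch (assumed objective and data): DE/rand/1/bin search for the decomposition
# level L that minimises a filter-specific reconstruction MSE. The soft-thresholding
# step is an illustrative choice so that the objective is non-degenerate.
library(DEoptim)
library(waveslim)

obj_fn <- function(par, y, wf = "mb8") {
  L   <- round(par[1])                                   # candidate decomposition level
  dec <- modwt(y, wf = wf, n.levels = L, boundary = "periodic")
  for (j in 1:L) {                                       # universal soft threshold per detail level
    d   <- dec[[paste0("d", j)]]
    thr <- sqrt(2 * log(length(y))) * mad(d)
    dec[[paste0("d", j)]] <- sign(d) * pmax(abs(d) - thr, 0)
  }
  mean((y - imodwt(dec))^2)                              # MSE of the denoised reconstruction
}

set.seed(1)
wind <- as.numeric(arima.sim(list(ar = 0.8), n = 1024)) + 8   # placeholder wind speed series

out <- DEoptim(obj_fn, lower = 2, upper = 6, y = wind,
               control = DEoptim.control(NP = 20, itermax = 50,
                                         CR = 0.9, F = 0.8, strategy = 1))
round(out$optim$bestmem)                                 # DE-optimised level L
```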

2.3. Gated Recurrent Unit

The GRU is a simpler and more efficient variant of recurrent neural networks (RNNs) designed for processing sequential data of variable length; it is not prone to the vanishing gradient problem that affects conventional RNNs [39,40,41]. Different from LSTMs, GRUs have a much simpler structural architecture. With the parameters kept the same, the work of [41] showed that GRUs could outperform LSTMs in terms of convergence, computational time, parameter updates, and generalisation on varying datasets. Comparable to the LSTM, the GRU interpolates between the previous hidden state ($h_{t-1}$) and the candidate hidden state ($\tilde{h}_t$) to build the activation such that [41]
$$h_t = \Omega_t \odot h_{t-1} + (1 - \Omega_t) \odot \tilde{h}_t,$$
where $\odot$ denotes element-wise multiplication, and the update gate at time $t$, $\Omega_t$, controls how much information from the previous hidden state should be retained and is given by the following expression:
$$\Omega_t = \sigma(W_{\Omega}\, y_t + U_{\Omega}\, h_{t-1}),$$
where $y_t$ is the input at time $t$, $h_{t-1}$ is the hidden state from the previous time step, $W_{\Omega}$ and $U_{\Omega}$ denote weight matrices, and $\sigma(\cdot)$ is the logistic sigmoid function given by:
$$\sigma(y_t) = \frac{1}{1 + e^{-y_t}} \in (0, 1).$$
It should be noted that $\Omega_t \approx 1$ mostly retains information from $h_{t-1}$; otherwise, information from $\tilde{h}_t$ is retained. The candidate activation ($\tilde{h}_t$) is given by the following equation:
$$\tilde{h}_t = \tanh(W\, y_t + U\,(r_t \odot h_{t-1})),$$
where the reset gate $r_t$ is given by the following expression, which is similar to that employed by the update gate:
$$r_t = \sigma(W_r\, y_t + U_r\, h_{t-1}),$$
where $r_t$ is the reset gate at time $t$, and $W_r$ and $U_r$ are weight matrices associated with the reset gate. This equation governs the extent to which the previous hidden state is disregarded when calculating the new candidate hidden state [41]. With varying datasets, GRUs can capture temporal trends and dependencies, but they may fall behind LSTMs (which can handle highly complex sequential data) in time series tasks involving long-term dependencies (see [41]). We fitted the GRU using the "keras" library in the R program.
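To make the above concrete, the sketch below shows a minimal GRU network in the R keras interface for one-step-ahead forecasting of a single normalised sub-signal; the layer sizes, dropout, and training settings are illustrative placeholders rather than the paper's tuned values:
```r
# Minimal sketch: a one-layer GRU forecaster in R keras on placeholder data.
library(keras)

lookback <- 6                                                          # assumed number of lagged inputs
x_train  <- array(runif(1000 * lookback), dim = c(1000, lookback, 1))  # placeholder inputs
y_train  <- runif(1000)                                                # placeholder targets

model <- keras_model_sequential() %>%
  layer_gru(units = 32, input_shape = c(lookback, 1), dropout = 0.1) %>%
  layer_dense(units = 1, activation = "linear")

model %>% compile(loss = "mse", optimizer = optimizer_adam(learning_rate = 0.001))

model %>% fit(
  x_train, y_train,
  epochs = 20, batch_size = 32, validation_split = 0.2,
  callbacks = list(callback_early_stopping(patience = 3)),
  verbose = 0
)

preds <- model %>% predict(x_train[1:5, , , drop = FALSE])   # one-step-ahead predictions
```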

2.4. Persistence Model

The naive model assumes that the wind speed at time $t$ is exactly the same as it was at time $t-1$. Generally, the naive model is used as a benchmark model, and it provides accurate forecasts on short and very short-term forecasting timescales [42]. The naive model is given by the equation below:
$$\hat{y}_t = y_{t-1}.$$
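In code, the persistence benchmark is simply a one-step lag of the observed series, as in the brief sketch below (placeholder values):
```r
# Minimal sketch: persistence (naive) benchmark on a placeholder wind speed vector.
wind           <- c(7.2, 7.5, 6.9, 8.1, 8.4, 7.8)   # observations in m/s (placeholder)
naive_forecast <- c(NA, head(wind, -1))              # y_hat[t] = y[t-1]
sqrt(mean((wind - naive_forecast)^2, na.rm = TRUE))  # RMSE of the naive benchmark
```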

2.5. Proposed Prediction Approach

The wind speed data are characterised by continuous variation and structural breaks resulting from transient wind behaviour, which negatively impacts the integration of large volumes of wind power into the existing power grid and is difficult to capture using a single forecasting model. In turn, the resultant power disequilibrium complicates the operation of power grids and utility managers' ability to develop effective management strategies. A methodology for improving wind speed forecasting is depicted in Figure 1, which outlines the sourcing, cleaning, and processing of data, as well as the use of evolutionary algorithms, wavelets, and machine learning algorithms. Hence, we summarise the role of each of the models involved in the development of the optimised wavelet-MODWT-GRU model as follows (also see Algorithm 1):
  • DE algorithm is used to optimise the decomposition level of highly variant wind speed data due to its simplicity, efficiency, and ability to deal with complex continuous non-differentiable problems. As a result, more statistically sound sub-signals that are easy to characterise in modelling and forecasting can be extracted.
  • A time-invariant MODWT resistant to boundary effects is used, employing a variety of wavelet filters, particularly the DB4, LA8, and MB8, to decompose wind speed data into detailed (high) and approximate (low) frequency components. These components have reduced noise and volatility, exposing short-term and long-term trends. They can be modelled and forecasted with ease and efficiency.
  • The advantage of GRUs as nonlinear approximators lies not only in their simplicity, but also in their ability to efficiently and accurately capture complex wind behaviours (including linear and nonlinear components) in each sub-signal. For this reason, we opted for the GRU, renowned for its capability to manage vanishing gradients, a characteristic particularly suited to handling detailed signals. This choice was made to ensure that the prediction of each sub-signal is conducted with a high level of accuracy and reliability, while mitigating the risk of overfitting or exploding gradients.
  • Finally, we leveraged the MRA, known for its effective data compression, denoising, feature extraction, and efficient reconstruction, to reconstruct the original wind data and arrive at the final forecast using all sub-signal forecasts.
Algorithm 1: Wavelet-MODWT-GRU
1. Input: Wind speed data ( y t )
  A. Data Cleaning and Preprocessing
  2. data_cleaning_and_preprocessing
  3. Load original wind speed data y t R M into R program environment.
  4. Clean and format data inconsistencies and anomalies caused by environmental factors and instrument instability.
  5. Retain $y_t \in (0, 22\ \mathrm{m/s}]$, as wind turbines resort to feathering beyond this limit and are switched off.
  6. Divide the data into an 80% training set ($y_t^{train} \in \mathbb{R}^{M-h}$) and a 20% testing set ($y_t^{test} \in \mathbb{R}^{h}$), with $M > h$ and $M, h \in \mathbb{R}^{+}$.
  7. output
  B. DE hyperparameter search
  8. de_hyperparameter_search
  9. Initialise the wavelet filter.
  10. Define the objective function based on the original wind data ($y_t$) and the reconstructed series ($\hat{y}_t$) such that the mean square error (MSE) is given by $MSE = \frac{1}{M}\sum_{t=1}^{M} (y_t - \hat{y}_t)^2 \in \mathbb{R}^{+}$. The function is specific to the wavelet filter and is used to evaluate performance.
  11. Set parameter bounds within which DE will search. Thus, set population size, number of iterations, crossover probability, parameter bounds, and weights. This is vital for DE to search a relevant interval and to improve search efficiency.
  12. Run the DE until the predetermined termination criterion (i.e., number of runs) is reached.
  13. output
  C. Signal denoising and processing
  14. signal_denoising_and_formating
  15. In MODWT, the optimised decomposition level ($L$) is used alongside the conditions, filters, and boundary parameters to decompose $y_t \in \mathbb{R}^{M}$ into less noisy and more statistically sound sub-signals $\Gamma = \{d_1, d_2, \ldots, d_L, A_L\}$, with $d_i\,(i = 1{:}L) \in \mathbb{R}^{M}$ and $A_L \in \mathbb{R}^{M}$.
  16. Each sub-signal is divided into two sets, namely, the training set (80%) ($\vartheta_t^{train} \in \mathbb{R}^{M-h}$) and the testing set (20%) ($\vartheta_t^{test} \in \mathbb{R}^{h}$).
  17. Normalise the sub-signals using the min-max criterion such that $\vartheta_t^{(norm)train}\,(t = 1{:}M-h) = \frac{\Gamma_t^{train} - \Gamma_{min}^{train}}{\Gamma_{max}^{train} - \Gamma_{min}^{train}} \in \mathbb{R}^{M-h}$ and $\vartheta_t^{(norm)test}\,(t = (M-h+1){:}M) = \frac{\Gamma_t^{test} - \Gamma_{min}^{test}}{\Gamma_{max}^{test} - \Gamma_{min}^{test}} \in \mathbb{R}^{h}$. This ensures that the normalised sub-signals lie in $[0, 1]$, are compatible with the hyperbolic tangent function, and have minimised noise/variance effects on the predictions.
  18. output
  D. GRU hyperparameter search
  19. gru_hyperparameter_search
  20. Array data into a 3D format (i.e., samples, time steps, feature) for compatibility with the GRU network.
  21. Initialise parameters: input shape, batch size, dropout rates, epochs, activation function, loss function, learning rate, and optimiser.
  22. Train the GRU model and evaluate performance based on $MSE = \frac{1}{M-h}\sum_{t=1}^{M-h} (y_t^{train} - \hat{y}_t^{train})^2 \in \mathbb{R}^{+}$ using the normalised training dataset $\vartheta_t^{(norm)train} \in \mathbb{R}^{M-h}$.
  23. Preserve the model parameters with optimal performance.
  24. output
  E. Test GRU performance
  25. test_gru_performance
  26. Superimpose the GRU model with optimal parameters on ϑ t ( n o r m ) t e s t to generate normalised forecasts ϑ ^ t ( n o r m ) t e s t .
  27. Return to the original sub-signal scale via $\hat{\Gamma}_t^{test} = (\Gamma_{max}^{test} - \Gamma_{min}^{test}) \cdot \hat{\vartheta}_t^{(norm)test} + \Gamma_{min}^{test}$,
where $\hat{\Gamma}_t^{test}\,(t = (M-h+1){:}M) \in \{\hat{d}_1^{test}, \hat{d}_2^{test}, \ldots, \hat{d}_L^{test}, \hat{A}_L^{test}\} \subset \mathbb{R}^{h}$ are the original-scale sub-signal predictions and $\hat{\vartheta}_t^{(norm)test} \in \{\hat{d}_1^{(norm)test}, \hat{d}_2^{(norm)test}, \ldots, \hat{d}_L^{(norm)test}, \hat{A}_L^{(norm)test}\} \subset \mathbb{R}^{h}$ are the normalised sub-signal predictions.
  28. Evaluate the performance of the GRU predictions for each sub-signal using RMSE, MAE, and MAPE.
  29. output
  F. Signal reconstruction and output evaluation
  30. signal_reconstruction_and_output_evaluation
  31. All sub-signal predictions are used to reconstruct $y_t^{test} \in \mathbb{R}^{h}$ such that $\hat{y}_t^{test} = \mathrm{inverse\text{-}MODWT}(\hat{d}_1^{test}, \hat{d}_2^{test}, \ldots, \hat{d}_L^{test}, \hat{A}_L^{test})$.
  32. Use performance metrics and statistical tests (i.e., RMSE, MAE, MAPE, coefficient of determination (R2), PL, MD, Mincer–Zarnowitz (MZ), PIT, and DS) to compare $\hat{y}_t^{test} \in \mathbb{R}^{h}$ and $y_t^{test} \in \mathbb{R}^{h}$.
  33. output
34. Output: y ^ t , RMSE, MAE, MAPE, R2, PL, MD, MZ, PIT, and DS
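The sketch below strings the main steps of Algorithm 1 together in R on simulated data. It is a simplified, assumption-laden illustration (placeholder series, toy GRU settings, "mb8" filter at L = 3) rather than the exact implementation used in the study:
```r
# Sketch (assumed data, simplified settings): MODWT decomposition, per-sub-signal
# min-max scaling and GRU forecasting, then inverse-MODWT reconstruction of the
# test-set forecast, mirroring the structure of Algorithm 1.
library(waveslim)
library(keras)

set.seed(7)
wind <- as.numeric(arima.sim(list(ar = c(0.6, 0.3)), n = 2000)) + 8  # placeholder series
M <- length(wind); h <- floor(0.2 * M); lookback <- 6                # 80/20 split, lag window

dec <- modwt(wind, wf = "mb8", n.levels = 3, boundary = "periodic")
sub_names <- c("d1", "d2", "d3", "s3")

forecast_sub <- function(x) {
  rng <- range(x[1:(M - h)])                       # scale with training data only
  z   <- (x - rng[1]) / diff(rng)
  X   <- t(sapply((lookback + 1):M, function(i) z[(i - lookback):(i - 1)]))
  X   <- array(X, dim = c(nrow(X), lookback, 1))
  y   <- z[(lookback + 1):M]
  tr  <- 1:(M - h - lookback); te <- (M - h - lookback + 1):(M - lookback)
  mod <- keras_model_sequential() %>%
    layer_gru(units = 16, input_shape = c(lookback, 1)) %>%
    layer_dense(units = 1)
  mod %>% compile(loss = "mse", optimizer = "adam")
  mod %>% fit(X[tr, , , drop = FALSE], y[tr], epochs = 10, batch_size = 32, verbose = 0)
  pred <- as.numeric(mod %>% predict(X[te, , , drop = FALSE]))
  pred * diff(rng) + rng[1]                        # back-transform to the sub-signal scale
}

sub_preds <- lapply(dec[sub_names], forecast_sub)  # length-h forecast per sub-signal

pred_dec <- dec                                    # simplified: inverse MODWT applied
for (nm in sub_names) pred_dec[[nm]] <- sub_preds[[nm]]   # directly to the length-h forecasts
yhat_test <- imodwt(pred_dec)                      # reconstructed wind speed forecast
sqrt(mean((wind[(M - h + 1):M] - yhat_test)^2))    # test RMSE
```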

2.6. Data Description

The data used in this study were downloaded from the WASA website (https://wasadata.csir.co.za/wasa1/WASAData) (accessed on 20 September 2022/5 June 2023). A detailed description of each station is provided in Table 2. The univariate time series wind speed data (i.e., “WS_60_mean”) are partitioned into two sets, namely, the training set (80%) and the testing set (20%). Models were built on the training set, while the testing set was used to evaluate the proposed methods’ predictive strength and generalisation abilities in varying years, seasons, terrains, and forecast horizons.
At latitude 28.601882, longitude 16.664410, and elevation 152 m , the first station with Mast ID WM01 is located in Alexander Bay, a desert region of the Northern Cape. A second station is located in Humansdorp in the Eastern Cape with Mast ID WM08, longitude 24.514360, latitude 34.109965, and elevation 110 m (also see Figure 2). A third station with Mast ID WM13 is located in the Jozini region of KwaZulu-Natal at longitude 32.16636, latitude 27.42605, and elevation 80 m (see Table 3). Each of the data points of each station consists of 10 min averaged wind speed with varying features depending on the location (see Table 3).

2.7. Performance Evaluation

2.7.1. Deterministic Forecast Evaluation Scores

For proper evaluation, robust and reliable models must be distinguished from weak ones using appropriate error metrics. Hence, we tested the accuracy of the prediction model in wind speed point forecasting using the RMSE, MAE, and MAPE, respectively, given by the following equations:
$$RMSE = \sqrt{\frac{1}{m}\sum_{t=1}^{m} \xi_t^{2}},$$
$$MAE = \frac{1}{m}\sum_{t=1}^{m} \left| \xi_t \right|,$$
$$MAPE = \frac{1}{m}\sum_{t=1}^{m} \left| \frac{\xi_t}{y_t} \right| \times 100,$$
where $\xi_t = \hat{y}_t - y_t$ is the error term, and $y_t$ and $\hat{y}_t$, respectively, denote the actual and predicted wind speed values. Smaller values of RMSE, MAE, and MAPE indicate a better predictive model. The penalisation of large errors by RMSE makes it susceptible to outliers. MAE is less sensitive to outliers but only considers the magnitude of errors, not their direction. Although scale-independent, MAPE overemphasises errors when the actual values are small, which can distort model comparisons (see [9,42] for details). We also use the coefficient of determination, $R^2 \in [0, 1]$, which is given by the equation below:
$$R^{2} = 1 - \frac{\sum_{t=1}^{m} (\hat{y}_t - y_t)^{2}}{\sum_{t=1}^{m} (\bar{y} - y_t)^{2}},$$
where $\bar{y}$ represents the mean wind speed value. A better-fitted and preferred predictive model has an $R^2$ value much closer to 1. This indicator is highly sensitive to outliers.
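These point metrics translate directly into R, as in the short sketch below (placeholder observation and forecast vectors):
```r
# Minimal sketch: RMSE, MAE, MAPE, and R^2 as defined above, on placeholder data.
obs  <- c(6.1, 7.4, 8.0, 5.9, 9.2, 7.7)
pred <- c(6.4, 7.1, 8.3, 6.2, 8.8, 7.9)

err  <- pred - obs
rmse <- sqrt(mean(err^2))
mae  <- mean(abs(err))
mape <- mean(abs(err / obs)) * 100
r2   <- 1 - sum((pred - obs)^2) / sum((mean(obs) - obs)^2)
round(c(RMSE = rmse, MAE = mae, MAPE = mape, R2 = r2), 4)
```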

2.7.2. Probabilistic Forecast Evaluation Scores

Probabilistic forecasting scores focus on assessing the model’s spread by taking into account the deviance of the conditional mean distribution to observed values. In this study, we tested the strength of the models in probabilistic forecasting through the PL function. Apart from its ability to deal with non-symmetric errors, the PL can be used to assess the reliability of interval predictions. The PL is given by [42]
$$PL_{\tau}(y_t, \hat{y}_{\tau,t}) = \begin{cases} (y_t - \hat{y}_{\tau,t})\,\tau, & y_t \geq \hat{y}_{\tau,t}, \\ (\hat{y}_{\tau,t} - y_t)\,(1 - \tau), & y_t < \hat{y}_{\tau,t}, \end{cases}$$
where $\tau \in (0, 1)$ is the target quantile (i.e., $\tau = 0.95$), $y_t$ is the actual wind speed value, and $\hat{y}_{\tau,t}$ is the quantile forecast. The lower the PL value, the more accurate the quantile forecast.
We also used the DS score to evaluate the accuracy, sharpness, and reliability of probabilistic predictions. In this scoring rule, which only considers the mean and variance of a forecast distribution, broad distributions were penalised, whilst narrower distributions were incentivised, emphasising distribution sharpness. The DS is calculated by the following equation [43]:
$$DS = 2 \log \sigma + \left( \frac{y_t - \mu}{\sigma} \right)^{2},$$
where y t is the actual wind speed observation, μ denotes the mean and σ is the standard deviation of the forecast distribution. Lower values of the DS score imply more accurate and reliable probabilistic forecasts.
PIT histograms or values are used to evaluate the consistency between probabilistic forecasts and actual wind speed values. In an ideal forecast distribution, there would be no bins with extremely high or low levels on the PIT histogram. The PIT is computed using the equation below [44,45]:
$$PIT = \beta = F_{\delta}(y_t),$$
where β is the transformed variable and F δ y t   is the forecast distribution F δ evaluated at the actual wind speed value y t . By transforming observed values into uniform distributions, the agreement between forecasts and actual data can then be examined. Smaller and uniformly distributed PIT values imply well-calibrated probabilistic forecasts; otherwise, the models’ probabilistic forecasts are miscalibrated or skewed.
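For illustration, the sketch below computes the pinball loss at τ = 0.95, the DS score, and PIT values under an assumed Gaussian predictive distribution with placeholder means and standard deviations:
```r
# Minimal sketch: pinball loss, Dawid-Sebastiani score, and PIT values, assuming a
# Gaussian forecast distribution with placeholder means (mu) and std. deviations (sdv).
obs <- c(6.1, 7.4, 8.0, 5.9, 9.2, 7.7)
mu  <- c(6.4, 7.1, 8.3, 6.2, 8.8, 7.9)   # forecast means (placeholder)
sdv <- rep(0.9, length(obs))             # forecast standard deviations (placeholder)
tau <- 0.95

q_tau   <- qnorm(tau, mean = mu, sd = sdv)                 # 95% quantile forecasts
pinball <- mean(ifelse(obs >= q_tau, (obs - q_tau) * tau,
                       (q_tau - obs) * (1 - tau)))
ds      <- mean(2 * log(sdv) + ((obs - mu) / sdv)^2)       # Dawid-Sebastiani score
pit     <- pnorm(obs, mean = mu, sd = sdv)                 # PIT values
hist(pit, breaks = 5, main = "PIT histogram")              # roughly uniform if well calibrated
c(PL = pinball, DS = ds)
```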

2.7.3. Biasedness Assessment

Consider the residual terms $\xi_t = \hat{y}_t - y_t$, $t = 1, \ldots, n$. If $\xi_t \neq 0$, then the model is either overestimating ($\xi_t > 0$) or underestimating ($\xi_t < 0$) the actual wind speed observation and is said to be biased. The MZ test regression function is given by the following equation (see [46] for details):
$$y_t = \omega + \phi\, \hat{y}_t + \xi_t.$$
The MZ test evaluates the unbiasedness and consistency of the predictions by testing the null hypothesis that the intercept ($\omega$) and slope ($\phi$) terms are 0 and 1, respectively. If $\omega = 0$ and $\phi = 1$, the model's predictions are unbiased and the prediction errors are minimal; otherwise, the model is considered biased. In essence, the decision rule is that if the p-value of the joint test is greater than 0.05, the model is deemed unbiased; otherwise, the model is biased.
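In R, the MZ regression and the joint hypothesis test can be sketched as follows (placeholder data; car::linearHypothesis is one way, assumed here, of testing the joint restriction):
```r
# Minimal sketch: Mincer-Zarnowitz regression of observations on forecasts, with a
# joint test of intercept = 0 and slope = 1.
library(car)

obs  <- c(6.1, 7.4, 8.0, 5.9, 9.2, 7.7, 6.5, 8.8)
pred <- c(6.4, 7.1, 8.3, 6.2, 8.8, 7.9, 6.2, 9.1)

mz <- lm(obs ~ pred)
linearHypothesis(mz, c("(Intercept) = 0", "pred = 1"))   # p > 0.05 suggests unbiased forecasts
```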

2.7.4. Predictive Accuracy Assessment

The Murphy diagram (MD) was proposed by [47]; it provides a graphical comparison of the predictive strength of models. Thus, the MD displays forecast skill over a range of scoring functions, thereby enabling a comprehensive comparison and assessment of probabilistic model accuracy. Considering the loss function $S(\hat{y}_t, y_t) = (\hat{y}_t - y_t)^2$, the work of [47] showed that any consistent scoring function can be written in the following form:
$$S(\hat{y}_t, y_t) = \int s_{\theta}(\hat{y}_t, y_t)\, dH(\theta),$$
where $H$ denotes a non-negative measure and
$$s_{\theta}(\hat{y}_t, y_t) = \begin{cases} |y_t - \theta|, & \text{if } \min(\hat{y}_t, y_t) \leq \theta < \max(\hat{y}_t, y_t), \\ 0, & \text{otherwise}, \end{cases}$$
where $\theta \in \mathbb{R}$. For $m$ point forecasts, the average elementary score, denoted $\bar{s}_{\theta}(\hat{y}_t, y_t)$, can be calculated using the following mathematical expression:
$$\bar{s}_{\theta}(\hat{y}, y) = \frac{1}{m}\sum_{i=1}^{m} s_{\theta}(\hat{y}_{t_i}, y_{t_i}).$$
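A minimal R sketch of the elementary scores that trace out one Murphy diagram curve (placeholder data) is shown below:
```r
# Minimal sketch: mean elementary scores s_theta over a grid of theta values,
# tracing the Murphy diagram curve for one forecast series.
obs  <- c(6.1, 7.4, 8.0, 5.9, 9.2, 7.7)
pred <- c(6.4, 7.1, 8.3, 6.2, 8.8, 7.9)

elementary_score <- function(theta, f, y) {
  mean(ifelse(pmin(f, y) <= theta & theta < pmax(f, y), abs(y - theta), 0))
}

theta_grid   <- seq(min(obs, pred), max(obs, pred), length.out = 100)
murphy_curve <- sapply(theta_grid, elementary_score, f = pred, y = obs)
plot(theta_grid, murphy_curve, type = "l",
     xlab = expression(theta), ylab = "Mean elementary score")
```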

3. Results

3.1. Computational Tools

We developed and executed comprehensive tests across all models within a Dell Intel Core i7 notebook development environment, utilising the R programming language. To accommodate the DE algorithm, we employed the "DEoptim" library, while the "waveslim" and "wavelets" libraries were utilised for the selection of appropriate wavelet filters, including "d4", "mb8", and "la8" (see [48]). The "modwt" function from the "waveslim" library was instrumental in decomposing the wind speed datasets into sub-signals with varying frequency components. The GRU model was established and subsequently fitted using the "keras" library, which interfaces with Python (version 3.12) (see [49]). A systematic approach involving metaheuristic search, early stopping, and dropout regularisation was employed to identify the best hyperparameters for the models. The resulting best range of parameters is outlined in Table 4.

3.2. Exploratory Data Analysis

Table 5 summarises the descriptive statistics for the wind speed data from the three WASA stations under investigation. The skewness values for all three stations are positive, indicating that the wind data from each station have a positively skewed distribution (see Figure 3). In addition, the Jarque–Bera (JB) test (all p-values < 0.05) indicates that the three datasets are non-normal. This is also evident from the skewed density plots; the datasets under investigation have non-constant variance over time, illustrating some elements of cyclostationary patterns. In terms of standard deviation, the Alexander Bay data appear more variable or volatile than those of the other stations. Furthermore, the data from the Alexander Bay and Jozini stations are platykurtic, since their kurtosis is less than 3. In the case of the Humansdorp wind data, the kurtosis value is higher than 3, and the distribution is leptokurtic.

3.3. Deterministic and Probabilistic Performance Evaluation

A summary of the results of the point prediction accuracy indicators for the models explored at the three WASA stations, namely, Alexander Bay, Humansdorp, and Jozini, is presented in Table 6. The rationale is to assess the impact of the wavelet filter, decomposition level, station location, and forecasting scale/horizon on the proposed methods at both the point and probabilistic forecasting levels. To effectively justify the predictive strength of the optimised and proposed decomposition level ($L = 3$), additional levels ($L = 4$ and $L = 5$) were set to do the same task using the wavelet-MODWT-GRU hybrid model. Additionally, three wavelet filters were used for effective comparison, namely, LA with eight vanishing moments (i.e., "la8") (denoted by M1), MB with eight vanishing moments (i.e., "mb8") (denoted by M3), and DB with four vanishing moments (i.e., "d4") (denoted by M2). Moreover, we also tested the efficacy of the best wavelet-MODWT-GRU among the wavelet filters against the individual GRU and the benchmark naïve model. The best models are shown in bold.
For the high-variant Alexander Bay dataset, model M3 optimised at L = 3 produced the best results based on the least RMSE, MAE, MAPE, and highest R 2 values. Considering the MZ test, M1 was unbiased at L = 3 and L = 4 ; whilst M3 was unbiased at L = 5 . Model M1 ( L = 3 ), produced the second-best result based on the same indicators. The least desirable results were observed for M2 ( L = 5 ). Overall, predictive accuracy is greatest at L = 3 and declines as the decomposition level increases.
For the Humansdorp dataset, model M3 outcompeted all models at decomposition levels L = 3 and L = 5 based on the smallest RMSE, MAE, and the highest R 2 . As was observed in the Alexander Bay dataset, the optimised decomposition level ( L = 3 ) produced the best results in comparison with L = 4 and L = 5 ; and models’ strength seems to decrease with increasing decomposition levels. In terms of the MZ test, all models are unbiased across all decomposition levels, with the exception of M2 which is biased for L = 3 . In general, the performance of M3 at a decomposition level L = 3 was the best for the Humansdorp dataset.
For the least-variant Jozini dataset, model M3 ( L = 3 ) displayed the best performance as compared to other models and decomposition levels ( i . e . ,   L = 4 and   L = 5 ). In fact, model M3 dominated all other models at decomposition levels L = 3 and L = 4 based on highest value of R 2 and the smallest values of MAE and MAPE. The MZ test showed that predictions from all models were unbiased (see Figure 4).
The comparative analysis showed that the performance of the models varied with the station location, the wavelet filter applied, the decomposition level, and the forecasting horizon. Overall, the models seem to be more accurate and sharper over shorter forecast horizons than over longer ones. For instance, the highest RMSE value (1.3119) was recorded by M2 ($L = 5$) at the Alexander Bay station with forecast horizon $h = 892$, whilst the smallest RMSE (0.4678) was achieved by model M3 ($L = 3$) at the Humansdorp station with $h = 720$. Similar to the work of [50], wavelet decomposition at higher levels produces more statistically sound sub-signals, but also produces larger error accumulations.
Table 7 compares the predictive performance of the best wavelet filter model (at $L = 3$) with that of M4 (GRU) and the benchmark M5 (naive). Using skill scores, we computed the predictive accuracy improvement relative to the benchmark M5 (see [42] for details). In essence, the skill score expresses the percentage by which the other models (i.e., M3 and M4) improve upon the benchmark M5. Our optimised proposed strategy dominated both the M4 and M5 models based on the smallest values of RMSE and MAE across the three datasets (also see Figure 4). In addition, the proposed hybrids produced the best results for the Humansdorp and Jozini datasets based on their lowest MAPE, whilst M4 produced the best results for the other station based on the same indicator. On the basis of the MZ test, the predictions from the proposed model (M3) were unbiased for the Humansdorp and Jozini stations, whilst the predictions from model M5 were biased across all three datasets. In general, the point error metrics indicated that model M3 provided the best predictions of wind speed data across the three stations.
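The exact skill score formulation follows [42]; a common RMSE-based form, assumed here for reference, is
$$\mathrm{Skill}\,(\%) = \left(1 - \frac{\mathrm{RMSE}_{\mathrm{model}}}{\mathrm{RMSE}_{\mathrm{benchmark}}}\right) \times 100.$$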
Table 8 presents the comparative evaluation of the models using distributional forecast accuracy measures (i.e., PL, DS, and PIT). We used the PL and DS scores to measure model accuracy and the uncertainty of the predictions, whilst the PIT scores were used to assess model quality based on calibration. The smaller the values of PL, DS, and PIT, the better the probabilistic forecasts. Additionally, an unbiased model should have a uniform PIT histogram.
Model M4 produced the least values of PL for the Jozini and Alexander Bay datasets, whilst M5 produced the smallest PL value for the Humansdorp dataset. Based on the DS score, M3 outcompeted M4 and M5 with the lowest scores for the Alexander Bay, Humansdorp, and Jozini datasets. The DS Score indicator further showed that M4 produced the second-best results and demonstrated superiority over our benchmark for the Alexander Bay and Jozini datasets. Overall, model M3 had the most accurate and reliable predictions across the three datasets.
The PIT test statistics show that the proposed hybrid approach (i.e., M3) outcompeted both M4 and M5 for Alexander Bay and Jozini datasets by producing the smallest deviance between the observed and expected values. Furthermore, the Kolmogorov-Smirnov (KS) test showed that the p-values for M3 are greater than 0.05 for the Alexander Bay and Humansdorp datasets. Thus, we do not reject the null hypothesis that the PIT values are uniformly distributed. Hence, the predictions from these models are well-calibrated and unbiased for the respective datasets. Although the p-value < 0.05 at the Jozini data station, M3 outcompeted both M4 and M5 in producing the least biased and deviant predictions that are better-calibrated (see Figure 5).
Overall, M3 produced the most accurate, reliable, and well-calibrated predictions for the Alexander Bay, Humansdorp, and Jozini datasets. Across all datasets, preprocessed models through WT at the optimised decomposition level ( L = 3 ) were superior in terms of prediction accuracy, reliability, and calibration over those that have not been preprocessed.

3.4. Predictive Performance Analysis

The MDs are provided in Figure 6 for comparing the predictive accuracy of our proposed strategy against model M4 (or GRU). For Alexander Bay data (top panel), model M3 showed clear dominance (blue curve mostly below the red curve) over model M4, as illustrated in the top panel of Figure 6. A slight curve overlap at the extreme values was also observed. In addition, empirical score differences are mostly negative values, with a slightly wider and broader range of 95% prediction interval scores.
The empirical score curve in Figure 6 (middle panel) shows that M3 dominated M4 for the Humansdorp data. For this data station, M3 provided better prediction accuracy than M4. Furthermore, the difference in scores with a majority of negative ratings had a wider (but slightly broader as compared to Alexander Bay data) 95% prediction interval.
The MD results for the Jozini data (bottom panel of Figure 6) show that M3 had superior predictive power to M4. Furthermore, the empirical score differences were mostly negative, with a slightly wider range of 95% prediction intervals when compared to the other two datasets. Similar to Alexander Bay, curve overlaps at the extreme values were also observed.
Overall, the MD confirms the results deduced from the other performance indicators, and it indicates that model M3 provides the best forecasts for all three datasets. Furthermore, the 95% interval range width seems much narrower for high-variant data and a longer forecasting horizon.
Table 9 compares the forecasting performance of the proposed approach (M3) with that of M4 and M5 at different lead time intervals based on the high-variant Alexander Bay dataset. Ideally, we would like to assess the accuracy, generalisability, and robustness of the proposed approach as the lead time increases from 10 min to 60 min (or 1 h). As evidenced by the increases in the values of the error metrics (except MAPE for model M3) and the reductions in the skill score values, it can be deduced from Table 9 that the models' forecasting performance declines as the lead time increases. Nonetheless, the proposed approach (M3) still demonstrates clear dominance and superiority in predictive forecasting over all other models, even at a 1 h lead time.

4. Conclusions

While wind power has numerous advantages, one of the most challenging aspects of integrating large volumes of wind power into an existing electrical power grid is that the wind resource (i.e., wind speed) has inherent climatic behaviour, characterised by highly intermittent and continuous variations in its intensity. This type of behaviour often encompasses aspects of both linearity and nonlinearity. Attempting to characterise it using a single model (let alone a linear model) is impractical and potentially erratic, ultimately impacting wind power investment decisions.
The study proposes a hybrid strategy that integrates MODWT wavelet filters, DE algorithms, and GRUs for wind speed forecasting, referred to as wavelet-MODWT-GRUs. This approach was tested using high-resolution wind data averaged over 10 min from WASA data stations located in Alexander Bay, Humansdorp, and Jozini, South Africa. We began with a comparative analysis of the wavelet-MODWT-GRUs employing LA8, DB4, and MB8 filters across various decomposition levels (i.e., L = 3 ,   L = 4 , and L = 5 ). The efficacy of the best wavelet-MODWT-GRU at each of the three stations was subsequently validated and compared with the individual GRU, while also being benchmarked against the naïve model.
The point forecast error indicators (i.e., RMSE, MAE, and MAPE) and $R^2$ showed that the proposed models' predictive performance at the decomposition level $L = 3$, optimised through DE, was best across all three datasets and wavelet filters. The filter-wise comparative analysis further revealed that the MB-MODWT-GRU at $L = 3$ produced the most accurate and unbiased wind speed predictions for the Alexander Bay, Humansdorp, and Jozini datasets.
In probabilistic forecasting, the DS and PIT scores also showed that the MB-MODWT-GRU dominated the GRU and the naïve model in terms of the most reliable and best-calibrated probabilistic forecasts across the three datasets. MD analysis shows that the MB-MODWT-GRU models dominated the GRU and naive models across all three datasets. From the comparative analysis, we can conclude the following:
  • DE optimises the decomposition level (selecting L = 3) efficiently and simply, resulting in MODWT-decomposed sub-signals that are easier to characterise for GRU modelling (see the sketch after this list).
  • The location of the station, wavelet filter applied, decomposition level, and forecasting horizon all affected model performance.
  • The MB filter (followed by the LA filter) produced better wind speed forecasts across short-to-long forecast horizons, less-variant to high-variant data, and small to large datasets.
  • The prediction error tends to increase as the decomposition level increases. Similar results were obtained in [20,50].
  • The application of MODWT reduces noise and volatility in wind speed data, thereby improving the prediction performance of all hybrids. Consequently, complex wind speed data sub-signals were accurately and reliably captured via the GRU model. These results concur with the work of [4,19].
  • Although the naïve model is easy to understand, it is ineffective when dealing with wind speed data. Similar results were obtained in the work of [10].
  • Generally, the increase in lead times led to an increase in error metrics and a decrease in the predictive ability of all models. Nonetheless, the proposed approach still demonstrated superiority over all other models.
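As referenced in the first bullet above, a minimal sketch of the DE step is given below. It assumes the DEoptim R package [38] and a hypothetical helper fit_hybrid() that trains a MODWT-GRU at a candidate level and returns its validation RMSE; the objective and control settings shown are assumptions chosen to be consistent with the search space in Table 4, not the exact study configuration.

```r
# Illustrative sketch of DE selecting the decomposition level (not the exact study code).
library(DEoptim)

# Objective: validation RMSE of a MODWT-GRU fitted at a candidate level.
# fit_hybrid() is a hypothetical wrapper around the decomposition/GRU steps sketched earlier.
objective <- function(par) {
  L <- round(par[1])  # DE searches a continuous space, so round to an integer level
  fit_hybrid(wind_train, wind_valid, wf = "mb8", level = L)$rmse
}

set.seed(2025)
res <- DEoptim(
  fn      = objective,
  lower   = 1, upper = 10,  # level bounds, as in Table 4
  control = DEoptim.control(NP = 50, itermax = 50, CR = 0.8, F = 0.5, trace = FALSE)
)
best_L <- round(res$optim$bestmem)  # L = 3 for the datasets in this study
```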
The proposed strategy will enable utility managers to integrate wind energy into power grids while developing a deeper understanding of MODWT-based hybrids and wind speed measurements. However, only three wavelet filters were examined in the current study, which used relatively small South African wind speed datasets. Future research could utilise larger and more complex datasets from varying regions within and outside the country, together with other wavelet filters (e.g., Meyer, Morlet). Moreover, the proposed strategy can be improved further by pairing the GRU with other effective and data-efficient optimisation strategies, such as Bayesian optimisation (combined with cross-validation). Future work will also look at spatially aggregating wind speed forecasts from multiple wind farms to diversify away variability.

Author Contributions

Conceptualisation, K.S.S.; methodology, K.S.S.; software, K.S.S.; validation, K.S.S. and E.R.; formal analysis, K.S.S.; investigation, K.S.S.; resources, K.S.S. and E.R.; data curation, K.S.S.; writing-original draft, K.S.S.; writing-review and editing, K.S.S. and E.R.; visualisation, K.S.S.; supervision, E.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

These wind speed data can be downloaded from the WASA website (https://wasadata.csir.co.za/wasa1/WASAData) (accessed 20 September 2022/5 June 2023).

Conflicts of Interest

The corresponding author states that there are no conflicts of interest.

References

  1. Zhen, H.; Niu, D.; Yu, M.; Wang, K.; Liang, Y.; Xu, X. A Hybrid Deep Learning Model and Comparison for Wind Power Forecasting Considering Temporal-Spatial Feature Extraction. Sustainability 2020, 12, 9490. [Google Scholar] [CrossRef]
  2. Wang, X.; Guo, P.; Huang, X. A Review of Wind Power Forecasting Models. Energy Procedia 2011, 12, 770–777. [Google Scholar] [CrossRef]
  3. Chandra, D.; Sailaja Kumari, M.; Sydulu, M.; Grimaccia, F.; Mussetta, M. Adaptive Wavelet Neural Network-Based Wind Speed Forecasting Studies. J. Electr. Eng. Technol. 2014, 9, 1812–1821. [Google Scholar] [CrossRef]
  4. Sivhugwana, K.S.; Ranganai, E. An Ensemble Approach to Short-Term Wind Speed Predictions Using Stochastic Methods, Wavelets and Gradient Boosting Decision Trees. Wind 2024, 4, 44–67. [Google Scholar] [CrossRef]
  5. Soman, S.S.; Zareipour, H. A Review of Wind Power and Wind Speed Forecasting Methods with Different Time Horizons. In Proceedings of the North American Power Symposium 2010, Arlington, TX, USA, 26–28 September 2010. [Google Scholar] [CrossRef]
  6. Gardner, W.A.; Napolitano, A.; Paura, L. Cyclostationarity: Half a Century of Research. Signal Process. 2006, 86, 639–697. [Google Scholar] [CrossRef]
  7. Percival, D.B.; Walden, A.T. Wavelet Methods for Time Series Analysis; Cambridge University Press: Cambridge, UK, 2000. [Google Scholar]
  8. Zhang, Z.; Telesford, Q.K.; Giusti, C.; Lim, K.O.; Bassett, D.S. Choosing Wavelet Methods, Filters, and Lengths for Functional Brain Network Construction. PLoS ONE 2016, 11, e0157243. [Google Scholar] [CrossRef]
  9. Gensler, A. Wind Power Ensemble Forecasting: Performance Measures and Ensemble Architectures for Deterministic and Probabilistic Forecasts. Ph.D. Thesis, University of Kassel, Hessen, Germany, 21 September 2018. [Google Scholar]
  10. Singh, S.N.; Mohapatra, A. Repeated Wavelet Transform-Based ARIMA Model for Very Short-Term Wind Speed Forecasting. Renew. Energy 2019, 136, 128. [Google Scholar] [CrossRef]
  11. Valdivia-Bautista, S.M.; Domínguez-Navarro, J.A.; Pérez-Cisneros, M.; Vega-Gómez, C.J.; Castillo-Téllez, B. Artificial Intelligence in Wind Speed Forecasting: A Review. Energies 2023, 16, 2457. [Google Scholar] [CrossRef]
  12. Catalão, J.P.S.; Pousinho, H.M.I.; Mendes, V.M.F. Short-Term Wind Power Forecasting in Portugal by Neural Networks and Wavelet Transform. Renew. Energy 2011, 36, 1245–1251. [Google Scholar] [CrossRef]
  13. Berrezzek, F.; Khelil, K.; Bouadjila, T. Efficient Wind Speed Forecasting Using Discrete Wavelet Transform and Artificial Neural Networks. Rev. d’Intelligence Artificielle 2019, 33, 447–452. [Google Scholar] [CrossRef]
  14. Niu, D.; Pu, D.; Dai, S. Ultra-Short-Term Wind-Power Forecasting Based on the Weighted Random Forest Optimized by the Niche Immune Lion Algorithm. Energies 2018, 11, 1098. [Google Scholar] [CrossRef]
  15. Patel, Y.; Deb, D. Machine Intelligent Hybrid Methods Based on Kalman Filter and Wavelet Transform for Short-Term Wind Speed Prediction. Wind 2022, 2, 37–50. [Google Scholar] [CrossRef]
  16. Catalão, J.P.S.; Pousinho, H.M.I.; Mendes, V.M.F. Hybrid Wavelet-PSO-ANFIS Approach for Short-Term Wind Power Forecasting in Portugal. IEEE Trans. Sustain. Energy 2011, 2, 50–59. [Google Scholar] [CrossRef]
  17. Xiang, J.; Qiu, Z.; Hao, Q.; Cao, H. Multi-Time Scale Wind Speed Prediction Based on WT-Bi-LSTM. MATEC Web Conf. 2020, 309, 05011. [Google Scholar] [CrossRef]
  18. Liu, Y.; Guan, L.; Hou, C.; Han, H.; Liu, Z.; Sun, Y.; Zheng, M. Wind Power Short-Term Prediction Based on LSTM and Discrete Wavelet Transform. Appl. Sci. 2019, 9, 1108. [Google Scholar] [CrossRef]
  19. Sivhugwana, K.S.; Ranganai, E. Short-Term Wind Speed Prediction via Sample Entropy: A Hybridisation Approach against Gradient Disappearance and Explosion. Computation 2024, 12, 163. [Google Scholar] [CrossRef]
  20. Domínguez-Navarro, J.A.; Lopez-Garcia, T.B.; Valdivia-Bautista, S.M. Applying Wavelet Filters in Wind Forecasting Methods. Energies 2021, 14, 3181. [Google Scholar] [CrossRef]
  21. Khelil, K.; Berrezzek, F.; Bouadjila, T. GA-based design of optimal discrete wavelet filters for efficient wind speed forecasting. Neural Comput. Appl. 2020, 33, 4373–4386. [Google Scholar] [CrossRef]
  22. Morris, J.M.; Peravali, R. Minimum-bandwidth discrete-time wavelets. Signal Process. 1999, 76, 181–193. [Google Scholar] [CrossRef]
  23. Storn, R.; Price, K. Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 1997, 11, 341–359. [Google Scholar] [CrossRef]
  24. Wang, Y.; Cai, Z.; Zhang, Q. Differential Evolution With Composite Trial Vector Generation Strategies and Control Parameters. IEEE Trans. Evol. Comput. 2011, 15, 55–66. [Google Scholar] [CrossRef]
  25. Eltaeib, T.; Mahmood, A. Differential Evolution: A Survey and Analysis. Appl. Sci. 2018, 8, 1945. [Google Scholar] [CrossRef]
  26. Leon, M.; Xiong, N. Investigation of Mutation Strategies in Differential Evolution for Solving Global Optimization Problems. In Artificial Intelligence and Soft Computing, ICAISC 2014; Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M., Eds.; Lecture Notes in Computer Science, 8467; Springer: Cham, Switzerland, 2014. [Google Scholar] [CrossRef]
  27. Merry, R.J.E. Wavelet Theory and Applications: A Literature Study; DCT Rapporten; Technische Universiteit Eindhoven: Eindhoven, The Netherlands, 2005. [Google Scholar]
  28. Daubechies, I. Ten Lectures on Wavelets; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1992. [Google Scholar]
  29. Gröchenig, K. Foundations of Time-Frequency Analysis; Springer Science & Business Media: New York, NY, USA, 2001. [Google Scholar]
  30. Kovačević, J.; Goyal, V.K. Fourier and Wavelet Signal Processing; 2010. Available online: https://www.fourierandwavelets.org/FWSP_a3.2_2013.pdf (accessed on 4 October 2024).
  31. Mallat, S.G. A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 674–693. [Google Scholar] [CrossRef]
  32. Misiti, M.; Misiti, Y.; Oppenheim, G.; Poggi, J.M. Wavelets Toolbox User’s Guide; The MathWorks: Natick, MA, USA, 2000. [Google Scholar]
  33. Dghais, A.A.; Ismail, M.T. A Comparative Study between Discrete Wavelet Transform and Maximal Overlap Discrete Wavelet Transform for Testing Stationarity. Int. J. Math. Comput. Phys. Electr. Comput. Eng. 2013, 7, 1677–1681. [Google Scholar]
  34. Rodrigues, D.V.Q.; Zuo, D.; Li, C. A MODWT-Based Algorithm for the Identification and Removal of Jumps/Short-Term Distortions in Displacement Measurements Used for Structural Health Monitoring. IoT 2022, 3, 60–72. [Google Scholar] [CrossRef]
  35. Alarcon-Aquino, V.; Barria, J.A. Change Detection in Time Series Using the Maximal Overlap Discrete Wavelet Transform. Lat. Am. Appl. Res. 2009, 39, 145–152. [Google Scholar]
  36. Zaharie, D. A Comparative Analysis of Crossover Variants in Differential Evolution. In Proceedings of the International Multiconference on Computer Science and Information Technology, Gosier, Guadaloupe, 4–9 March 2007; pp. 171–181. Available online: https://staff.fmi.uvt.ro/~daniela.zaharie/lucrari/imcsit07.pdf (accessed on 2 October 2024).
  37. Eiben, Á.E.; Hinterding, R.; Michalewicz, Z. Parameter Control in Evolutionary Algorithms. IEEE Trans. Evol. Comput. 1999, 3, 124–141. [Google Scholar] [CrossRef]
  38. Mullen, K.M.; Ardia, D.; Gil, D.L.; Windover, D.; Cline, J. DEoptim: An R package for global optimization by differential evolution. J. Stat. Softw. 2011, 40, 1–26. [Google Scholar] [CrossRef]
  39. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  40. Cho, K.; van Merrienboer, B.; Bahdanau, D.; Bengio, Y. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. arXiv 2014, arXiv:1409.1259. [Google Scholar] [CrossRef]
  41. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar] [CrossRef]
  42. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice, 2nd ed.; OTexts: Melbourne, Australia, 2021. [Google Scholar]
  43. Jordan, A.; Krüger, F.; Lerch, S. Evaluating probabilistic forecasts with scoringRules. J. Stat. Softw. 2019, 90, 1–37. Available online: https://www.jstatsoft.org/article/view/v090i12 (accessed on 17 December 2024). [CrossRef]
  44. Gneiting, T.; Raftery, A.E. Strictly Proper Scoring Rules, Prediction, and Estimation. J. Am. Stat. Assoc. 2007, 102, 359–378. [Google Scholar] [CrossRef]
  45. Bosse, N.I.; Gruson, H.; Cori, A.; van Leeuwen, E.; Funk, S.; Abbott, S. Evaluating Forecasts with Scoringutils in R. arXiv 2022, arXiv:2205.07090. [Google Scholar] [CrossRef]
  46. Mincer, J.; Zarnowitz, V. The Evaluation of Economic Forecasts. In Economic Forecasts and Expectations; National Bureau of Economic Research: Cambridge, MA, USA, 1969; pp. 3–46. Available online: https://www.nber.org/system/files/chapters/c1214/c1214.pdf (accessed on 17 June 2024).
  47. Ehm, W.; Gneiting, T.; Jordan, A.; Krüger, F. Of Quantiles and Expectiles: Consistent Scoring Functions, Choquet Representations, and Forecast Rankings. J. R. Stat. Soc. B 2016, 78, 505–562. [Google Scholar] [CrossRef]
  48. Whitcher, B. waveslim: Basic Wavelet Routines for One-, Two-, and Three-Dimensional Signal Processing; R package version 1.8.5; 2024. Available online: https://cran.r-project.org/package=waveslim (accessed on 3 October 2024).
  49. Allaire, J.J.; Chollet, F. Keras: R Interface to ‘Keras’. CRAN: Contributed Packages. R Package. 2024. Available online: https://cran.r-project.org/web/packages/keras/vignettes/ (accessed on 21 June 2024).
  50. Kaur, D.; Lie, T.T.; Nair, N.K.; Vallès, B. Wind Speed Forecasting Using Hybrid Wavelet Transform-ARMA Techniques. Aims Energy 2015, 3, 13–24. [Google Scholar] [CrossRef]
Figure 1. Flow chart of the proposed wavelet-MODWT-GRU.
Figure 2. WASA high-resolution wind resource map (https://wasadata.csir.co.za/wasa1/WASAData) (accessed on 20 September 2022/5 June 2023).
Figure 3. Wind speed data for Alexander Bay (a), Humansdorp (b), and Jozini (c). Lines in blue represent QQ lines and boxes in grey indicate interquartile ranges.
Figure 4. Comparison of wind speed predictions against actual wind speed data for Alexander Bay (left panel), Humansdorp (right panel), and Jozini (bottom centre panel).
Figure 5. PIT Histograms for the Alexander Bay (top panel), Humansdorp (middle panel), and Jozini (bottom panel) comparing models M3, M4, and M5.
Figure 6. Murphy diagrams with 95% confidence intervals: M3 and M4 (upper panel, Alexander Bay), (centre panel, Humansdorp), and (bottom panel, Jozini). Shaded regions indicate 95% confidence intervals for the difference between the two functions.
Table 1. Properties of wavelet filters applied in the current study.
  • DB4: can (to some extent) capture sharp transition features in the dataset. LA8: yields better results when handling high-variant data. MB8: highly capable of handling high-frequency and transient features in wind data. [8,22,27,28,29,30,31]
  • DB4: compactly supported; offers good localisation features, but spectral leakage can be a problem when dealing with highly non-stationary data. LA8: compactly supported; good localisation and reconstructs signals much better; excellent for handling non-stationary data. MB8: offers excellent localisation properties alongside minimal spectral leakage. [8,22,27,28,29,30,31]
  • DB4: asymmetric with a non-linear phase response; signal distortion is possible due to inadequate boundary handling (compared to LA8). LA8: least asymmetric (compared to DB4) with a linear phase response; signal distortion is possible, but minimal; adequately handles boundaries. MB8: despite a lack of symmetry, less prone to signal distortion due to its narrow bandwidth. [8,22,27,28,29,30,31,32,33,34,35]
  • DB4: orthogonal with (to some extent) good energy-preservation properties; efficient perfect reconstruction. LA8: offers an optimal trade-off between orthogonality and symmetry; also possesses efficient perfect reconstruction. MB8: orthogonal and provides a great time-frequency trade-off; efficient computation and perfect reconstruction. [8,22,27,28,29,30,31,32,33,34,35]
Table 2. Training and testing dataset.
Station         Month                 N     Granularity  Training  Testing
Alexander Bay   1–31 August 2022      4462  10 min       3570      892
Humansdorp      1–25 April 2021       3600  10 min       2800      720
Jozini          1–17 December 2020    2400  10 min       1920      480
Table 3. Location description for the stations.
Station         Mast ID  Longitude (°E)  Latitude (°S)  Elevation (m)  Anemometer Height (m)
Alexander Bay   WM01     16.664410       28.601882      152            61.85
Humansdorp      WM08     24.514360       34.109965      110            61.84
Jozini          WM13     32.16636        27.42605       80             61.75
Table 4. Model hyperparameters.
Model   Main Hyperparameter     Search Space
DE      Number of iterations    50–60
        Population size         45–60
        Crossover probability   0.75–0.85
        Weights                 0.5–0.6
        Bounds                  1–10
MODWT   Filters                 "la8", "d4", "mb8"
GRU     Dropout rates           0–0.5
        Time steps              1–10
        Epochs                  1–100
        Learning rate           0–0.1
        Activation function     tanh
        Loss function           MSE
        Optimiser               Adam
Table 5. Descriptive statistics for wind speed data (in m/s) at the three stations of interest.
Station         Min     Q1      Median  Mean    Q3      Max      Std. dev  Skewness  Kurtosis  JB (p-Value)
Alexander Bay   0.210   2.790   5.030   5.727   8.140   18.360   3.5739    0.6861    2.7581    <2.2 × 10−16
Humansdorp      0.2349  3.3596  5.2340  5.5351  7.2610  17.3089  2.8532    0.6385    3.1533    <2.2 × 10−16
Jozini          0.4045  3.2022  5.0106  5.3241  7.2137  14.1427  2.6980    0.4722    2.5046    <2.2 × 10−16
Table 6. Predictive performance indicators for the three wavelet filter models.

Alexander Bay (August, WM01), N = 4462, h = 892
Level  Model     RMSE    MAE     MAPE (%)  R²      MZ Bias Test
L = 3  M1 (LA8)  0.5256  0.3968  14.1156   0.9810  Unbiased
       M2 (DB4)  0.5562  0.4050  14.3321   0.9792  Biased
       M3 (MB8)  0.5102  0.3787  12.2258   0.9824  Biased
L = 4  M1 (LA8)  0.7557  0.5847  37.2397   0.9607  Unbiased
       M2 (DB4)  0.8676  0.6520  15.4997   0.9505  Biased
       M3 (MB8)  0.7882  0.6007  26.4022   0.9585  Biased
L = 5  M1 (LA8)  1.2909  0.9729  15.8073   0.8872  Biased
       M2 (DB4)  1.3119  0.9923  16.8702   0.8839  Biased
       M3 (MB8)  1.2150  0.9081  14.7143   0.8983  Unbiased

Humansdorp (April, WM08), N = 3600, h = 720
Level  Model     RMSE    MAE     MAPE (%)  R²      MZ Bias Test
L = 3  M1 (LA8)  0.4767  0.3580   8.8326   0.9718  Unbiased
       M2 (DB4)  0.5482  0.4059  10.4977   0.9631  Biased
       M3 (MB8)  0.4678  0.3544   9.1258   0.9729  Unbiased
L = 4  M1 (LA8)  0.8640  0.6027  14.3734   0.9073  Unbiased
       M2 (DB4)  0.7322  0.5198  12.2237   0.9335  Unbiased
       M3 (MB8)  0.7634  0.5512  12.5808   0.9276  Unbiased
L = 5  M1 (LA8)  1.0015  0.7537  16.5959   0.8754  Unbiased
       M2 (DB4)  0.9003  0.7154  16.6009   0.8994  Unbiased
       M3 (MB8)  0.8619  0.6750  15.2706   0.9077  Unbiased

Jozini (December, WM13), N = 2400, h = 480
Level  Model     RMSE    MAE     MAPE (%)  R²      MZ Bias Test
L = 3  M1 (LA8)  0.7092  0.4924  10.6034   0.9116  Unbiased
       M2 (DB4)  0.7167  0.5002  10.9543   0.9098  Unbiased
       M3 (MB8)  0.6703  0.4685  10.0479   0.9210  Unbiased
L = 4  M1 (LA8)  0.7789  0.5652  12.7938   0.8935  Unbiased
       M2 (DB4)  0.7566  0.5512  12.3017   0.8995  Unbiased
       M3 (MB8)  0.7693  0.5350  11.5570   0.8962  Unbiased
L = 5  M1 (LA8)  0.8927  0.7088  15.5410   0.8599  Unbiased
       M2 (DB4)  0.8096  0.6266  13.2615   0.8848  Unbiased
       M3 (MB8)  1.0169  0.8060  18.7082   0.8194  Unbiased
Table 7. Point performance indicators for the best wavelet filter model (at L = 3) against the GRU and naïve model.

Alexander Bay (August)
Model       RMSE    MAE     MAPE (%)  Skill (RMSE)  Skill (MAE)  Skill (MAPE)  MZ Test
M3 (MB8)    0.5102  0.3787  12.2258   0.8705        0.8819       0.7045        Biased
M4 (GRU)    0.7147  0.5099   8.2808   0.8186        0.8410       0.7999        Biased
M5 (Naïve)  3.9391  3.2069  41.3795   –             –            –             Biased

Humansdorp (April)
M3 (MB8)    0.4768  0.3544   9.1258   0.8321        0.8493       0.7793        Unbiased
M4 (GRU)    0.7557  0.5376  12.7266   0.7340        0.7714       0.6923        Unbiased
M5 (Naïve)  2.8406  2.3516  41.3546   –             –            –             Biased

Jozini (December)
M3 (MB8)    0.6703  0.4685  10.0479   0.7482        0.7885       0.8093        Unbiased
M4 (GRU)    0.8497  0.6377  13.5132   0.6808        0.7121       0.7435        Biased
M5 (Naïve)  2.6616  2.2151  52.6837   –             –            –             Biased
Table 8. Distributional forecast accuracy indicators for the best wavelet filter model (at L = 3) against the naïve (M5) and GRU (M4) model.

Alexander Bay (August)
Model       PL Score (τ = 0.95)  DS Score (Mean)  PIT: KS Test (D)  PIT: KS (p-Value)
M3 (MB8)    0.7029               3.6753           0.0309            0.3633
M4 (GRU)    0.6635               3.6798           0.0503            0.0223
M5 (Naïve)  0.7187               3.7441           0.1199            1.615 × 10−11

Humansdorp (April)
M3 (MB8)    0.6361               3.0853           0.0481            0.0729
M4 (GRU)    0.6020               3.0905           0.0428            0.1451
M5 (Naïve)  0.5204               3.0881           0.0677            0.0028

Jozini (December)
M3 (MB8)    0.4286               2.7419           0.0689            0.0217
M4 (GRU)    0.4248               2.7618           0.0902            0.0009
M5 (Naïve)  0.5156               2.9833           0.2192            <2.2 × 10−16
Table 9. Effect of the lead times on model performance using the Alexander Bay dataset (at L = 3).

Method  Lead Time (Minutes)  RMSE    MAE     MAPE (%)  Skill (RMSE)  Skill (MAE)  Skill (MAPE)
M3      10                   0.5102  0.3787  12.2258   0.8705        0.8819       0.7045
        60                   0.5898  0.4330   7.9098   0.8507        0.8657       0.8099
M4      10                   0.7147  0.5099   8.2808   0.8186        0.8411       0.7999
        60                   2.3845  1.8621  39.4635   0.3964        0.4225       0.0514
M5      10                   3.9391  3.2069  41.3795   –             –            –
        60                   3.9503  3.2243  41.6032   –             –            –
Note: time interval = 10 min.