Article

Evaluating Wind Speed Forecasting Models: A Comparative Study of CNN, DAN2, Random Forest and XGBOOST in Diverse South African Weather Conditions

by Fhulufhelo Walter Mugware, Caston Sigauke *,† and Thakhani Ravele
Department of Mathematical and Computational Sciences, University of Venda, Private Bag X5050, Thohoyandou 0950, Limpopo, South Africa
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Forecasting 2024, 6(3), 672-699; https://doi.org/10.3390/forecast6030035
Submission received: 22 July 2024 / Revised: 14 August 2024 / Accepted: 16 August 2024 / Published: 19 August 2024

Abstract

The main source of electricity worldwide stems from fossil fuels, contributing to air pollution, global warming, and associated adverse effects. This study explores wind energy as a potential alternative. Nevertheless, the variable nature of wind introduces uncertainty in its reliability. Thus, it is necessary to identify an appropriate machine learning model capable of reliably forecasting wind speed under various environmental conditions. This research compares the effectiveness of Dynamic Architecture for Artificial Neural Networks (DAN2), convolutional neural networks (CNN), random forest and XGBOOST in predicting wind speed across three locations in South Africa, characterised by different weather patterns. The forecasts from the four models were then combined using quantile regression averaging models, generalised additive quantile regression (GAQR) and quantile regression neural networks (QRNN). Empirical results show that CNN outperforms DAN2 in accurately forecasting wind speed under different weather conditions. This superiority is likely due to the inherent architectural attributes of CNNs, including feature extraction capabilities, spatial hierarchy learning, and resilience to spatial variability. The results from the combined forecasts were comparable with those from the QRNN, which was slightly better than those from the GAQR model. However, the combined forecasts were more accurate than the individual models. These results could be useful to decision-makers in the energy sector.

1. Introduction

1.1. Overview

Wind energy utilisation as a sustainable alternative has become increasingly favoured due to its environmental friendliness and accessibility (Wiser et al. [1]). One advantage of wind energy is that it is available throughout the day, making it preferable to solar energy. Using wind as the primary energy source would reduce global warming and the carbon footprint, which is critical because many countries are rushing to implement measures to curb the carbon emissions that have risen over the years. According to Tiseo [2], carbon emissions have increased by more than 60% since 1990; 37.15 billion metric tons were recorded in 2022, and emissions in 2023 were projected to rise by a further 1.1% to 37.55 billion metric tons, which would be the highest level to date. The use of wind power presents several challenges, as it is expensive to implement. Nonetheless, significant investments have been made in recent years. In 2020, approximately USD 175 billion was invested in wind power. The following year, investments declined to USD 155 billion, but they have been increasing steadily since then, reaching USD 185 billion in 2022 and USD 245 billion in 2023 [3]. As investments increase, there must be a corresponding increase in wind power generation capacity. In 2020, worldwide wind energy capacity stood at 733,719 megawatts. In 2021, despite the lower investments, capacity increased by 12% to 824,602 megawatts. In 2022, capacity increased further to 901,231 megawatts, a 9.29% increase. In 2023, with a 32% increase in investments, capacity rose to 1,017,199 megawatts, a 12.87% increase, in line with the significant investments made that year [4]. Another challenge with wind energy is that it is highly volatile, which can cause power spikes in the power grid. This calls for accurate prediction of wind speed, which is the main driver of wind power; failure to predict wind speed accurately can disrupt the power supply (Klein and Celik [5]).

1.2. Literature Review

Much research has been done on modelling and forecasting wind power generation. Li and Shi [6] compared different Artificial Neural Network (ANN) models to find the one with the highest predictive power. Three ANN models were considered: Adaptive Linear Neuron (ADALINE), feed-forward back-propagation (BP), and Radial Basis Function (RBF). The wind data consist of average hourly wind speeds gathered at two monitoring locations in North Dakota, Hannaford and Kulm. In the study, both wind speeds were measured 10 m above the ground, as suggested by the WMO [7]. The authors used MAE, RMSE, and MAPE to assess the models’ performance. Based on these evaluation metrics, the authors established that the BP and RBF models outperformed the ADALINE model.
Antor and Wollega [8], in their study, determined the most accurate machine learning algorithm among ridge regression, polynomial regression, and ANN for predicting the speed of wind, which is known to be one of the most unpredictable renewable energy sources. The study was conducted in the US with data from 2017 to 2019 collected from the Dark Sky website. After analysing the test data with the R-squared and RMSE metrics, it was found that the polynomial model had the highest R-squared value of around 60%, while the ANN model had the lowest. The polynomial model also had the lowest RMSE value of about 3.07, while the ANN model had the highest RMSE value, above 3.5.
Shen et al. [9] conducted a study to predict wind speed for an unmanned sailboat. An unmanned sailboat uses wind to power its sails and moves through the water using wind speed and direction information. To achieve multi-step wind prediction, the authors suggested a new hybrid model for neural networks that combines CNN and LSTM. The study involved analysing the data and improving the grid search method. The appropriate hyperparameters for the learning rates and input length were determined during this process. The information analysed in this research was chosen from the National Climate Database of New Zealand. The dataset comprises several attributes, including humidity and pressure, among others. The training set and test set were created from the data. Specifically, 80% of the original data points were allocated for training purposes, while the remaining 20% was set aside for testing. The accuracy of the CNN-LSTM model was evaluated using MAE, R-Square, RMSE, and correlation coefficients (CC) metrics after using the multi-grid search method and training the models. The CNN-LSTM model performed better than the benchmark models, with lower errors and better CC values showing better accuracy and stability.
Chen and Folly [10] conducted a study comparing three wind speed prediction models: the autoregressive moving average (ARMA) model, ANN, and the adaptive neuro-fuzzy inference system (ANFIS), a hybrid model. The research employed data from the Wind Atlas for South Africa, obtained specifically from the Vredendal station. The data encompassed wind speed measurements at different heights, temperature readings, and atmospheric pressure, all recorded at ten-minute intervals from December 2010 to January 2017. The MAPE and RMSE metrics were used to evaluate how well these models performed. The findings show that all models perform similarly for very short-term predictions, with the ARMA model being superior for shorter time frames; however, as the prediction horizon lengthened, its performance deteriorated more rapidly than that of the ANN and ANFIS.
Ghiassi et al. [11] presented a new approach to time series forecasting using a dynamic neural network model called DAN2. Traditional forecasting methods like ARIMA often struggle to capture nonlinear patterns in data. While FFBP and ANNs have been somewhat successful, they have limitations in flexibility and accuracy. The DAN2 model addresses these issues by employing a unique architecture that dynamically adjusts and learns from data, more effectively integrating linear and nonlinear components. Comparative results show that DAN2 outperforms conventional FFBP models and ARIMA in accuracy, providing a robust alternative for forecasting complex time series events.
After presenting the DAN2 model, the same authors, Ghiassi et al. [12], evaluated it for medium-term load forecasting (MTLF) of electrical power systems. The model was trained using historical monthly load data from the Taiwan Power Company. Initially, the researchers included weather data to improve accuracy, but they also developed seasonal models that do not rely on weather variables. The seasonal models achieved high accuracy, with mean absolute percentage error (MAPE) values below 3%. The study compared the performance of the DAN2 model to traditional methods such as multiple linear regression (MLR), ARIMA, and a conventional neural network model, with the DAN2 model showing superior accuracy.
Trebing and Mehrkanoon [13] conducted a study proposing an innovative architecture based on CNNs for wind speed prediction. Their model was compared with classical 2D and 3D CNNs and a 2D CNN equipped with an attention layer, upscaling, and depthwise separable convolution. The models were trained on datasets from Denmark and the Netherlands to forecast wind speeds from 1-h ahead to 24-h ahead. The performance of these models was evaluated using MAE and MSE. The study concluded that the 3D CNN outperformed the other models across both datasets, except for the 6-h ahead forecast, where the 2D CNN with upscaling demonstrated superior performance compared to the classical 3D CNN. However, when the proposed model is compared with these models, it outperforms them. This study highlights the varying advantages of different CNN architectures, emphasizing the importance of model selection based on specific forecasting horizons and the inherent characteristics of the datasets used.

1.3. Research Highlights and Contributions

The contribution of this study lies in its detailed, segmented approach to evaluating Dynamic Architecture for Artificial Neural Networks (DAN2) and CNN models for wind speed prediction at the Napier, Noupoort, and Upington stations. The study highlights how different weather conditions affect model performance, which is crucial for practical applications in renewable energy modelling.
The research highlights of this study are:
  • Use of gradient ascent for hyperparameter tuning to maximise the performance of the models.
  • Performance testing was conducted on the CNN and DAN2 models against a benchmark random forest. The CNN performed better than the benchmark model at the Napier and Upington stations, with lower error metrics and better prediction accuracy.
  • Compared to the benchmark model, DAN2 did not perform as well in predicting wind speed for the coastal and inland areas, Napier and Noupoort. This may imply that DAN2 does not generalise across geographical contexts as well as the CNN model.
  • Under most weather conditions, the CNN model forecast wind speed much better than DAN2; it had a mean absolute scaled error of less than 1 at all three stations, indicating that it outperformed the naive baseline.
A discussion of the modelling framework is given in Section 2. Empirical results are presented in Section 3, while Section 4 presents a discussion of the performance of the models. Section 5 provides concluding remarks.

2. Methods

2.1. Study Area

This study investigates three locations, each with distinct characteristics. The first is Napier station in the Western Cape, at longitude 19.692446, latitude 34.611915, and an elevation of 288 m. The second, Noupoort, is in the Northern Cape at longitude 25.028380, latitude 31.252540, and an altitude of 1806 m. Lastly, Upington is also in the Northern Cape, at longitude 20.568330, latitude 27.726700, and an altitude of 848 m. These locations have varying weather conditions: Napier is in a coastal area, Noupoort is inland, and Upington is in a dry region. The data for these three sites are sourced from the WASA database, accessible at https://www.wasaproject.info/ (accessed on 12 September 2023). Figure 1 shows the South African map with the three locations used in the study.
The programming language used for model implementations and other analyses is Python version 3.8. The following libraries were utilised:
  • Pandas: for data manipulation and analysis.
  • NumPy: for numerical computations.
  • SciPy: for scientific computing and statistical tests.
  • Statsmodels: for time series analysis and statistical modelling.
  • Scikit-learn: for machine learning model development and evaluation.
  • TensorFlow/Keras: for building and training deep learning models.
  • Matplotlib and Seaborn: for data visualization and plotting.

2.2. Models

To predict wind speed, we will utilise the following machine learning models.

2.2.1. Artificial Neural Networks

For several decades now, researchers have been studying artificial neural networks. The idea can be traced back to the early 1940s, when [14] introduced a mathematical model of the brain capable of performing logical operations on neuron behaviour. This concept became the basis for artificial neural networks, and subsequent researchers developed more advanced algorithms, including the perceptron algorithm created by [15]. ANNs are sophisticated computing systems that replicate the functionality of biological neural networks. They have demonstrated remarkable versatility and efficiency and have been used to solve many real-world problems, such as image recognition. An illustration of a multilayer feed-forward artificial neural network is shown in Figure 2. The network comprises three layers: the input, hidden, and output layers. A simple neural network can be defined as follows:
$$y_k = \varphi\Big(\sum_{j=1}^{k} x_j w_j + \beta_k\Big) \qquad (1)$$
$$y_k = \varphi(\mu_k - \theta_k) \qquad (2)$$
$$\mu_k = \sum_{j=1}^{k} x_j w_j \qquad (3)$$
$$\beta_k = -\theta_k \qquad (4)$$
Equation (1) includes the activation function $\varphi$, which can be selected from various options such as the sigmoid, ReLU, and others. The inputs or data points are denoted by $x_j$, the bias by $\beta_k$, and the weights by $w_j$. In Equation (3), the weighted sum is denoted by $\mu_k$, and in Equation (4) the threshold is represented by $\theta_k$.
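As a quick illustration of Equations (1)–(4), the Python snippet below computes the output of a single neuron for arbitrary toy inputs; the sigmoid is only one possible choice of the activation function $\varphi$, and all values shown are placeholders.

```python
import numpy as np

def neuron_output(x, w, bias, activation=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Single neuron of Equation (1): y = phi(sum_j x_j * w_j + bias).
    The sigmoid is used purely as an illustrative choice of phi;
    ReLU or any other activation could be substituted."""
    weighted_sum = np.dot(x, w)            # mu_k in Equation (3)
    return activation(weighted_sum + bias)

# Toy example: three inputs with arbitrary weights and bias
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
print(neuron_output(x, w, bias=0.2))
```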
Figure 2 shows the structure of multilayer feed-forward ANN.
The utilisation of a multilayer feed-forward network is crucial for achieving precise predictions and also helps prevent network loops [17]. An ANN relies on a learning process that involves adjusting parameters such as weights and thresholds to predict an output. Schmidhuber [18] categorised the learning process into two types: supervised, where the model is fed a target output, and unsupervised, where the model self-organises without target data. Multiple researchers have worked on improving the performance of ANNs since the early 1980s. One significant improvement was the introduction of the dynamic approach, which adapts the network architecture during training; further details are discussed below.

2.2.2. Dynamic Architecture for Artificial Neural Networks

DAN2, developed by Ghiassi et al. [11], is a notable improvement over the ANN described above. The model works by gradually learning and accumulating knowledge at each layer; this knowledge is then passed on and improved upon in the following layers. The process repeats until the model achieves the desired level of performance. As a result, DAN2 is considered a purely feedforward model, prioritising the propagation of information in a forward direction and making continuous adjustments to improve performance.
This study will use DAN2 instead of a traditional ANN model to predict wind speed. This model (Appendix A) processes all records simultaneously and repeatedly at every layer, using trigonometric transfer functions to capture non-linear relationships in the data. Additionally, DAN2 dynamically generates the number of hidden layers based on the complexity of the underlying process and the desired accuracy. These features help DAN2 more effectively capture the variability and complexity of wind speed data compared to the traditional ANN. By continuously learning and adapting its structure, DAN2 provides improved predictive accuracy and robustness, essential for optimising wind energy production and ensuring reliable power systems [19].

2.2.3. Convolutional Neural Network

The other model we consider is the CNN, a neural network used in deep learning that was developed in the 1990s. LeCun et al. [20] introduced its basic architecture in their paper titled “Gradient-Based Learning Applied to Document Recognition”, which presented a more effective way of recognising handwritten digits using a convolutional neural network that surpassed traditional machine learning methods. CNNs effectively reduce the number of network parameters and the risk of overfitting by processing input data through local connections and parameter sharing. Compared to traditional neural networks, convolutional neural networks have several advantages, including fast training, fault tolerance, and parallelism. A typical CNN structure comprises an input layer, a convolutional layer, a pooling layer, and a fully connected layer, as demonstrated in Figure 3 [21].

2.2.4. Random Forest

Random Forest is an ensemble learning algorithm that creates multiple decision trees during training and outputs the average prediction of each tree for regression tasks. The training of each tree is performed on a random subset of the data and features, making Random Forest robust to overfitting and capable of identifying complex relationships in the data [22].
The random selection of features at each node split ensures that every tree in the forest is trained on diverse features, reducing the correlation between trees and promoting robustness. Random Forest can also handle numerical and categorical features without requiring extensive preprocessing. Another advantage of the algorithm is that it provides feature importance, allowing users to understand which features contribute the most to wind speed prediction. This interpretability can aid in comprehending the underlying relationships between meteorological variables and wind speed dynamics [23].
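A minimal scikit-learn sketch of this idea is given below; the synthetic data, the column names (borrowed loosely from the covariate names in Appendix B.1), and the hyperparameters are placeholders rather than the study's actual configuration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Illustrative sketch: fit a random forest to a few meteorological features
# and inspect feature importances. The data are synthetic placeholders.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "WS_62_min": rng.uniform(0, 15, 500),
    "WS_62_max": rng.uniform(5, 20, 500),
    "Tair_mean": rng.uniform(10, 30, 500),
})
y = 0.6 * df["WS_62_min"] + 0.4 * df["WS_62_max"] + rng.normal(0, 0.5, 500)

rf = RandomForestRegressor(n_estimators=500, max_features="sqrt", random_state=0)
rf.fit(df, y)

# Feature importances, as mentioned in the text
for name, importance in zip(df.columns, rf.feature_importances_):
    print(f"{name}: {importance:.3f}")
```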

2.2.5. XGBoost

Another model that we consider as a benchmark is XGBoost, short for eXtreme Gradient Boosting, an optimised distributed gradient boosting library known for its high efficiency, flexibility, and portability. It implements the gradient boosting framework, an ensemble learning technique that combines the predictions of several base estimators, typically decision trees, to enhance accuracy and robustness. XGBoost can be utilised for time series prediction by treating it as a regression problem in which past observations are used to predict future values. XGBoost captures temporal patterns by integrating lagged features and other pertinent time-based variables. Its strong handling of missing values and its capability to model complex relationships make it a valuable tool for forecasting tasks on time series data.
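A minimal sketch of this lagged-feature framing is shown below; the synthetic series, the lag choices, and the hyperparameters are illustrative assumptions rather than the settings used in the study.

```python
import numpy as np
import pandas as pd
import xgboost as xgb

# Frame wind speed forecasting as supervised regression with lagged features.
rng = np.random.default_rng(0)
speed = pd.Series(8 + 2 * np.sin(np.arange(1000) / 24) + rng.normal(0, 0.5, 1000))

frame = pd.DataFrame({
    "lag1": speed.shift(1),
    "lag2": speed.shift(2),
    "lag144": speed.shift(144),   # same time on the previous day for 10-min data
    "target": speed,
}).dropna()

split = int(0.7 * len(frame))
X_train, y_train = frame.iloc[:split, :-1], frame.iloc[:split, -1]
X_test, y_test = frame.iloc[split:, :-1], frame.iloc[split:, -1]

model = xgb.XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=4)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```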

2.3. Forecast Combination Using Quantile Regression Averaging

The No Free Lunch theorems, as discussed in [24], point to the absence of a universally best algorithm in optimisation and machine learning and tell us that the effectiveness of an algorithm is highly context-dependent. As a result, in this study, we combined the forecasts from the models for each station using two quantile regression averaging methods. Combining forecasts from multiple models leverages the strengths of individual models while mitigating their weaknesses, which is known to lead to more accurate, stable, and robust predictions [25].
Suppose the forecasts from the models DAN2, CNN, RF and XGBoost are combined so that we have a vector
$$\hat{y}_{\text{comb}} = \big(\hat{y}_{\text{DAN2}}, \hat{y}_{\text{CNN}}, \hat{y}_{\text{RF}}, \hat{y}_{\text{XGBoost}}\big) \qquad (5)$$
In this study, we used two quantile regression averaging methods to combine our forecasts. The two methods are discussed in the following two sections.

2.3.1. Generalised Additive Quantile Regression Model

Gaillard et al. [26] developed a method that applies quantile regression (QR) within a generalised additive model, referred to as generalised additive quantile regression (GAQR). This modelling approach was extended by [27]. GAQR is robust to outliers in the response variable. However, the GAQR model requires a covariate smoothing function, which makes it computationally expensive.
The GAQR model solves the following problem [27]:
$$\hat{\beta}_\tau \in \arg\min_{\beta \in \mathbb{R}^d} \sum_{i=1}^{n} \frac{1}{\sigma}\, \rho_\tau\big\{ y_i - g_i(x_i^{T}\beta) \big\} + \frac{1}{2} \sum_{j=1}^{m} \lambda_j\, \beta^{T} M_j \beta, \qquad (6)$$
where the $\lambda_j$ are positive smoothing parameters used for penalisation, $g_i(x) = \sum_{j=1}^{n} s_j(x)$ with the $s_j$ representing the additive smooth effects, and $\rho_\tau(\cdot)$ is the pinball loss function. The smooth effects are expressed in terms of a spline basis as
$$s_j(x) = \sum_{k=1}^{K} \beta_{jk} B_{jk}(x_j). \qquad (7)$$

2.3.2. Quantile Regression Neural Network

A quantile regression neural network (QRNN) is a hybrid model combining quantile regression and neural networks. It has the advantage of capturing nonlinear patterns as well as overdispersion and underdispersion in the data. The QRNN model was improved by [28] and is given in Equation (8):
$$f(x_t; v, w) = g_2\left(\sum_{j=0}^{m} v_j\, g_1\Big(\sum_{i=0}^{n} w_{ji} x_{it}\Big)\right), \qquad (8)$$
where $n$ is the number of inputs, $m$ is the number of hidden-layer units, $x_{it}$ are the predictor weather variables, $g_1(\cdot)$ and $g_2(\cdot)$ are activation functions, and $w_{ji}$ and $v_j$ are the weights to be estimated.
One of the main advantages of quantile regression neural networks is that they can model the full conditional distribution of the target variable, including nonlinear relationships. The QRNN model provides the flexibility of visualisation and interpretability of the distribution across the quantiles, making it robust across different quantiles and less sensitive to outliers. However, some computational challenges are associated with the QRNN models, such as complexity, making them hard to interpret compared to traditional methods.
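The sketch below shows one way such a network could be set up in Keras with the pinball (quantile) loss; the layer sizes, activations, and the choice of the median quantile are illustrative assumptions rather than the configuration used in this study.

```python
import tensorflow as tf

def pinball_loss(tau):
    """Pinball (quantile) loss rho_tau used in quantile regression."""
    def loss(y_true, y_pred):
        y_true = tf.reshape(y_true, tf.shape(y_pred))  # guard against shape mismatch
        error = y_true - y_pred
        return tf.reduce_mean(tf.maximum(tau * error, (tau - 1.0) * error))
    return loss

def build_qrnn(n_inputs, n_hidden=10, tau=0.5):
    """Single-hidden-layer network in the spirit of Equation (8);
    sizes and activations are illustrative choices."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_inputs,)),
        tf.keras.layers.Dense(n_hidden, activation="sigmoid"),  # g1
        tf.keras.layers.Dense(1, activation="linear"),          # g2
    ])
    model.compile(optimizer="adam", loss=pinball_loss(tau))
    return model

# Example: combine the four individual model forecasts at the median (tau = 0.5)
qrnn = build_qrnn(n_inputs=4, tau=0.5)
```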

2.4. Variable Selection

Over-fitting is a problem that is best avoided using a proper variable selection method. This paper considers the Lasso (least absolute shrinkage and selection operator), introduced by Tibshirani [29]. The Lasso technique can significantly enhance training speed; its ability to select variables and implement regularisation by shrinking specific regression coefficients to zero makes it an efficient approach. While the Lasso assumes a linear model, its use for variable selection can still be justified in nonlinear models such as ANN and CNN because it reduces dimensionality, improves generalisation, enhances interpretability, provides regularisation, and offers computational efficiency [30].
Suppose we have a regression model with a response variable $Y$ and predictors $X_1, \ldots, X_p$. The Lasso formulation is given as:
$$\hat{\beta}^{\text{Lasso}} = \arg\min_{\beta} \Big\{ (Y - X\beta)^{T}(Y - X\beta) + \Gamma \sum_{i=1}^{m} |\beta_i| \Big\} \qquad (9)$$
In the Lasso formulation, $Y$ represents the target vector, $X$ the input matrix, and $\beta$ the vector of estimated coefficients; the index $i$ runs over the $m$ predictors, and $\Gamma$ is the regularisation parameter that determines the strength of the penalty on the absolute values of the coefficients. The goal of Lasso regression is to minimise the sum of squared errors between the predicted and true values by finding the optimal values of $\beta$ while ensuring that the sum of the absolute values of the coefficients does not exceed a threshold determined by $\Gamma$.
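A minimal scikit-learn sketch of Lasso-based variable selection is given below; the synthetic data are placeholders, and the cross-validated choice of the penalty is one convenient way of setting $\Gamma$, not necessarily the procedure followed in this study.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Sketch of Lasso-based variable selection; X would hold the standardised
# covariates of Appendix B.1 and y the wind speed at 62 m.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 22))
y = 1.5 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(0, 0.5, 1000)

X_std = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5).fit(X_std, y)   # cross-validated choice of the penalty

selected = np.flatnonzero(lasso.coef_ != 0)
print("Selected covariate indices:", selected)
```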

2.5. Metrics for Evaluating Forecasts

The effectiveness of the models is assessed using the following forecast evaluation metrics: mean absolute error (MAE), relative mean absolute error (RMAE), root mean square error (RMSE), relative root mean square error (RRMSE) and mean absolute scaled error (MASE). The model with the lowest values of these metrics is the best. The formulas for calculating these metrics are given below:
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i| \qquad (10)$$
$$\text{RMAE} = \frac{1}{n}\sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{y_i} \qquad (11)$$
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \qquad (12)$$
$$\text{RRMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \Big(\frac{y_i - \hat{y}_i}{y_i}\Big)^2} \qquad (13)$$
$$\text{MASE} = \frac{\frac{1}{n}\sum_{t=1}^{n} |e_t|}{\frac{1}{n-1}\sum_{t=2}^{n} |y_t - y_{t-1}|} \qquad (14)$$
In Equations (10)–(13), $n$ denotes the number of observations, $y_i$ is the actual value of the $i$th observation, and $\hat{y}_i$ is the predicted value for the $i$th observation. In Equation (14), $n$ is the length of the time series, $e_t$ is the forecast error at time $t$, and $y_t$ is the actual value at time $t$ (for $t = 1, 2, \ldots, n$).
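The metrics in Equations (10)–(14) translate directly into NumPy, as in the short sketch below (the relative metrics assume nonzero actual values):

```python
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))

def rmae(y, yhat):
    return np.mean(np.abs(y - yhat) / y)

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def rrmse(y, yhat):
    return np.sqrt(np.mean(((y - yhat) / y) ** 2))

def mase(y, yhat):
    # Scale by the in-sample MAE of the naive one-step (persistence) forecast
    naive_mae = np.mean(np.abs(np.diff(y)))
    return np.mean(np.abs(y - yhat)) / naive_mae

y = np.array([5.0, 6.2, 7.1, 6.8])
yhat = np.array([5.3, 6.0, 7.4, 6.5])
print(mae(y, yhat), rmse(y, yhat), mase(y, yhat))
```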

3. Empirical Results

3.1. Exploratory Data Analysis

As described in Section 2.1, the three stations investigated have distinct characteristics: Napier (Western Cape) is in a coastal area, Noupoort (Northern Cape) is inland, and Upington (Northern Cape) is in a dry region. The data for these stations were sourced from the WASA database, accessible at https://www.wasaproject.info/ (accessed on 12 September 2023).
Table 1 presents a distance matrix showcasing the distance between stations in kilometres.
As discussed earlier, 70% of the data are used for training and the remaining 30% is split equally between validation and testing. The training data therefore cover the period from 1 October 2022 at 00:10 to 22 October 2022 at 16:50, while the validation and test data span 22 October 2022 at 17:00 to 1 November 2022 at 00:00. The dataset has no missing values. This approach ensures that the models are trained and validated on chronologically separated data, ultimately leading to better insights and predictions. The response variable is the mean wind speed measured at a height of 62 m, recorded every 10 min. A list of covariates is given in Appendix B.1.
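A simple chronological split along these lines can be written as follows; the helper function is an illustrative sketch, with df assumed to be a time-indexed DataFrame of the wind speed and covariates.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, train_frac=0.70, val_frac=0.15):
    """Split a time-ordered DataFrame into train/validation/test sets
    (70/15/15 by default), preserving chronological order."""
    n = len(df)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = df.iloc[:n_train]
    val = df.iloc[n_train:n_train + n_val]
    test = df.iloc[n_train + n_val:]
    return train, val, test
```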
Figure 4 displays time series plots of the mean wind speed at the Napier, Noupoort, and Upington stations. Each station has its own pattern, but all exhibit a repeating pattern over time, suggesting that the data contain seasonality. We conducted a KPSS test at the Napier, Noupoort, and Upington stations to check for stationarity. The test statistics for Napier and Noupoort are 1.0191 and 1.0761, respectively, both greater than the critical value of 0.463 at the 5% significance level. Therefore, we reject the null hypothesis of stationarity and conclude that wind speed is not stationary at the Napier and Noupoort stations. The test statistic for Upington is 0.2858, which is less than 0.463 at the 5% significance level; we therefore fail to reject the null hypothesis and conclude that wind speed at this station is stationary.
It is necessary to make the data from the Napier and Noupoort stations stationary. Stationary time series data provide stability in statistical properties and simplify the detection of patterns and relationships, leading to more reliable results. The data from these stations were differenced once, and when the KPSS test was carried out again, the test statistics for Napier and Noupoort were 0.0098 and 0.0043, respectively, both less than the critical value of 0.463 at the 5% significance level. Thus, we fail to reject the null hypothesis and conclude that the differenced wind speed series for both stations are stationary. The differenced data are used for model training and testing.
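A sketch of this test-and-difference step using statsmodels is shown below; the wrapper function is illustrative, and the critical value used is the 5% level of the KPSS test for level stationarity (0.463), as quoted above.

```python
import pandas as pd
from statsmodels.tsa.stattools import kpss

def kpss_and_difference(wind: pd.Series):
    """Run the KPSS test for level stationarity and difference once if the
    null hypothesis of stationarity is rejected at the 5% level, as was done
    for the Napier and Noupoort series. 'wind' is assumed to be a pandas
    Series of 10-min mean wind speed at 62 m."""
    stat, p_value, lags, crit = kpss(wind.dropna(), regression="c", nlags="auto")
    if stat > crit["5%"]:              # reject H0 of stationarity
        wind = wind.diff().dropna()    # first difference
    return wind, stat
```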
Figure 5, Figure 6 and Figure 7 show the box plots of the wind speed data for Napier, Noupoort and Upington stations, respectively. The box plots show the wind speed distribution on each day of the month for the given sampling period. Visual inspection of the three figures shows that there is some daily seasonality.
Four parametric distributions, the normal, log-normal, Weibull and gamma distributions, were fitted to the data at the three locations. The parameters of the distributions were estimated using the maximum likelihood method. Table 2 summarises the evaluation metrics, AIC and BIC. The Weibull distribution is the best-fitting distribution at all three stations. These findings are consistent with the literature, in which it is argued that the Weibull distribution's adaptability and robustness in modelling wind speed data make it the most commonly used distribution in this field [31,32,33].
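A minimal SciPy sketch of this fitting-and-comparison step is given below; the candidate distributions match those listed above, while the helper function itself is illustrative.

```python
import numpy as np
from scipy import stats

def compare_distributions(wind):
    """Fit candidate distributions by maximum likelihood and return AIC/BIC.
    'wind' is assumed to be a 1-D array of positive wind speeds."""
    candidates = {
        "normal": stats.norm,
        "log-normal": stats.lognorm,
        "weibull": stats.weibull_min,
        "gamma": stats.gamma,
    }
    results = {}
    for name, dist in candidates.items():
        params = dist.fit(wind)                       # MLE estimates
        loglik = np.sum(dist.logpdf(wind, *params))
        k = len(params)
        aic = 2 * k - 2 * loglik
        bic = k * np.log(len(wind)) - 2 * loglik
        results[name] = (aic, bic)
    return results
```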
All three datasets have 4464 observations and 23 columns. The tables labelled as Table 3, Table 4 and Table 5 provide a summary of statistics for the response variable and explanatory variables. These tables display the minimum value (Min), first quantile (Q1), median, mean, third quantile (Q3) and maximum value (Max).
Summary statistics for the Napier data are given in Table 3. The wind speed at 62 m, our target variable, ranged from a minimum of 0.2075 m/s to a maximum of 18.1209 m/s, with an average of 8.1546 m/s over the 31 days. As shown above, the data for Napier station are not stationary; to ensure the accuracy of the analysis, the data were differenced before computing the kurtosis and skewness. The skewness of the differenced wind speed data at 62 m was 0.416, indicating a positively skewed distribution, and the kurtosis value of 3.266 indicates a leptokurtic distribution.
Table 4 summarises the Noupoort station, which had a wind speed minimum of 0.7426 m/s and a maximum of 17.3895 m/s, with a mean of 7.6568 m/s throughout the period. The data for Noupoort were also found to be nonstationary, and as a result, we differenced the data before computing the kurtosis and the skewness. The skewness was found to be 0.603, and the kurtosis value was 8.295. This shows that the distribution of the data is positively skewed and leptokurtic.
Lastly, Table 5 presents the wind speed characteristics of the Upington station. The data ranged from a minimum of 0.3693 m/s to a maximum of 16.8912 m/s, with a mean of 5.7308 m/s. Notably, the Upington station recorded the lowest wind speed numbers compared to the other two stations. The data for the Upington station were found to be nonstationary, resulting in the differencing of the data. The resulting skewness value of 0.277 and kurtosis value of 4.2992 confirmed a positively skewed and leptokurtic distribution, respectively.
We carried out a time series decomposition of the wind speed data at 62 m. As shown in Appendix C.1, the data exhibit some daily seasonality for all three stations. See Figure A1, Figure A2 and Figure A3. The primary driver of daily seasonality in wind speeds is the diurnal heating cycle of the earth’s surface, which affects atmospheric pressure and temperature gradients [34].

3.2. Variable Selection

Table 6 summarises the coefficients of the variables selected by the Lasso regression method for each station. These coefficients indicate the estimated effect of each selected variable on the predicted wind speed. Variables with non-zero coefficients are considered significant and influential in predicting wind speed, while variables with zero coefficients are considered less significant and are not included in the model.

3.3. Training Loss for DAN2 Model for All Stations

Figure 8, Figure 9 and Figure 10 show the training and validation loss of the DAN2 model at the Napier, Noupoort, and Upington stations. Training and validation loss plots provide insights into a model's performance during training: a decreasing training loss indicates an improved fit to the training data, whereas an increasing validation loss suggests over-fitting, where the model struggles to generalise. For the Napier, Noupoort, and Upington stations, both the training and validation losses decrease, indicating no overfitting.

3.4. Forecast Accuracy for the Models

The performance of the individual models, together with the combined forecasts across the three different stations, was evaluated using error metrics, as shown in Table 7. Combining forecasts using GAQR and QRNN models has improved the forecast accuracy, as shown in Table 7 and Figure 11.
Each station's test set was used to calculate these metrics, which provide insight into the models' performance. Notably, the error metrics of the DAN2 model are higher than those of the benchmark models at all stations, except at the Napier station, where DAN2 performed better than XGBoost. Examining each error metric reveals differences in performance across the stations. For DAN2, the Upington station has the lowest MAE and RMAE values, with scores of 1.477 and 0.268, respectively, demonstrating better performance than the other two stations.
On the other hand, the Noupoort station has the highest MAE and RMAE values, with scores of 2.348 and 0.305, respectively. Regarding RMSE, the Upington station again showcases better performance, achieving the lowest RMSE of 1.921 but recording the highest RRMSE with a value of 0.331. Meanwhile, the Napier station has the lowest RRMSE among the three stations, with a value of 0.282, and the Noupoort station displays the highest RMSE value of 2.859.
A detailed comparison of the error metrics with the benchmark models shows that DAN2 performs worse than the random forest benchmark, while it outperformed the XGBoost model at only one station. The random forest metrics are consistently closer to zero across all stations than those of DAN2, which has higher error metric values. Comparing the DAN2 models among the stations, the Upington station showed the best performance.

3.5. Training Loss for CNN Model for All Stations

After using the Lasso method to select variables, a CNN model was trained on the normalised dataset. The same min-max normalisation method was used, and the data split was performed in the same way as in DAN2. However, another step was taken to fit the data into the CNN model. This involved segmenting the time series into fixed-length windows using a sliding window approach to capture sequential patterns. Each window was considered a separate channel, similar to the multi-channel structure of an image. The data within each window were organised into rows and columns, effectively converting the temporal progression into spatial dimensions. These windows were consolidated into a three-dimensional matrix, where each instance corresponded to a window, and the dimensions defined the rows, columns, and channels. Ultimately, this reshaped matrix served as the input to the CNN, allowing the network to decipher spatio-temporal relationships within the time series data.
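A sketch of this windowing step is shown below; the 12 × 12 grid (one day of 10-min observations per window) and the one-step-ahead target are illustrative choices, not necessarily those used in the study.

```python
import numpy as np

def make_image_windows(series: np.ndarray, rows: int = 12, cols: int = 12, horizon: int = 1):
    """Segment a 1-D series into windows of length rows*cols, reshape each
    window into a (rows, cols) grid, add a channel axis, and take the value
    'horizon' steps ahead as the target."""
    window = rows * cols
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window].reshape(rows, cols))
        y.append(series[start + window + horizon - 1])
    X = np.asarray(X)[..., np.newaxis]   # shape: (samples, rows, cols, 1)
    return X, np.asarray(y)

# Example on a synthetic 10-min series with a one-step-ahead target
X, y = make_image_windows(np.sin(np.arange(2000) / 20.0))
print(X.shape, y.shape)   # (1856, 12, 12, 1) (1856,)
```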
Figure 12, Figure 13 and Figure 14 show the training and validation loss at the Napier, Noupoort, and Upington stations. At Napier station, the training and validation losses are steady, with a few spikes around the tenth and twentieth epochs. Similar behaviour is observed at the remaining stations, Noupoort and Upington.

3.6. CNN Model Training and Results

Napier, Noupoort and Upington Stations

Figure 15 shows the final plot of the CNN model's predictions on the test dataset. The model closely tracks the observed data, validating its predictive capability. Additionally, the plot extends into the future, displaying predictions for the next 60 min. This forward projection indicates that the model has the potential to forecast future values with reasonable accuracy.
Figure 16 and Figure 17 show the Noupoort and Upington CNN model on the test set. These two plots confirm the observations that were made at Napier station. The CNN model accurately predicts data spikes, differentiating it from the DAN2 model. Additionally, the CNN model’s proficiency in forecasting unseen data, including periods of volatility, is highlighted by predictions on the test set.

3.7. Forecast Accuracy for CNN Model

Table 7 presents the results of assessing the accuracy of the CNN model using the same metrics as for the DAN2 model. The MASE values indicate that the CNN model performs better than the random forest at all stations except Noupoort, with values close to zero; the random forest performed much better than the XGBoost model, and the CNN again outperformed XGBoost at two stations. Napier performed best among the stations, with a low MAE of 0.635 and an RMAE of 0.796. On the other hand, Noupoort had the highest MAE and RMAE, with values of 2.564 and 1.601, respectively. Looking at RMSE and RRMSE, the Napier station again performed the best, with the lowest values of 0.805 and 0.100, respectively. In contrast, Noupoort station had the highest RMSE value of 2.727 and an elevated RRMSE value of 0.354.
The CNN model demonstrated higher accuracy than all the benchmark models, achieving better results at two of the three stations, as is evident from all error metrics at those stations. Among the stations evaluated, Napier consistently exhibited superior performance across all metrics, suggesting that Napier is the best-performing station for the CNN model.

4. Discussion

Wind speed is predicted using four machine learning models in this study: DAN2, CNN, random forest and XGBoost. The models were compared on how they performed at three different stations. The dataset was from WASA and covered the period from 1 October 2022 to 1 November 2022.
With the help of descriptive statistics and formal tests, it was found that wind speed was not stationary at the Napier and Noupoort stations, while it was stationary at Upington. Distribution fitting showed that the Weibull distribution provides the best fit to the wind speed data at all three stations. Furthermore, wind speed at all three stations is strongly positively correlated with the minimum and maximum wind speed.
Two deep learning models were applied at three weather stations to assess their effectiveness in predicting wind speeds under varying weather conditions. Training started with the designated datasets, and hyperparameter tuning was performed using gradient ascent to identify the optimal hyperparameters. The results were compared with the two benchmark models, random forest and XGBoost. The results indicate that CNN consistently outperforms both benchmark models at two stations, Napier and Upington, as all of its error metrics are lower there. In contrast, the DAN2 model performed worse than the benchmark models, outperforming only XGBoost and only at the Napier station. Notably, it faces challenges in predicting wind speeds in coastal and inland areas.
The DAN2 results relative to the benchmark models are consistent with many studies showing that, for tabular data, tree-based models such as random forests or gradient boosting machines tend to outperform deep learning models. This performance advantage is often due to the structure of tabular data, where tree-based models excel at capturing interactions and non-linear relationships without requiring the extensive feature engineering or large datasets typically needed to optimise deep learning models. As a result, tree-based methods are often the preferred choice for achieving higher predictive accuracy when the data are structured and relatively small in scale.
Also, two main factors were found to influence the superiority of the CNN over the benchmark models. First, the size of the dataset was crucial. While tree-based models tend to outperform deep learning approaches with around 10,000 samples, the dataset used in this study was smaller and was divided into training and testing sets. Despite this, the CNN generalised better from the limited sample size than the benchmark models.
Secondly, the number of attributes used in the model also played a significant role. CNNs performed better with a larger number of input features. This was reflected in the performance of the Napier and Noupoort stations, which had the most variables selected during feature selection. On the other hand, XGBoost performed best at the Upington station, which had the fewest variables selected during feature selection. This suggests that CNN’s advantage in handling more attributes contributed to its superior performance.
The findings of this study align with Trebing and Mehrkanoon [13], who demonstrated that a convolutional neural network (CNN), with appropriate architectural modifications, can perform forecasting tasks, including weather forecasting. Their study highlighted the potential of CNNs in this domain, and our results further support this assertion.
Additionally, our study confirms the reliability of hybrid machine-learning models, as suggested by Chen and Folly [10]. Their research focused on forecasting wind speed in a different context and found that hybrid models could achieve high accuracy and reliability. This corroborates our findings and indicates that hybrid approaches are promising for wind speed prediction.

5. Conclusions

As the world struggles with the urgent need to move away from fossil fuels and toward renewable energy sources such as wind, researching and improving the reliability and accuracy of these alternatives becomes important. This study highlights the significance of using advanced machine learning models, specifically comparing the Dynamic Architecture for Artificial Neural Networks (DAN2) and convolutional neural networks (CNN), in predicting wind speed accurately across different geographical locations and weather patterns. Even though both the CNN and DAN2 models exhibited exceptional accuracy in predicting wind speed at the Napier and Upington stations, the findings strongly suggest that the CNN model may be more reliable for wind speed prediction across different weather conditions.
The current approach has limitations, as the models were trained and tested using data from only three stations, which may limit the generalisability of the results. The dataset used did not support the implementation of more advanced models. Future work should expand the dataset to include a broader range of locations and weather patterns to improve robustness. The proposed models can improve the efficiency and reliability of wind energy production, potentially reducing the frequency of load shedding and contributing to a more stable and reliable energy supply in South Africa.

Author Contributions

Conceptualization, F.W.M. and C.S.; methodology, F.W.M.; software, F.W.M.; validation, F.W.M., C.S. and T.R.; formal analysis, F.W.M.; investigation, F.W.M. and C.S.; data curation, F.W.M.; writing—original draft preparation, F.W.M.; writing—review and editing, F.W.M., C.S. and T.R.; visualization, F.W.M.; supervision, C.S. and T.R.; project administration, C.S. and T.R.; funding acquisition, F.W.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the DST-CSIR National e-Science Postgraduate Teaching and Training Platform (NEPTTP) http://www.escience.ac.za/ (accessed on 1 January 2023).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data were obtained from the Wind Atlas South Africa website http://wasadata.csir.co.za/wasa1/WASAData (accessed on 22 July 2024). The analytic data used in the study are hosted on GitHub https://github.com/csigauke (accessed on 22 July 2024).

Acknowledgments

The support of the DST-CSIR National e-Science Postgraduate Teaching and Training Platform (NEPTTP) towards this research is hereby acknowledged. Opinions expressed, and conclusions arrived at are those of the authors and are not necessarily to be attributed to the NEPTTP. In addition, the authors thank the anonymous reviewers for their helpful comments on this paper.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the study’s design, in the collection, analyses, or interpretation of data, in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
ANFIS	Adaptive Neuro-Fuzzy Inference System
ANN	Artificial Neural Network
ARMA	Autoregressive Moving Average
BP	Backpropagation
CNN	Convolutional Neural Network
DAN2	Dynamic Architecture for Artificial Neural Networks
GAQR	Generalised Additive Quantile Regression
KPSS	Kwiatkowski–Phillips–Schmidt–Shin
Lasso	Least Absolute Shrinkage and Selection Operator
LSTM	Long Short-Term Memory network
MAE	Mean Absolute Error
MASE	Mean Absolute Scaled Error
QRNN	Quantile Regression Neural Network
RBF	Radial Basis Function
RMAE	Relative Mean Absolute Error
RMSE	Root Mean Square Error
RRMSE	Relative Root Mean Square Error
WASA	Wind Atlas for South Africa
WMO	World Meteorological Organization
WWEA	World Wind Energy Association

Appendix A. Models Configurations

Appendix A.1. DAN2

The configuration of DAN2 is as follows: a dynamic architecture with three dynamic layers is used. The dynamic layers have a growth rate of 0.5, activation thresholds of 0.1 for layer addition and 0.05 for removal, and start with around 100 neurons. Optimisation uses Adam with a learning rate of 0.001, beta 1 of 0.9, beta 2 of 0.999, and an epsilon value of 1 × 10−7 to ensure efficient convergence. Regularisation techniques include a dropout rate of 0.25 for the dynamic layers and batch normalisation after each dynamic layer to enhance model stability. MSE is used as the loss function. The batch size is 32 and training runs for around 20 epochs, with early stopping monitoring the validation loss with a patience of ten epochs to ensure optimal training and prevent overfitting.

Appendix A.2. CNN

The CNN configuration uses Adam as the optimiser with a learning rate of 0.001, beta 1 of 0.9, beta 2 of 0.999, and an epsilon value of 1 × 10−7 for numerical stability. The architecture includes four convolutional layers with filter counts starting from 32 and 3 × 3 kernels, together with three max-pooling layers with a 2 × 2 pooling size. Additionally, two dense layers with ReLU activation and corresponding dropout layers (dropout rates of 0.25 for the convolutional layers and 0.5 for the dense layers) are incorporated to prevent overfitting, and the number of epochs is 50. Batch normalisation is placed after each convolutional and dense layer to expedite training and enhance generalisation. Since this is a regression task, the output activation is linear, the loss function is MSE, the batch size is 32, and early stopping is based on the validation loss.
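A hedged Keras sketch along the lines of this configuration is given below; the filter counts per layer (32, 64, 64, 128), the dense-layer widths, and the 12 × 12 × 1 input shape are assumptions made for illustration, while the optimiser settings, dropout rates, kernel and pooling sizes, loss function, and layer counts follow the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(12, 12, 1)):
    """Sketch of a CNN for wind speed regression: four convolutional layers,
    three max-pooling layers, two dense layers, batch normalisation after
    each convolutional/dense layer, and dropout rates of 0.25/0.5."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(64, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(1, activation="linear"),   # regression output
    ])
    optimiser = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9,
                                         beta_2=0.999, epsilon=1e-7)
    model.compile(optimizer=optimiser, loss="mse")
    return model

# Training would use batch_size=32, epochs=50, and early stopping on validation loss.
cnn = build_cnn()
```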

Appendix B

Appendix B.1. List of Covariates Used in the Study

  • diff1—This variable represents the first difference of the wind speed (diff1 = $W_t - W_{t-1}$), derived from historical wind speed data. It serves as one of the predictors or explanatory variables in the analysis, potentially indicating the effect of past wind speed on the current wind speed.
  • diff2—Similar to diff1, this variable represents the second wind speed difference (diff2 = $W_t - W_{t-2}$), derived from historical data. It is another predictor variable used to examine the influence of wind speed two periods earlier on the current wind speed.
  • noltrend—The noltrend variable is derived from a cubic regression spline model. In this context, it likely captures the trend component of the data after removing any nonlinear patterns through regression splines.
  • WS_62_min—represents the minimum wind speed recorded at the stations. Wind speed measures how fast the air is moving at a particular location. In this case, it specifically refers to the wind speed measured at a height of 62 m above the ground.
  • WS_62_max—represents the maximum wind speed recorded at the stations.
  • WS_62_stdv—refers to the standard deviation of wind speeds measured 62 m above the ground at the stations.
  • Tair_mean represents the stations’ mean (average) air temperature. Air temperature refers to the measure of the warmth or coldness of the air in a particular location.
  • Tair_min—represents the minimum air temperature at the stations.
  • Tair_max—represents the maximum air temperature recorded at the stations.
  • Tair_stdv—represents the standard deviation of air temperature at the stations. The standard deviation is a statistical measure that quantifies the amount of variability or dispersion in a set of values.
  • Tgrad_mean—this represents the average temperature gradient at the stations. Temperature gradient reflects the speed of temperature alteration relative to distance or height.
  • Tgrad_min—represents the minimum temperature gradient at the stations.
  • Tgrad_max—represents the highest temperature gradient recorded at the stations.
  • Tgrad_stdv—represents the standard deviation of the temperature gradient at the stations. The variable helps to understand how much the temperature gradients vary from the average value.
  • Pbaro_mean—represents the average barometric pressure at the Napier station. Barometric pressure, also called atmospheric pressure, is the force exerted by the weight of the air above a specific area.
  • Pbaro_min—represents the lowest barometric pressure recorded at the stations during the day.
  • Pbaro_max—represents the highest barometric pressure recorded at the station during the day.
  • Pbaro_stdv—represents the variation or dispersion in the barometric pressure values at the station.
  • RH_mean represents the stations’ mean (average) relative humidity. Relative humidity measures the amount of moisture in the air relative to the maximum amount of moisture the air can hold at a given temperature.
  • RH_min—represents the minimum relative humidity at the stations. Relative humidity is typically expressed as a percentage (%), with 100% indicating that the air is saturated with moisture and lower percentages indicating drier air.
  • RH_max—represents the highest relative humidity recorded at the stations during the day.
  • RH_stdv—represents the variation or dispersion in the relative humidity values at the stations.

Appendix C. Time Series Decomposition at the Three Stations

Appendix C.1. Multiplicative Time Series Decomposition of Wind Speed at 62 m at the Three Stations

Figure A1. Time series decomposition of wind speed data at Napier station.
Figure A2. Time series decomposition of wind speed data at Noupoort station.
Figure A3. Time series decomposition of wind speed data at Upington station.

References

  1. Wiser, R.; Lantz, E.; Mai, T.; Zayas, J.; DeMeo, E.; Eugeni, E.; Lin-Powers, J.; Tusing, R. Wind vision: A new era for wind power in the United States. Electr. J. 2015, 28, 120–132.
  2. Tiseo, I. Annual Carbon Dioxide (CO2) Emissions Worldwide from 1940 to 2023. 2023. Available online: https://www.statista.com/statistics/276629/global-co2-emissions (accessed on 28 February 2023).
  3. World Energy Investment 2023, IEA, Paris, Licence: CC BY 4.0. Available online: https://www.iea.org/reports/world-energy-investment-2023 (accessed on 3 April 2023).
  4. Renewable Capacity Statistics 2024, International Renewable Energy Agency, Abu Dhabi. Available online: https://www.irena.org/-/media/Files/IRENA/Agency/Publication/2024/Mar/IRENA_RE_Capacity_Statistics_2024.pdf?rev=a587503ac9a2435c8d13e40081d2ec34 (accessed on 26 April 2023).
  5. Klein, R.; Celik, T. The Wits Intelligent Teaching System: Detecting student engagement during lectures using Convolutional Neural Networks. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 2856–2860.
  6. Li, G.; Shi, J. On comparing three artificial neural networks for wind speed forecasting. Appl. Energy 2010, 87, 2313–2320.
  7. Mathew, S. Wind Energy: Fundamentals, Resource Analysis and Economics; Springer: Berlin/Heidelberg, Germany, 2006; Volume 1.
  8. Antor, A.F.; Wollega, E.D. Comparison of machine learning algorithms for wind speed prediction. In Proceedings of the International Conference on Industrial Engineering and Operations Management, Dubai, United Arab Emirates, 10–12 March 2020; pp. 857–866.
  9. Shen, Z.; Fan, X.; Zhang, L.; Yu, H. Wind speed prediction of unmanned sailboat based on CNN and LSTM hybrid neural network. Ocean Eng. 2022, 254, 111352.
  10. Chen, Q.; Folly, K.A. Comparison of three methods for short-term wind power forecasting. In Proceedings of the 2018 International Joint Conference on Neural Networks, IEEE, Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
  11. Ghiassi, M.; Saidane, H.; Zimbra, D.K. A dynamic artificial neural network model for forecasting time series events. Int. J. Forecast. 2005, 21, 341–362.
  12. Ghiassi, M.; Zimbra, D.K.; Saidane, H. Medium term system load forecasting with a dynamic artificial neural network model. Electr. Power Syst. Res. 2006, 76, 302–316.
  13. Trebing, K.; Mehrkanoon, S. Wind speed prediction using multidimensional convolutional neural networks. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, ACT, Australia, 1–4 December 2020; pp. 713–720.
  14. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133.
  15. Rosenblatt, F. The perceptron: A probabilistic model for information storage and organisation in the brain. Psychol. Rev. 1958, 65, 386–408.
  16. Novickis, R.; Justs, D.J.; Ozols, K.; Greitāns, M. An Approach of Feed-Forward Neural Network Throughput-Optimized Implementation in FPGA. Electronics 2020, 9, 2193.
  17. Daniel, L.O.; Sigauke, C.; Chibaya, C.; Mbuvha, R. Short-term wind speed forecasting using statistical and machine learning methods. Algorithms 2020, 13, 132.
  18. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117.
  19. Han, Y.; Huang, G.; Song, S.; Yang, L.; Wang, H.; Wang, Y. Dynamic Neural Networks: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7436–7456.
  20. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
  21. Wang, Z.; Zhang, J.; Zhang, Y.; Huang, C.; Wang, L. Short-term wind speed forecasting based on information of neighboring wind farms. IEEE Access 2020, 8, 16760–16770.
  22. Ho, C.-Y.; Cheng, K.-S.; Ang, C.-H. Utilising the random forest method for short-term wind speed forecasting in the coastal area of central Taiwan. Energies 2023, 16, 1374.
  23. Lahouar, A.; Slama, J.B.H. Hour-ahead wind power forecast based on random forests. Renew. Energy 2017, 109, 529–541.
  24. Wolpert, D.H.; Macready, W.G. No Free Lunch Theorems for Optimization. IEEE Trans. Evol. Comput. 1997, 1, 67–82.
  25. Wang, X.; Hyndman, R.J.; Li, F.; Kang, Y. Forecast combinations: An over 50-year review. Int. J. Forecast. 2023, 39, 1518–1547.
  26. Gaillard, P.; Goude, Y.; Nedellec, R. Additive models and robust aggregation for GEFCom2014 probabilistic electric load and electricity price forecasting. Int. J. Forecast. 2016, 32, 1038–1050.
  27. Fasiolo, M.; Wood, S.N.; Zaffran, M.; Nedellec, R.; Goude, Y. Fast calibrated additive quantile regression. J. Am. Stat. Assoc. 2020, 116, 1402–1412.
  28. Zhang, W.; Quan, H.; Srinivasan, D. An improved quantile regression neural network for probabilistic load forecasting. IEEE Trans. Smart Grid 2018, 10, 4425–4434.
  29. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
  30. Bucci, A.; He, L.; Liu, Z. Combining dimensionality reduction methods with neural networks for realized volatility forecasting. Ann. Oper. Res. 2023, 1–29.
  31. Aziz, A.; Tsuanyo, D.; Nsouandele, J.; Mamate, I.; Mouangue, R.; Abiama, P.E. Influence of Weibull parameters on the estimation of wind energy potential. Sustain. Energy Res. 2023, 10, 5.
  32. Shi, H.; Dong, Z.; Xiao, N.; Huang, Q. Wind Speed Distributions Used in Wind Energy Assessment: A Review. Front. Energy Res. 2021, 9, 769920.
  33. Pang, W.-K.; Forster, J.J.; Troutt, M.D. Estimation of Wind Speed Distribution Using Markov Chain Monte Carlo Techniques. J. Appl. Meteorol. Climatol. 2001, 40, 1476–1484.
  34. NORCAST Weather. The Diurnal Wind Cycle: Why Is It Windier During the Day Than at Night? Available online: https://norcast.tv/the-diurnal-wind-cycle-why-is-it-windier-during-the-day-than-at-night/ (accessed on 19 July 2024).
Figure 1. Station map. Source: The map was created with the Google Earth app using data from the WASA website, available at https://www.wasaproject.info/ (accessed on 12 September 2023).
Figure 2. Structure of a multilayer feed-forward ANN. (Source: [16]).
Figure 3. Structure of a CNN. (Source: [21]).
Figure 4. Time series plots of mean wind speed at Napier (top panel), Noupoort (middle panel) and Upington (bottom panel) stations.
Figure 5. Distribution of daily average wind speed at 62 m at Napier station. The circles indicate extreme observations in the data.
Figure 6. Distribution of daily average wind speed at 62 m at Noupoort station. The circles indicate extreme observations in the data.
Figure 7. Distribution of daily average wind speed at 62 m at Upington station. The circles indicate extreme observations in the data.
Figure 8. Training loss at Napier station.
Figure 9. Training loss at Noupoort station.
Figure 10. Training loss at Upington station.
Figure 11. Plots of actual and combined forecasts using QRNN.
Figure 12. Training and validation loss at Napier station.
Figure 13. Training and validation loss at Noupoort station.
Figure 14. Training and validation loss at Upington station.
Figure 15. Test set and future predictions at Napier station.
Figure 16. Test set and future predictions at Noupoort station.
Figure 17. Test set predictions at Upington station.
Table 1. Distance matrix.

              Napier    Noupoort    Upington
Napier             0         832        1041
Noupoort         832           0         866
Upington        1041         866           0
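For readers who wish to reproduce a station distance matrix of this kind, the sketch below computes great-circle (haversine) distances between station coordinates. The coordinates shown are approximate placeholders for the three towns, not the exact WASA mast locations, so the resulting values need not match Table 1; the units of Table 1 are likewise assumed, not restated from the source.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Placeholder coordinates (approximate town locations, NOT the exact WASA masts).
stations = {
    "Napier":   (-34.47, 19.90),
    "Noupoort": (-31.19, 24.95),
    "Upington": (-28.46, 21.26),
}

names = list(stations)
print(f"{'':<10}" + " ".join(f"{b:>10}" for b in names))
for a in names:
    row = [f"{haversine_km(*stations[a], *stations[b]):10.0f}" for b in names]
    print(f"{a:<10}" + " ".join(row))
```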
Table 2. Fitting of parametric distributions to the data at the three stations.

                        Normal       Log Normal    Weibull      Gamma
Napier (WM05)
  AIC                24,084.74       25,508.12    24,015.61    24,479.79
  BIC                24,097.55       25,520.93    24,028.42    24,492.60
Noupoort (WM09)
  AIC                22,451.71       22,864.58    22,308.40    22,446.20
  BIC                22,464.51       22,877.39    22,321.21    22,459
Upington (WM19)
  AIC                20,394.45       20,731.85    20,194.20    20,320.98
  BIC                20,407.26       20,744.66    20,207.01    20,333.79
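An AIC/BIC comparison of candidate distributions such as the one in Table 2 can be produced along the lines sketched below. This is a minimal illustration using scipy's maximum-likelihood fitting on a synthetic stand-in wind speed series (`speeds`); the authors' exact fitting settings (for example, whether location parameters were fixed at zero) are not shown in the source and may differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
speeds = 8.5 * rng.weibull(2.2, 2000)  # synthetic stand-in for daily mean wind speed

candidates = {
    "Normal":     stats.norm,
    "Log Normal": stats.lognorm,
    "Weibull":    stats.weibull_min,
    "Gamma":      stats.gamma,
}

n = len(speeds)
for name, dist in candidates.items():
    params = dist.fit(speeds)                  # maximum-likelihood estimates
    loglik = np.sum(dist.logpdf(speeds, *params))
    k = len(params)                            # number of fitted parameters
    aic = 2 * k - 2 * loglik
    bic = k * np.log(n) - 2 * loglik
    print(f"{name:<11} AIC = {aic:10.2f}   BIC = {bic:10.2f}")
```

The distribution with the smallest AIC/BIC (the Weibull in Table 2) would then be preferred.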
Table 3. Summary of statistics Napier station.

Variables      Min        Q1         Median     Mean       Q3         Max
WS 62 mean      0.2075     5.3707     8.0980     8.1546    10.7587    18.1209
diff1          −3.672     −0.3843    −0.006      0.00131    0.3471     4.5360
diff2          −5.9052    −0.5083    −0.0124     0.0024     0.5018     5.080
noltrend        0.4194     5.4067     8.1558     8.0186    10.6570    15.6493
WS 62 min       0.2075     3.9265     5.9410     6.0726     8.2654    13.8439
WS 62 max       0.2075     6.7158     9.8150     9.9529    12.6043    21.2820
WS 62 stdv      0.0000     0.4208     0.7302     0.7565     1.0407     2.1862
Tair mean       0.05      12.67      14.14      14.29      15.66      27.54
Tair min       −0.96      12.55      14.00      14.12      15.44      26.32
Tair max        0.33      12.80      14.35      14.49      15.84      28.52
Tair stdv       0.0080     0.0352     0.0544     0.0859     0.1056     6.2100
Tgrad mean     −1.7170    −0.9450    −0.3370    −0.3394     0.0822     5.3090
Tgrad min      −2.3590    −1.1870    −0.4390    −0.5158    −0.0100     4.5340
Tgrad max      −1.4360    −0.7240    −0.2960    −0.1777     0.2050     6.3590
Tgrad stdv      0          0.0310     0.0680     0.0869     0.1230     1.6610
Pbaro mean    975.5      981.9      984.3      984.3      986.8      992.5
Pbaro min     975.4      981.7      984.0      984.1      986.6      992.3
Pbaro max     975.6      982.1      984.5      984.4      987.0      994.1
Pbaro stdv      0.0345     0.0517     0.0615     0.0688     0.0768     0.4847
RH mean         0.3731    67.1750    80.00      76.0780    90.600    100.0
RH min          0         64.34      78.03      72.65      89.70     100.00
RH max          0.4761    69.7800    82.6000    78.9211    92.8000   100.00
RH stdv         0.0073     0.1532     0.4781     2.0726     0.9190    49.8000
Table 4. Summary of statistics Noupoort station.

Variables      Min        Q1         Median     Mean       Q3         Max
WS 62 mean      0.7426     5.4502     7.5723     7.6568     9.5766    17.3895
diff1          −6.7801    −0.4461    −0.0210     0.0007     0.4089     8.8909
diff2          −6.7107    −0.6059    −0.0434     0.0012     0.5449    10.2386
noltrend        2.3325     5.5389     7.5344     7.6575     9.4338    15.3806
WS 62 min       0.2148     3.9322     5.4812     5.6039     7.0301    14.1553
WS 62 max       1.454      6.720      9.199      9.672     11.987     23.139
WS 62 stdv      0.1252     0.4461     0.7215     0.8142     1.0776     4.1196
Tair mean       4.46      13.25      16.36      16.42      19.91      27.44
Tair min        4.37      13.00      16.14      16.21      19.66      27.27
Tair max        4.57      13.53      16.61      16.67      20.12      27.74
Tair stdv       0.01190    0.0526     0.0859     0.1169     0.1384     2.7570
Tgrad mean     −1.5090    −0.8410    −0.3015    −0.0134     0.5712     8.6500
Tgrad min      −2.0680    −1.0760    −0.4370    −0.2633     0.3460     7.5830
Tgrad max      −1.2180    −0.6500    −0.1530     0.2275     0.8460     9.2700
Tgrad stdv      0.0000     0.0690     0.1130     0.1334     0.1650     2.4420
Pbaro mean    815.8      821.4      822.9      822.8      824.6      828.2
Pbaro min     815.3      821.2      822.7      822.7      824.4      828.1
Pbaro max     816.1      821.6      823.1      823.1      824.9      834.6
Pbaro stdv      0.0386     0.0572     0.0660     0.0748     0.0819     0.7640
RH mean         4.63      26.11      48.02      50.97      73.38     100.00
RH min          4.337     24.625     45.320     49.124     70.748    100.00
RH max          4.88      27.92      50.30      52.73      75.86     100.00
RH stdv         0.0137     0.2652     0.5501     0.8811     1.0413    18.6200
Table 5. Summary of statistics Upington station.

Variables      Min        Q1         Median     Mean       Q3         Max
WS 62 mean      0.3693     3.9306     5.6373     5.7308     7.3684    16.8912
diff1          −4.1385    −0.4724    −0.0062    −0.0000     0.4537     7.7245
diff2          −6.8561    −0.6306     0          0.0000     0.6216    10.2096
noltrend        1.2370     4.2517     5.6724     5.7299     7.1200    12.2634
WS 62 min       0.1891     2.0538     3.9186     3.9350     5.4726    11.9993
WS 62 max       0.8106     5.4726     7.3373     7.5899     9.2021    24.1203
WS 62 stdv      0.1193     0.3996     0.6645     0.7589     1.0276     4.7676
Tair mean      11.20      22.72      26.93      26.55      30.72      37.29
Tair min       11.01      22.36      26.59      26.24      30.39      36.98
Tair max       11.55      23.33      27.56      27.16      31.36      38.05
Tair stdv       0.0731     0.1069     0.1373     0.1704     0.1939     2.117
Tgrad mean     −1.5270    −0.8290     0.0760     0.8828     2.0688    11.2300
Tgrad min      −2.375     −1.107     −0.066      0.576      1.712     10.960
Tgrad max      −1.183     −0.571      0.236      1.169      2.391     11.440
Tgrad stdv      0.0090     0.0750     0.1280     0.1588     0.1930     1.9500
Pbaro mean    907.8      913.8      915.2      915.2      916.9      921.4
Pbaro min     907.8      913.5      915.1      915.0      916.6      921.2
Pbaro max     908.2      914.0      915.4      915.4      917.1      921.7
Pbaro stdv      0.0559     0.0818     0.0895     0.0932     0.0991     0.3305
RH mean         3.85       9.94      17.18      22.40      31.01      93.00
RH min          3.599      9.527     16.510     21.691     30.225     92.300
RH max          4.019     10.370     17.740     23.130     32.072     93.300
RH stdv         0.0314     0.1117     0.2061     0.3546     0.3995     7.0850
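The summaries in Tables 3–5 (minimum, lower quartile, median, mean, upper quartile and maximum per variable) are standard descriptive statistics. A sketch of how such a table could be produced from a data frame of daily station variables is given below; the column names and synthetic values are illustrative only and do not reproduce the source data.

```python
import numpy as np
import pandas as pd

# Illustrative daily data for one station; the real data would include
# WS_62_mean, Tair_mean, Pbaro_mean, RH_mean, and so on.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "WS_62_mean": 8.5 * rng.weibull(2.2, 365),
    "Tair_mean":  rng.normal(14.3, 3.0, 365),
})

summary = df.describe(percentiles=[0.25, 0.5, 0.75]).T
summary = summary[["min", "25%", "50%", "mean", "75%", "max"]]
summary.columns = ["Min", "Q1", "Median", "Mean", "Q3", "Max"]
print(summary.round(4))
```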
Table 6. Variable selection table.

Napier Station
Variables       Coeff
diff1            0.0415
diff2            0.3786
noltrend         3.4182
WS_62_stdv       0.0541
Tair_mean       −0.0200
Tair_min        −0.0032
Tair_max        −0.0113
Tgrad_mean      −0.0175
Tgrad_max        0.0529
Tgrad_stdv      −0.0126
Pbaro_min       −0.0305
Pbaro_stdv       0.0511
RH_min           0.0293
RH_stdv          0.0372

Noupoort Station
Variables       Coeff
diff1            0.0537
diff2            0.5167
noltrend         2.6803
WS_62_stdv       0.1136
Tair_mean       −0.0610
Tair_stdv       −0.0200
Tgrad_min       −0.0656
Tgrad_max        0.1162
Pbaro_mean       0.6281
Pbaro_min       −0.3018
Pbaro_max       −0.3392
Pbaro_stdv       0.0877
RH_min           0.0258
RH_stdv         −0.0756

Upington Station
Variables       Coeff
diff1            0.0595
diff2            0.5506
noltrend         2.0136
WS_62_stdv       0.2308
Tair_mean       −0.0995
Tair_stdv       −0.0041
Tgrad_max        0.1406
Tgrad_stdv      −0.0483
Pbaro_min       −0.0199
Pbaro_stdv      −0.0068
RH_max           0.0071
RH_stdv         −0.0366
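Table 6 lists the covariates retained at each station together with their coefficients. As a rough illustration of how such a shrinkage-based selection could be carried out (assuming a Lasso-type penalised regression in the spirit of [29]; the authors' exact pipeline, penalty value and scaling are not shown in this section and may differ), a sketch is given below. The predictor names, target and data are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Placeholder design matrix and response; real inputs would be the station covariates.
rng = np.random.default_rng(2)
n = 1000
X = pd.DataFrame(rng.normal(size=(n, 4)),
                 columns=["noltrend", "WS_62_stdv", "Tair_mean", "RH_min"])
y = 3.4 * X["noltrend"] + 0.05 * X["WS_62_stdv"] + rng.normal(scale=0.5, size=n)

# Standardise predictors, then let cross-validated Lasso shrink weak coefficients to zero.
Xs = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(Xs, y)

coeffs = pd.Series(lasso.coef_, index=X.columns)
print(coeffs[coeffs != 0].round(4))   # retained variables and their coefficients
```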
Table 7. Forecast evaluation.

Stations      MAE       RMAE      RMSE      RRMSE     MASE
DAN2
Napier        1.737     0.280     2.26      0.282     0.437
Noupoort      2.348     0.305     2.859     0.331     0.768
Upington      1.477     0.268     1.921     0.348     0.477
Random forest
Napier        0.923     0.9608    1.162     0.073     0.224
Noupoort      1.466     1.2109    1.877     0.115     0.436
Upington      0.940     0.969     1.2504    0.0898    0.3549
XGBoost
Napier        0.5392    0.0673    0.6957    0.0869    0.5392
Noupoort      0.6308    0.0820    0.8590    0.1116    0.6308
Upington      0.2031    0.0369    0.2841    0.0516    0.284
CNN
Napier        0.635     0.796     0.805     0.100     0.150
Noupoort      2.564     1.601     2.727     0.354     0.747
Upington      0.7414    0.861     0.9810    0.1781    0.2031
Combined forecasts using GAQR model
Napier        0.482     0.060     0.627     7.834     0.114
Noupoort      0.605     0.079     0.816     10.599    0.176
Upington      0.605     0.110     0.825     14.972    0.232
Combined forecasts using QRNN model
Napier        0.481     0.060     0.626     7.818     0.114
Noupoort      0.600     0.078     0.815     10.595    0.175
Upington      0.602     0.109     0.815     14.794    0.231
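For completeness, accuracy measures such as those in Table 7 can be computed as in the sketch below. It assumes the usual textbook definitions: MAE and RMSE, relative counterparts scaled by the mean of the observed series, and MASE scaled by the in-sample one-step naive (persistence) forecast error; the authors' exact scaling conventions are not restated in this section and may differ, and the series used here are synthetic.

```python
import numpy as np

def forecast_metrics(y_true, y_pred, y_train):
    """MAE, relative MAE, RMSE, relative RMSE and MASE (assumed definitions)."""
    y_true, y_pred, y_train = map(np.asarray, (y_true, y_pred, y_train))
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    rmae = mae / np.mean(y_true)                   # relative MAE (scaled by mean observation)
    rrmse = rmse / np.mean(y_true)                 # relative RMSE
    naive_mae = np.mean(np.abs(np.diff(y_train)))  # in-sample naive (persistence) MAE
    mase = mae / naive_mae
    return {"MAE": mae, "RMAE": rmae, "RMSE": rmse, "RRMSE": rrmse, "MASE": mase}

# Toy usage with synthetic training and test series.
rng = np.random.default_rng(3)
y_train = 8 * rng.weibull(2.2, 300)
y_true = 8 * rng.weibull(2.2, 50)
y_pred = y_true + rng.normal(scale=0.6, size=50)
print({k: round(v, 3) for k, v in forecast_metrics(y_true, y_pred, y_train).items()})
```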
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
