Article

Spatio-Temporal Forecasting of Global Horizontal Irradiance Using Bayesian Inference

Department of Mathematical and Computational Sciences, University of Venda, Private Bag X5050, Thohoyandou 0950, South Africa
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2023, 13(1), 201; https://doi.org/10.3390/app13010201
Submission received: 18 November 2022 / Revised: 13 December 2022 / Accepted: 16 December 2022 / Published: 23 December 2022

Abstract

Accurate global horizontal irradiance (GHI) forecasting promotes power grid stability. Most of the research on solar irradiance forecasting has been based on a single-site analysis. It is crucial to explore multisite modeling to capture variations in weather conditions between various sites, thereby producing a more robust model. In this research, we propose the use of spatial regression coupled with Gaussian Process Regression (GP Spatial) and the GP Autoregressive Spatial model (GP-AR Spatial) for the prediction of GHI using data from seven radiometric stations from South Africa and one from Namibia. The results of the proposed methods were compared with a benchmark model, the Linear Spatial Temporal Regression (LSTR) model. Five validation sets, each comprising three stations, were chosen. For each validation set, the remaining five stations were used for training. Based on root mean square error, the GP model gave the most accurate forecasts across the validation sets. These results were confirmed by statistical significance tests using the Giacomini–White test. In terms of coverage probability, there was 100% coverage on three validation sets, while the other two had 97% and 99%. The GP model dominated the other two models. One of the study’s contributions is using standardized forecasts and including a nonlinear trend covariate, which improved the accuracy of the forecasts. The forecasts were combined using a monotone composite quantile regression neural network and a quantile generalized additive model. This modeling framework could be useful to power utility companies in making informed decisions when planning power grid management, including large-scale solar power integration onto the power grid.

1. Introduction

1.1. Overview

The use of clean and renewable energy sources has been on the rise, hence the need to manage the power grid efficiently. This increase has created a need for predictions of the available energy sources to ease the management of the power grid. Power grid planning is a challenging process, as it requires efficient and accurate power prediction inputs. In this research, we focused on probabilistic, short-term forecasting to efficiently model GHI. Probabilistic forecasting is an emerging research area that handles the uncertainty of weather variables and produces more comprehensive results [1]. This probabilistic approach was combined with spatial analysis to assess the relatedness of meteorological stations. Spatial analysis is a very important aspect of regression, and in this study it was used to explore spatial dependence between various meteorological stations.

1.2. Survey of Related Literature

Some authors have applied various methods incorporating spatial analysis. Andre et al. [2] used a spatial–temporal model for short-term solar irradiation forecasting. They used a spatial–temporal vector autoregressive model, which was designed in such a way that it could handle sparse spatial–temporal data. An iterative strategy in the model process selected related stations and eliminated insignificant predictor variables.
In a study by Liu et al. [3], an ensemble-temporal deep learning approach incorporating multi-sites in a spatial power grid was used to predict solar power. Variational modeling was used to predict uncertainty, and the proposed methods estimated uncertainty very well. Yang [4] used the ultra-fast pre-selection method to solve the lasso problem, that of having an insufficient degree of freedom and curse of dimensionality. The variables selected via ultra-fast were then used to develop a lasso-temporal model. Their results showed that their proposed algorithm did not need meteorological priors and provided the best forecasts.
Another study by Eschenbach et al. [5] forecast solar irradiation using various machine-learning methods. The methods used were ARX (autoregressive with exogenous inputs), NN (Neural Network), RRF (random regression forest), and RT (regression trees). Their results showed that NN produced the most accurate results for short lead times as well as for dense spatio-temporal input data.
Kim et al. [6] applied a spatio-temporal method that combined public and satellite data. Using satellite data from South Korea, SVR (Support Vector Regression), ANN (Artificial Neural Network), ARIMAX (Autoregressive Integrated Moving Average with Exogenous inputs), and DNN (Deep Neural Network) models were used to produce short-interval forecasts of GHI. Their results showed that the models based on temporal and spatial characteristics gave better forecasts than the models based on numerical weather variables.
Zhang et al. [7] applied a Spatial–Temporal Gaussian process state space model with the Kronecker structure to weather data from Colorado and the Global Historical Climatology Network (GHCN). A further objective was to develop a kernel suitable for both the Colorado and GHCN data. To estimate the hyperparameters of the Spatial–Temporal Gaussian model, the Kalman filter and smoother were used. The results showed that the forecasting performance improved on the results of [8], owing to the desirable properties of the Matérn kernel used for both datasets.
Hamelijnck et al. [9] applied variational spatio-temporal regression coupled with Gaussian processes. A non-conjugate GP technique produced a sparse state model using separable Markov kernels. The proposed methodology proved to be more accurate and efficient due to parallelized filtering across spatial locations, sparsity, and the application of variational Gaussian process analysis. The proposed method outperformed the baseline methods.
Agoua et al. [10] forecasted photovoltaic power using a probabilistic spatial–temporal model that used datasets from plants that were close to each other. This technique targets short-term forecasting over horizons of 0–6 h. Quantile regression was combined with Lasso, which was used for variable selection. The proposed methodology showed superior performance in comparison to KDE (Kernel Density Estimation), which was used as the benchmark model.
Banerjee et al. [11] developed a temporal–spatial model using predictive process modeling. The method reduced computational burden by reducing the modeling space to a lower dimensional subspace. The results of the proposed methodology addressed the problem of misspecification of the model using a larger dataset, which was achieved by applying the induced specification predictive process. Luttinen et al. [12] did a Bayesian analysis using sea surface temperature data. They used a probabilistic factor analysis using spatiotemporal data. Gaussian process priors were used for the factors and loading matrix. According to their results, the Gaussian Process Factor Analysis outperformed the Bayesian Principal Component analysis.
Tomizawa and Yoshida [13] applied Gaussian Process regression with various Gaussian random fields to data with problems of spatial variability. The maximum likelihood method was used to estimate the random and fluctuating fields’ scale. Their results showed that the model with the Whittle Matern kernel was the best for the random component. Comber et al. [14] used Gaussian Process splines regression taking into account the variational of geographical areas. The technique is a smoothing parameter used in splines regression combined with Gaussian Processes, optimizing the GP splines regression. The model showed predictions that were more accurate because of accommodating heterogeneity.
In another study, Najimbi et al. [15] used probabilistic GPR to forecast solar power using meteorological data. Short-term forecasting was used based on k-means clustering. A Matérn 5/2 covariance function was used. A 5-fold validation set, holding out 30 random days, was used to validate the applied method. Root mean square error was reduced due to the application of the proposed methodology. In another paper, Najimbi et al. [16] predicted solar power using weather variables. Eight different partitions were used to cluster the data using k-means clustering, and GPR was used based on the Matérn 5/2 covariance function. The Elbow and Gap techniques were used to develop optimal clusters, and the results showed that the forecasting error was reduced.
Paiva et al. [17] performed a comparative analysis on multilayer perceptron (MLP) artificial neural networks and multigene genetic programming (MGGP). The assessments indicated that MGGP gave more accurate and fast results in single predictions, and the ANN performed better for ensemble forecasts. Wang et al. [18] used a cluster-based analysis on ultra-short-term wind power using a hierarchical directed graph method and dynamic–temporal correlation. They first defined three nodes based on wind power, wind speed, and target nodes, and defined input sample and correlation matrices that are temporal-based to evaluate the correlation of neighboring wind farms. They also used directed edges to connect various nodes to obtain a hierarchical-based graph form, which was later applied to train the prediction model. The proposed model outperformed the other benchmark models used.
The World Meteorological Organization produced a guide highlighting the major sources of uncertainty in weather-related forecasts: the process of producing a forecast, its interpretation, and atmospheric unpredictability (Gill et al. [19]). Most of the studies reviewed in this research on forecasting global horizontal irradiance (GHI) using spatial regression considered only the spatial feature, leaving out weather variables and the variability in GHI. Additionally, most classical approaches for predicting GHI rely mainly on a single power plant. The current study focuses on a multi-site approach to modeling and forecasting GHI using Gaussian process models and including a nonlinear trend covariate. A summary of previous studies on modeling solar radiation based on spatial analysis is given in Table 1.

1.3. Research Highlights and Contributions

The current study presents an in-depth analysis and spatial–temporal predictive modeling of GHI at radiometric stations in sparse geographical regions. To the best of our knowledge, this is the first study to be carried out using South African data. The study proposes the standardization of predictions, which improves forecast accuracy. Another highlight is in the selection of the validation sets. Eight radiometric stations were used in the study, with three locations in each validation set. This resulted in fifty-six possible validation sets. From the five randomly selected sets, an in-depth analysis was carried out to investigate how the proposed models would perform, especially for stations that are far apart in Euclidean space.
Based on the literature review discussed in Section 1.2, the highlights and contributions made in this study are summarized as follows. The research focused on moving from a single site to a multi-site forecasting approach, that is, using geographically sparse data. Multi-site research is expected to give a diverse or large sample that is good enough to come up with significant associations between meteorological stations; hence, it enhances the statistical power of the model. Previously, solar forecasting has been based on single-site analysis. Exploring the temporal forecasting approach is expected to improve modeling accuracy.
We predicted solar irradiation using spatial regression coupled with Gaussian process modeling using Bayesian inference. GP regression has proved to be a very powerful tool for modeling the variability and uncertainty of solar irradiation [20], though it is very computationally expensive. The spatial analysis then improves the predictive accuracy since it reduces the computational burden in regression analysis. The combination of these two methods thus produces a hybrid model that produces favorable results.
A comparative analysis was done between the proposed models, GP Spatial and GP-AR Spatial, and the linear spatial model, which was used as a benchmark model. The evaluation metrics used to assess the accuracy of the models are MAE (Mean Absolute Error), RMSE (Root Mean Square Error), CP (Coverage Probability) and CRPS (Continuous Ranked Probability Score).
The forecasts made from the selected model were also standardized in this research. The standardized forecasts were then added to the original forecasts, improving the forecast accuracy. During forecasting, multicollinearity can be introduced if independent variables are correlated. This problem compromises the model’s statistical significance and produces imprecise coefficients. To overcome this problem, we used ElasticNet, one of the shrinkage methods used in variable selection. ElasticNet is a supervised algorithm that identifies the variables strongly associated with the response variable. The nonlinear trend variable is another important variable that resulted in a significant improvement in the forecast accuracy. Finally, we combined forecasts using QGAM (Quantile Generalized Additive Model) and MCQRNN (Monotone Composite Quantile Regression Neural Networks). Combining forecasts is a very effective tool for reducing errors.
The rest of the paper is organized as follows. Section 2 briefly discusses the spatio-temporal GP and GP spatio-temporal autoregressive models, including the benchmark linear spatio-temporal regression model. The section also presents a discussion of Bayesian inference, which is used in the computation of the parameters. The results of the exploratory analysis of GHI, the proposed methods and the benchmark model are given in Section 3. The conclusion is given in Section 4.

2. Methodology

2.1. Flowchart

The flow chart of the structure of the paper, outlining the proposed models and evaluation metrics, is shown in Figure 1. Two proposed models, the spatial–temporal Gaussian process (GP) regression model and the GP model coupled with autoregressive (GP-AR) error terms, are compared with a benchmark model, the spatial–temporal linear regression model. The performance of the models is evaluated using the root mean square error (RMSE), mean absolute error (MAE), and skill score (SS).

2.2. Variable Selection

In order to select significant variables for the analysis, ElasticNet, a regularization technique, was used [21]. ElasticNet is a shrinkage method that uses penalties from ridge regression ($L_2$) and Lasso regression ($L_1$) to reduce the number of variables in a regression model. It combines both ridge and Lasso methods to overcome their limitations. Elastic net minimizes the function in Equation (1).
$$L(\hat{\beta}) = \sum_{i=1}^{n} \left( y_i - x_i^{T}\hat{\beta} \right)^2 + \lambda \left[ (1-\alpha) \sum_{j=1}^{m} \hat{\beta}_j^2 + \alpha \sum_{j=1}^{m} |\hat{\beta}_j| \right], \tag{1}$$
where $\alpha$ is the mixing parameter between lasso ($\alpha = 1$) and ridge ($\alpha = 0$), $\lambda$ is the regularization penalty parameter, $y_i$ is the response variable, $x_i$ is the vector of covariates for observation $i$, and $\hat{\beta}$ is a vector of parameters.
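As a rough illustration of how the mixing parameter trades off the two penalties, the following minimal Python sketch evaluates an elastic net objective for a fixed coefficient vector. It assumes the penalty takes the form $\lambda[(1-\alpha)\sum_j \beta_j^2 + \alpha \sum_j |\beta_j|]$; software such as `glmnet` uses slightly different scaling constants, and the function name and toy data here are hypothetical, not taken from the study.

```python
def elastic_net_objective(X, y, beta, lam, alpha):
    """Squared-error loss plus a blended L1/L2 penalty (illustrative scaling)."""
    # Residual sum of squares over all observations
    rss = 0.0
    for row, target in zip(X, y):
        fitted = sum(x_ij * b_j for x_ij, b_j in zip(row, beta))
        rss += (target - fitted) ** 2
    ridge = sum(b * b for b in beta)   # L2 part, fully active when alpha = 0
    lasso = sum(abs(b) for b in beta)  # L1 part, fully active when alpha = 1
    return rss + lam * ((1.0 - alpha) * ridge + alpha * lasso)


# Toy data (hypothetical): two observations, two covariates
X = [[1.0, 2.0], [3.0, 4.0]]
y = [1.0, 2.0]
beta = [0.5, -0.5]
print(elastic_net_objective(X, y, beta, lam=0.1, alpha=1.0))  # lasso penalty only
print(elastic_net_objective(X, y, beta, lam=0.1, alpha=0.0))  # ridge penalty only
```

Intermediate values of `alpha` blend the two penalties, which is what lets elastic net shrink correlated covariates together while still setting some coefficients to zero.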

2.3. Spatio-Temporal Gaussian Process Regression Model

Gaussian processes were developed by Williams and Rasmussen [22]. Gaussian processes can be regarded as elements of spatial–temporal modeling because these stochastic processes are defined, in this case, over a given region where the stations have different locations. The spatial analysis aims to develop the best model to produce a set of outputs using inputs from various locations at different time frames. A spatio-temporal Gaussian process (GP) model is given in Equation (2).
$$Y(s_i, t) = x^{T}(s_i, t)\beta + w(s_i, t) + \varepsilon(s_i, t), \tag{2}$$
where $w(s_i, t)$ is the spatially dependent error term (spatio-temporal process), $Y(s_i, t)$ represents the response variable (GHI) at station $s_i$ at time $t$, for $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$; $x(s_i, t)$ represents the covariates, $\beta$ is a vector of constants and $\varepsilon(s_i, t)$ is the pure random error term. Gaussian processes are characterized by their covariance (kernel) functions. Various kernel functions can be used, but the Matérn and squared exponential kernels are the most common. In this study, the Matérn covariance kernel was used because it gives a better balance between the roughness and smoothness of the developed model. The spatio-temporal process $w(s_i, t)$ is assumed to be a GP with the Matérn covariance function given in Equation (3)
$$C_{\nu}(d) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \sqrt{2\nu}\,\frac{d}{\rho} \right)^{\nu} K_{\nu}\!\left( \sqrt{2\nu}\,\frac{d}{\rho} \right), \tag{3}$$
where $d$ represents the distance between two points, $\Gamma$ is the gamma function, $K_{\nu}$ is the modified Bessel function of the second kind and $\rho$ and $\nu$ are the range and smoothness parameters. That is
$$\mathrm{Cov}\left( w(s_i, t), w(s_j, t) \right) = \sigma_w^2\, \rho\left( d_{ij} \mid \nu \right). \tag{4}$$
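For half-integer smoothness values, the Matérn covariance reduces to a closed form that avoids the Bessel function. The sketch below, in Python, covers only $\nu \in \{0.5, 1.5, 2.5\}$; the smoothness value used in the study is not stated here, so the default `nu=1.5` and all parameter names are assumptions for illustration.

```python
import math


def matern(d, sigma2=1.0, rho=1.0, nu=1.5):
    """Matern covariance at distance d for nu = 0.5, 1.5 or 2.5 (closed forms)."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    # Scaled distance sqrt(2 * nu) * d / rho from the general Matern form
    z = math.sqrt(2.0 * nu) * d / rho
    if nu == 0.5:
        poly = 1.0                      # exponential kernel
    elif nu == 1.5:
        poly = 1.0 + z
    elif nu == 2.5:
        poly = 1.0 + z + z * z / 3.0
    else:
        raise ValueError("only nu in {0.5, 1.5, 2.5} is handled in this sketch")
    return sigma2 * poly * math.exp(-z)


# Covariance decays with distance; larger nu gives a smoother process
for distance in (0.0, 0.5, 1.0, 2.0):
    print(distance, matern(distance, nu=0.5), matern(distance, nu=2.5))
```

At zero distance the covariance equals `sigma2`, and it decreases monotonically as stations move further apart.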
The GP model can be represented hierarchically as shown in Equation (5).
$$Y_{it} = O_{it} + \varepsilon_{it}, \qquad O_{it} = X_{it}\beta + w_{it}, \tag{5}$$
where $i = 1, \ldots, n$; $t = 1, \ldots, T$, assuming $\varepsilon_{it} \sim N(0, \sigma_{\varepsilon}^2 I)$ and $w_{it} \sim N(0, \sigma_w^2 S_w)$, with $S_w$ representing the sample covariance matrix.
The prior distribution assumes a normal distribution as shown in Equation (6).
$$w_t \sim N(0, \sigma_w^2 S_w) \tag{6}$$
The posterior distribution is given in Equation (7).
$$Y(s_0, T+1) = O(s_0, T+1) + \varepsilon(s_0, T+1), \qquad O(s_0, T+1) = x^{T}(s_0, T+1)\beta + w(s_0, T+1), \tag{7}$$
where $Y(s_0, T+1)$ is the 1-step ahead forecast and $s_0$ is an unobserved location at time $T+1$. The hyperparameters $\theta$ are estimated by maximizing the log marginal likelihood function given in Equation (8).
$$L(\theta, \sigma_{noise}^2) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log\left| K_{\theta} + \sigma_{noise}^2 I \right| - \frac{1}{2} Y^{T}\left( K_{\theta} + \sigma_{noise}^2 I \right)^{-1} Y, \tag{8}$$
where $\sigma_{noise}^2$ is the noise variance, $\theta$ represents the hyperparameters and $K_{\theta}$ is the covariance matrix. The posterior predictive distribution of $Y(s_0, T+1)$ given $y$ is given in Equation (9).
$$\pi\left( y(s_0, T+1) \mid y \right) = \int \pi\left( y(s_0, T+1) \mid \theta, O, O(s_0, T+1), y \right) \times \pi\left( O(s_0, T+1) \mid \theta, y \right) \pi(\theta, O \mid y)\, dO(s_0, T+1)\, dO\, d\theta, \tag{9}$$
where $\pi(\theta, O \mid y)$ is the joint posterior distribution of $O$ and $\theta$. This study uses the GP spatio-temporal approach to forecasting solar irradiance.

2.4. Gaussian Process Regression Based Autoregressive Model

Another approach to forecasting solar irradiance in this research is the autoregressive (AR) spatio-temporal model. The Gaussian process autoregressive model [23] is given hierarchically as shown in Equation (10).
$$Y_{it} = O_{it} + \varepsilon_{it}, \qquad O_{it} = \rho\, O_{i,t-1} + X_{it}\beta + w_{it}, \tag{10}$$
where $i = 1, \ldots, n$; $t = 1, \ldots, T$, assuming $\varepsilon_{it} \sim N(0, \sigma_{\varepsilon}^2 I)$ and $w_{it} \sim N(0, \sigma_w^2 S_w)$, with $S_w$ representing the sample covariance matrix as given in Section 2.3.
Parameters of the spatio-temporal AR model are computed based on prior distributions. The parameters are grouped by type: mean, variance and correlation. Parameters such as $\beta$ and $\rho$, which describe the mean, are given independent normal prior distributions. Their means are specified as $(\mu_{\beta}, \mu_{\rho})$ and their variances as $(\sigma_{\beta}^2, \sigma_{\rho}^2)$. The independent normal distribution is $X \sim N(0, \sigma_{\mu}^2)$ for each component of the n-dimensional vector of the autoregressive model.
The likelihood function is given in Equation (11).
$$\log p(y \mid \theta, \sigma_y^2) = \log N\left( y; 0, K_{ff} + \sigma_y^2 I \right) = -\frac{1}{2} y^{T}\left( K_{ff} + \sigma_y^2 I \right)^{-1} y - \frac{1}{2}\log\left| K_{ff} + \sigma_y^2 I \right| - \frac{N}{2}\log 2\pi, \tag{11}$$
where $k_{xf}$ is the covariance vector between an input $x$ and the training inputs, $K_{ff}$ is the covariance matrix and $\theta$ are hyperparameters for mean and variance. The posterior distribution is given by (12).
$$f \mid y \sim GP\left( \hat{m}(x), \hat{k}(x, x') \right), \tag{12}$$
where
$$\hat{m}(x) = k_{xf}\left( K_{ff} + \sigma_y^2 I \right)^{-1} y.$$
The posterior predictive distribution is given in Equation (13).
$$Y(s) = X^{T}(s)\beta + \omega(s) + \epsilon(s), \tag{13}$$
where $\omega(s)$ is a spatial process that captures spatial association, $\epsilon(s)$ is an independent process and $X(s)$ are the weather-related variables.

2.5. The Linear Spatial Model

The linear spatial model is given in Equation (14).
$$Y(s_i, t) = \beta_1 x_1(s_i, t) + \cdots + \beta_p x_p(s_i, t) + \varepsilon(s_i, t), \quad i = 1, \ldots, n, \; t = 1, \ldots, T, \tag{14}$$
where $\varepsilon(s_i, t)$ is a zero-mean spatio-temporal Gaussian process error term with the covariance structure given in Equation (15).
$$\mathrm{Cov}\left( \varepsilon(s_i, t_k), \varepsilon(s_j, t_l) \right) = \rho_s\left( |s_i - s_j|; v_s \right) \rho_t\left( |t_k - t_l|; v_t \right), \tag{15}$$
where $\rho_s\left( |s_i - s_j|; v_s \right)$ represents an isotropic correlation function [23].
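The covariance in Equation (15) is separable: it is the product of a purely spatial correlation and a purely temporal one. A minimal Python sketch of that structure follows. The exponential correlation function and the decay parameters are assumptions chosen for illustration only; the correlation family actually used for the benchmark model is not specified in this section.

```python
import math


def exp_corr(distance, decay):
    """Exponential correlation, an assumed stand-in for rho_s and rho_t."""
    return math.exp(-decay * abs(distance))


def separable_cov(s_i, s_j, t_k, t_l, decay_s, decay_t):
    """Product of a spatial and a temporal correlation (separable structure)."""
    spatial = exp_corr(math.dist(s_i, s_j), decay_s)  # depends only on |s_i - s_j|
    temporal = exp_corr(t_k - t_l, decay_t)           # depends only on |t_k - t_l|
    return spatial * temporal


# Same site and time gives correlation 1; it decays with either kind of lag
print(separable_cov((0.0, 0.0), (0.0, 0.0), 1, 1, decay_s=0.1, decay_t=0.5))
print(separable_cov((0.0, 0.0), (3.0, 4.0), 1, 3, decay_s=0.1, decay_t=0.5))
```

Because the spatial factor depends only on the distance between sites, it is isotropic in the sense described above.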
A summary of the advantages and disadvantages of the proposed methods is given in Table 2.

2.6. Averaging Forecasts

It is well established in the literature that combining forecasts improves forecast accuracy. This study combines forecasts with two models, the monotone composite quantile regression neural network (MCQRNN) and the QGAM.
Given the forecasts $\hat{y}_{ik}$ from models $k = 1, \ldots, K$ and the combined forecasts $\hat{y}_i^{comb}$, MCQRNN seeks to minimize the loss function given in Equation (16).
$$L(\theta) = \frac{1}{K}\sum_{k=1}^{K} L(\tau_k) = \frac{1}{NK}\sum_{k=1}^{K}\sum_{i=1}^{N} \rho_{\tau_k}\left( \hat{y}_i^{comb} - \hat{y}_{ik} \right), \tag{16}$$
where
$$\rho_{\tau_k}(u) = \tau_k u\, I(u \ge 0) + (\tau_k - 1) u\, I(u < 0)$$
is the pinball loss at quantile level $\tau_k$ and $\theta$ is a vector of unknown parameters. For a detailed discussion of MCQRNN, see [24,25], among others.
QGAM is a hybrid model that is a combination of generalized additive (GAM) and quantile regression (QR) models. QGAMs were developed by Fasiolo et al. [26]. The QGAM model is given in Equation (17)
$$q_{Y|X}(\tau) = \sum_{i=1}^{N} \rho_{\tau}\left( \hat{y}_i^{comb} - \sum_{k=1}^{K} s_k\left( \hat{y}_{ik} \right) \right), \tag{17}$$
where $\rho_{\tau}$ is the pinball loss function and $s_k$ are smooth functions.
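Both combination methods rely on the pinball loss, which weights positive and negative deviations asymmetrically according to the quantile level. The following Python sketch shows the loss for a single deviation and its average over a set of forecasts. The argument order (combined forecast minus individual forecast) follows Equation (16); the function names are hypothetical.

```python
def pinball_loss(u, tau):
    """Pinball (check) loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return tau * u if u >= 0 else (tau - 1.0) * u


def mean_pinball(combined, individual, tau):
    """Average pinball loss of the deviations combined_i - individual_i."""
    losses = [pinball_loss(c - f, tau) for c, f in zip(combined, individual)]
    return sum(losses) / len(losses)


# At tau = 0.9 a positive deviation costs nine times as much as a negative one
print(pinball_loss(2.0, 0.9), pinball_loss(-2.0, 0.9))
print(mean_pinball([1.0, 3.0], [2.0, 2.0], tau=0.5))
```

At `tau = 0.5` the loss is symmetric and equals half the absolute deviation, which is why the median forecast minimizes it.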

2.7. Deterministic Evaluation Metrics

Four techniques were used to evaluate the models: two are deterministic, and two are probabilistic. RMSE (Root Mean Square Error) and MAE (Mean Absolute Error) were the error measures used. The two forecast evaluation metrics are given in Equations (18) and (19), respectively.
$$RMSE = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left( y_i - \hat{y}_i \right)^2}, \tag{18}$$
$$MAE = \frac{1}{m}\sum_{i=1}^{m}\left| y_i - \hat{y}_i \right|, \tag{19}$$
where $m$ is the number of forecasts, $y_i$ is GHI and the predictions of GHI are denoted by $\hat{y}_i$. The RMSE given in Equation (18) is used in this study to determine the forecast skill ($S_k$), which helps in benchmarking the forecast quality of the models against the naive model (the model based on unstandardized forecasts). The forecasting skill is given in Equation (20).
$$S_k = \left( 1 - \frac{RMSE_k}{RMSE_{naive,k}} \right) \times 100, \tag{20}$$
where $S_k$ is a measure of the superiority of the forecast of the developed model over the forecast of the naive (benchmark) model. The best model is the one that gives the highest positive percentage change.
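The deterministic metrics are simple to compute directly. The Python sketch below implements RMSE, MAE and the skill score as defined in Equations (18) to (20); the sample values are made up for illustration and are not results from the study.

```python
import math


def rmse(actual, forecast):
    """Root mean square error over paired observations and forecasts."""
    m = len(actual)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / m)


def mae(actual, forecast):
    """Mean absolute error over paired observations and forecasts."""
    m = len(actual)
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / m


def skill_score(rmse_model, rmse_naive):
    """Percentage improvement of a model's RMSE over the naive model's RMSE."""
    return (1.0 - rmse_model / rmse_naive) * 100.0


# Hypothetical GHI values in W/m^2
actual = [210.0, 250.0, 300.0]
naive = [200.0, 270.0, 280.0]
model = [205.0, 255.0, 295.0]
print(rmse(actual, model), mae(actual, model))
print(skill_score(rmse(actual, model), rmse(actual, naive)))
```

A positive skill score means the model has a lower RMSE than the naive model; a negative score means it is worse.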

2.8. Probabilistic Evaluation Metrics

Continuous ranked probability score (CRPS) and coverage probability (CP) are probabilistic evaluation metrics used to measure forecasts’ performance. They consider the distributions of the forecasted values as a whole.
$$\mathrm{CRPS}(F_i, y_i) = E_{F_i}\left| Y_i - y_i \right| - \frac{1}{2} E_{F_i}\left| Y_i - Y_i' \right|$$
where $y_i$ are the observed values being predicted, $F_i$ is the cumulative distribution function (CDF) of $Y_i$, $Y_i$ and $Y_i'$ are two independent copies of a random variable distributed with respect to $F_i$, and $y_i^{(j)}$ and $y_i^{(k)}$ are their respective sampled values. Now, with $m$ validation observations and $N$ samples per observation, the score is calculated as
$$\widehat{\mathrm{CRPS}} = \frac{1}{m}\sum_{i=1}^{m} \widehat{\mathrm{CRPS}}(F_i, y_i),$$
where
$$\widehat{\mathrm{CRPS}}(F_i, y_i) = \frac{1}{N}\sum_{j=1}^{N}\left| y_i^{(j)} - y_i \right| - \frac{1}{2N^2}\sum_{j=1}^{N}\sum_{k=1}^{N}\left| y_i^{(j)} - y_i^{(k)} \right|.$$
The coverage probability is given in Equation (23).
$$\mathrm{CP} = 100 \times \frac{1}{m}\sum_{i=1}^{m} I\left( L_i \le y_i \le U_i \right), \tag{23}$$
where $(L_i, U_i)$ denotes the $100(1-\alpha)\%$ predictive interval for the prediction $y_i$ and $I(\cdot)$ denotes the indicator function, which takes the value one when $y_i \in [L_i, U_i]$ and zero otherwise. The four techniques were used to evaluate the GP, GP-AR, and linear spatio-temporal models.
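The sample-based CRPS estimate and the coverage probability can both be computed from posterior predictive draws and interval bounds. The Python sketch below is one way to do this, assuming the draws for each validation point are available as a plain list; the function names and example numbers are hypothetical.

```python
def crps_sample(samples, y):
    """Sample CRPS estimate: mean |sample - y| minus half the mean pairwise gap."""
    n = len(samples)
    accuracy = sum(abs(s - y) for s in samples) / n
    spread = sum(abs(a - b) for a in samples for b in samples) / (2.0 * n * n)
    return accuracy - spread


def mean_crps(sample_sets, observed):
    """Average the per-point CRPS over all validation observations."""
    scores = [crps_sample(s, y) for s, y in zip(sample_sets, observed)]
    return sum(scores) / len(scores)


def coverage_probability(lower, upper, observed):
    """Percentage of observations falling inside their predictive intervals."""
    hits = sum(1 for lo, hi, y in zip(lower, upper, observed) if lo <= y <= hi)
    return 100.0 * hits / len(observed)


# Two predictive draws centred on the observation give a small CRPS
print(crps_sample([0.0, 2.0], 1.0))
print(coverage_probability([0, 0, 0, 0], [1, 1, 1, 1], [0.5, 2.0, 1.0, -1.0]))
```

A lower CRPS indicates a sharper and better calibrated predictive distribution, while a CP close to the nominal level indicates well-sized intervals.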

2.9. Statistical Significance

To assess the significance of differences in forecast accuracy of a pair of two competing forecasts of models f and h, we use the Diebold Mariano (DM) and Giacomini White (GW) tests.

2.9.1. Diebold-Mariano Test

The Diebold and Mariano [27] test is used to test for the unconditional predictive ability of two competing forecasts from models f and h.
Let $y(s_i, t)$, $i = 1, \ldots, n$, $t = 1, \ldots, T$, be GHI at site $i$ at time $t$, and let $\hat{y}(s_i, t)_f$ and $\hat{y}(s_i, t)_h$, $f \ne h$; $f, h = 1, 2, \ldots, K$, be two forecasts. The forecast errors are defined as $\varepsilon(s_i, t)_f = y(s_i, t) - \hat{y}(s_i, t)_f$, $f = 1, 2$. If $g(\varepsilon(s_i, t)_f)$ is an error loss function, then a loss function which penalizes underprediction more heavily than overprediction, as suggested by [28], is given in Equation (24):
$$g\left( \varepsilon(s_i, t)_f \right) = e^{\lambda \varepsilon(s_i, t)_f} - 1 - \lambda \varepsilon(s_i, t)_f. \tag{24}$$
The loss differential is then calculated as [28]:
$$d_t = g\left( \varepsilon(s_i, t)_1 \right) - g\left( \varepsilon(s_i, t)_2 \right). \tag{25}$$
The null hypothesis states that the two sets of forecasts have the same accuracy and is given as $H_0: E(d_t) = 0$, and the alternative is $H_1: E(d_t) \ne 0$.
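The asymmetric loss and the loss differential can be sketched in a few lines of Python. The test statistic shown here is the basic form for one-step-ahead forecasts, which assumes the loss differentials are not autocorrelated; this simplification, the default `lam=1.0`, and the function names are assumptions for illustration rather than the exact procedure used in the study.

```python
import math


def linex_loss(error, lam=1.0):
    """Asymmetric loss exp(lam * e) - 1 - lam * e, with e = actual - forecast."""
    return math.exp(lam * error) - 1.0 - lam * error


def loss_differential(actual, forecast_1, forecast_2, lam=1.0):
    """Per-period difference in loss between two competing forecasts."""
    return [
        linex_loss(y - f1, lam) - linex_loss(y - f2, lam)
        for y, f1, f2 in zip(actual, forecast_1, forecast_2)
    ]


def dm_statistic(d):
    """Mean loss differential divided by its standard error (no lag correction)."""
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    if var_d == 0.0:
        raise ValueError("loss differentials have zero variance")
    return mean_d / math.sqrt(var_d / n)


# With lam > 0, underprediction (positive error) is penalized more heavily
print(linex_loss(1.0), linex_loss(-1.0))
```

Under the null hypothesis of equal accuracy the statistic is approximately standard normal, so large absolute values point to a difference between the two forecasts.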

2.9.2. Giacomini-White Test

The GW test generalizes the DM test [29]. The GW test has the added advantage of accounting for uncertainty in estimating parameters. It tests the conditional predictive ability of a competing pair of forecasts. The test considers the regression given in Equation (26).
$$\Delta_d^{f,h} = \phi^{T} X_{d-1} + \epsilon_d, \tag{26}$$
where $X_{d-1}$ has elements from the information set on day $d-1$, with the pair of forecasts from models f and h. The differential series is the same as that of the Diebold Mariano test. The conditional predictive ability (CPA) test is then computed with the null hypothesis $H_0: \phi = 0$ in the regression given in Equation (26).

2.10. Standardization of the Forecasts

Standardization of data scales forecasts to the same range of values to improve the predictions’ quality. This creates consistency in the dataset by avoiding wider ranges from the data. The forecasts are standardized using Equation (27).
$$Z_{\hat{y}}(s_i, t) = \frac{\hat{y}(s_i, t) - \bar{y}(s_i, t)}{s_{\hat{y}}(s_i, t)}, \tag{27}$$
where $\hat{y}(s_i, t)$ are the forecasts at each station $i$ at time $t$, $\bar{y}(s_i, t)$ is the mean value of the forecasts and $s_{\hat{y}}(s_i, t)$ is the standard deviation of GHI forecasts. The new adjusted forecasts will therefore be given by
$$\hat{y}_{adjust}(s_i, t) = \hat{y}(s_i, t) + Z_{\hat{y}}(s_i, t) = \hat{y}(s_i, t) + \frac{\hat{y}(s_i, t) - \bar{y}(s_i, t)}{s_{\hat{y}}(s_i, t)} = \frac{\left( 1 + s_{\hat{y}}(s_i, t) \right)\hat{y}(s_i, t) - \bar{y}(s_i, t)}{s_{\hat{y}}(s_i, t)}.$$
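The adjustment amounts to adding each forecast's z-score back onto the forecast itself. A minimal Python sketch follows, assuming the mean and standard deviation are computed over the full list of forecasts passed in and that the sample standard deviation is intended; neither detail is specified above, and the function name is hypothetical.

```python
import statistics


def adjust_forecasts(forecasts):
    """Add each forecast's z-score to the forecast: y_hat + (y_hat - mean) / sd."""
    mean = statistics.fmean(forecasts)
    sd = statistics.stdev(forecasts)  # sample standard deviation (assumed)
    return [f + (f - mean) / sd for f in forecasts]


# Forecasts above the mean are nudged up and those below are nudged down
print(adjust_forecasts([1.0, 2.0, 3.0]))
```

The same result is obtained from the closed form $((1 + s)\hat{y} - \bar{y})/s$, since both expressions are algebraically identical.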

3. Empirical Results

3.1. Data Used in the Study

The dataset used in this study was collected from the SAURAN (Southern African Universities Radiometric Network) website https://sauran.ac.za/, accessed on 13 August 2022. Eight radiometric stations were selected for the study. These were the only radiometric stations from SAURAN that had clean data for the sampling period used. However, future research will include the use of satellite data and more stations, including the use of gridded data covering the whole country.
The stations are CSIR-CSIR Energy Centre (1), CUT-Central University of Technology (2), UFH-University of Fort Hare (3), UNV-University of Venda (4), UNZ-University of Zululand (5), UPR-University of Pretoria (6), MIN-CRSES Mintek (7) and NUST-Namibian University of Science and Technology (8). The covariates used in the study are temperature, relative humidity, wind speed, wind direction, wind direction standard deviation, barometric pressure, month, day, noltrend (nonlinear trend variable), and elevation; the dependent variable is GHI. R statistical software was used for all the analyses in this study.
The modeling and analysis were done using a Dell laptop with the following specifications: 8GB RAM, processor Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz 2.11 GHz, 64-bit operating system, x64-based processor.

3.2. Exploratory Data Analysis

The locations of the eight radiometric stations used in this study are given in Figure 2.
The longitudes, latitudes and elevations of the eight radiometric stations shown in Figure 2 are presented in Table 3a. Table 3b summarizes the distances between radiometric stations. The bottom left of the leading diagonal shows the distance between stations, while the top right shows the pairwise Kendall’s rank correlation coefficients of GHI.
In Figure 3, the left panel shows the variogram cloud for GHI based on eight stations. The right panel shows the empirical variogram, the plot of averaged points of X and Y coordinates of the clouds.
Daily GHI temporal variations of all eight radiometric stations over the sampling period, December 2021 to January 2022, are given in Figure 4. GHI dropped slightly in January compared to December for all the radiometric stations. This is possibly due to the onset of the rainy season.
In order to get some insight into the distribution of the GHI data, the box plots of daily GHI data based on eight radiometric stations are presented in Figure 5. Most stations have GHI values ranging between 20 and 400 W/m². Only CUT and NUST have values ranging from 100 to 400 and 500 W/m², respectively. The spreads of the distributions are similar except for CUT and NUST, whose spreads are slightly different.
Summary statistics of the GHI data are shown in Table 4. The data distribution does not resemble a normal probability distribution since the median and mean are unequal for all eight stations. The measures of skewness are negative, indicating the data are skewed to the left, and kurtosis is less than three for all the stations except CUT, whose kurtosis is greater than three, indicating that its distribution is heavy-tailed.
Figure 6 displays the Kendall’s tau coefficients, pairwise scatter plots, and histograms to show the association between the meteorological stations. Kendall’s $\tau$ was calculated using Equation (28).
$$\tau = 1 - \frac{2I}{0.5\,n(n-1)}, \tag{28}$$
where $I$ is the number of discordant pairs (inversions) and $n$ is the number of observations. A value of $\tau < 0.07$ indicates a weak association, $0.07 < \tau < 0.21$ a medium association, $0.21 < \tau < 0.35$ a fairly strong association and $\tau > 0.35$ a strong association. Most pairs of stations have strong associations. NUST has a fairly strong negative association with UFH, while CUT and NUST have a positive medium association.
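Kendall's coefficient can be computed by counting discordant pairs directly. The Python sketch below applies the formula $\tau = 1 - 2I / (0.5\,n(n-1))$, assuming there are no tied values (ties would require a corrected variant of the coefficient); the function name and toy series are hypothetical.

```python
def kendall_tau(x, y):
    """Kendall's tau from the number of discordant pairs, assuming no ties."""
    n = len(x)
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # A pair is discordant when the two series order it differently
            if (x[i] - x[j]) * (y[i] - y[j]) < 0:
                discordant += 1
    return 1.0 - 2.0 * discordant / (0.5 * n * (n - 1))


# Identical ordering gives 1, fully reversed ordering gives -1
print(kendall_tau([1, 2, 3], [1, 2, 3]))
print(kendall_tau([1, 2, 3], [3, 2, 1]))
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```

The double loop is quadratic in the series length, which is acceptable for two months of daily data per station.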

3.3. Variable Selection

In the study, we used ElasticNet, which is one of the shrinkage methods used in variable selection. ElasticNet is a supervised learning algorithm that identifies the variables strongly associated with the dependent variable. The nonlinear trend variable is another important variable that resulted in a significant improvement in the forecast accuracy.
Figure 7 shows the results for variable selection using the R package ‘glmnet’ [30]. The significance of the variables was checked using the shrinkage method, elastic net. As shown in Figure 7, all twelve variables were retained in the model. Figure 8 shows the relative importance of the selected independent variables. Relative humidity and temperature are the most significant, while the day is the least important covariate.

3.4. Forecasting Results

The models used in this study are the linear spatial model (M1), which is the benchmark model, GP spatial regression (M2) and GP spatial autoregressive (M3). Eight radiometric stations were selected, and two months of data were used for each station. Table 5 shows the model evaluations of the forecasting accuracy based on RMSE, MAE, CRPS, and CP. Five stations were used as training sets, and three were used as test sets. With eight stations, the possible number of validation sets is given as $\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$, where $n$ is the total number of locations and $k$ is the number of locations in a validation set. In this study, five validation sets, each having three locations, were used.
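The count of possible validation sets and a random draw of five of them can be reproduced with the Python standard library. In the sketch below, the station codes come from Section 3.1, while the seed and the function name are arbitrary assumptions, so the sets drawn here will not necessarily match the five used in the study.

```python
import itertools
import math
import random

STATIONS = ["CSIR", "CUT", "UFH", "UNV", "UNZ", "UPR", "MIN", "NUST"]


def validation_splits(stations, k=3, n_sets=5, seed=0):
    """Randomly pick n_sets validation sets of size k; the rest form the training set."""
    all_sets = list(itertools.combinations(stations, k))
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    chosen = rng.sample(all_sets, n_sets)
    return [
        (tuple(s for s in stations if s not in validation), validation)
        for validation in chosen
    ]


print(math.comb(len(STATIONS), 3))  # number of possible three-station sets
for train, validation in validation_splits(STATIONS):
    print("train:", train, "validate:", validation)
```

Each split keeps the training and validation stations disjoint, so forecasts at the validation stations are genuinely out-of-sample in space.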
The running times for the three models were as follows: M1 (linear model) took 3.39 s, M2 (GP model) took 2.27 s, and M3 (GP-AR model) took 2.89 s.
The results show that, overall, the spatio-temporal Gaussian process regression model (M2) produces the smallest errors in terms of MAE and RMSE. Based on the CRPS, the linear spatio-temporal model (M1) is the best-fitting model. However, M1 has the lowest coverage probability on all five validation sets. Taken together, the four evaluation metrics favor M2 as the best-fitting model.
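For readers who want to reproduce the point-forecast and coverage metrics, the following Python sketch shows how RMSE, MAE and the coverage probability (CP, expressed as a percentage as in Table 5) are commonly computed. The numbers are toy values invented for illustration, not data from the study, and the helper names are hypothetical. The CRPS is omitted because its computation depends on the form of the predictive distribution.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def coverage_probability(actual, lower, upper):
    """Percentage of observations that fall inside their prediction interval."""
    inside = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return 100.0 * inside / len(actual)


# Toy GHI values, invented for illustration only.
actual = [250.0, 300.0, 320.0, 280.0]
predicted = [240.0, 310.0, 300.0, 285.0]
lower = [200.0, 260.0, 325.0, 240.0]
upper = [290.0, 350.0, 380.0, 330.0]

print(rmse(actual, predicted), mae(actual, predicted))
print(coverage_probability(actual, lower, upper))
```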
We also tested whether the differences in predictive accuracy between the developed models are statistically significant, using the Diebold–Mariano and Giacomini–White tests. The results are given in Table 6. They show that Model M2 (Gaussian Process Spatial) performed better than the other two models.
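To illustrate the idea behind the Diebold–Mariano test, the sketch below computes the test statistic for two series of forecast errors. It is a simplified version under stated assumptions: squared-error loss, one-step-ahead forecasts (so the autocorrelation of the loss differential is ignored) and a normal approximation for the two-sided p-value. The settings used by the authors are not given in this section, so this should not be read as their implementation; the function name `diebold_mariano` and the error values are hypothetical.

```python
import math


def diebold_mariano(errors_a, errors_b, power=2):
    """Simplified Diebold-Mariano test for equal predictive accuracy.

    The loss is |e|**power. A negative statistic means model A has the
    smaller mean loss. Valid as written only for one-step-ahead forecasts.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two error series of equal length (at least 2)")
    # Loss differential between the two models at each time point
    d = [abs(a) ** power - abs(b) ** power for a, b in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((v - d_bar) ** 2 for v in d) / (n - 1)
    if var_d == 0:
        raise ValueError("loss differential has zero variance")
    stat = d_bar / math.sqrt(var_d / n)
    # Two-sided p-value under the standard normal approximation
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value


# Toy forecast errors, invented for illustration only.
errors_a = [1.0, -1.0, 1.0, -1.0, 2.0, -2.0]
errors_b = [2.0, -3.0, 2.0, -3.0, 4.0, -3.0]
stat, p_value = diebold_mariano(errors_a, errors_b)
print(stat, p_value)
```

For multi-step forecasts, the variance in the denominator would need to be replaced by a long-run variance estimate that accounts for autocorrelation.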
Figure 9 displays the forecasts in red and the actual GHI in black for validation set 2 (CUT, UNV, UPR). The two series are quite close to each other. The rest of the plots are in Appendix A.
Kendall's rank correlation coefficients were used to measure the association between the prediction errors. Figure 10 shows the pairwise scatter plots, the Kendall tau coefficients and the histograms of these errors. The Kendall tau coefficients between the errors for stations CUT, UNV and UPR are close to zero, except for the pair CUT and UPR, for which the coefficient is 0.17.

3.5. Standardizing Forecasts

To improve the accuracy of the forecasts, we standardized them and then added the standardized values back to the forecasts, giving the adjusted forecasts reported in Table 7. Standardization makes the forecasts more consistent by limiting how far the predicted values range from the actual values.
Table 7 compares the RMSE, MAE, Skill Score, DM, and GW scores of the original predicted values with those of the standardized (adjusted) forecasts for the validation set UNV, UPR and MIN. The standardized predictions achieved a lower MAE at all three stations and a lower RMSE at UNV and UPR, while the RMSE at MIN was essentially unchanged (56.482 versus 56.544). The skill scores indicate that the predictions improved after standardization.
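The skill scores can be illustrated with a short Python sketch. The formula is an assumption, since it is not stated in this section: the skill score is taken to be the percentage reduction in RMSE of the adjusted forecast relative to the unadjusted one. The RMSE values are transcribed from Table 7, and the name `skill_score` is hypothetical.

```python
def skill_score(rmse_adjusted, rmse_reference):
    """Percentage reduction in RMSE relative to the unadjusted forecast (assumed definition)."""
    return 100.0 * (1.0 - rmse_adjusted / rmse_reference)


# RMSE values transcribed from Table 7: station -> (unadjusted, adjusted)
table7_rmse = {
    "UNV": (49.682, 49.191),
    "UPR": (31.851, 31.217),
    "MIN": (56.482, 56.544),
}

scores = {
    station: round(skill_score(adjusted, reference), 3)
    for station, (reference, adjusted) in table7_rmse.items()
}
print(scores)
```

Under this assumed formula, the UNV and UPR scores agree with Table 7 (0.988 and 1.991). For MIN it gives a value of the same magnitude as the tabulated 0.110 but with a negative sign, so the authors' formula or sign convention may differ for that entry.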

3.6. Combining Forecasts

Whilst the forecast combination using MCQRNN was applied to all the forecasts from the five validation sets, we present the results from validation set 5 (UNV, UPR, MIN) here. Figure 11 shows plots of combined forecasts with GHI.
Table 8 shows forecast evaluations of all the models based on validation set 5: the GP, AR, Linear and Standardized GP models. MCQRNN denotes the forecasts combined using the MCQRNN model, and QGAM those combined using the QGAM model. Table 8 shows that adjusting the forecasts through standardization and combining the forecasts improve the forecast accuracy, with the combined forecasts giving the lowest RMSE and MAE.
To assess the forecasting results, we use two performance evaluation criteria (RMSE and MAE) with three forecasting horizons: 10-, 15-, and 20-day ahead forecasts. The results are presented in Table 9. Using the linear model, the smallest RMSEs for the 10-, 15-, and 20-day ahead forecasts were 62.53, 73.78 and 69.15, respectively. Based on the GP model, the smallest RMSEs were 50.80, 50.32 and 46.10, respectively. Similarly, using the GP-AR model, the smallest RMSEs were 52.96, 51.09 and 45.79, respectively. Based on this validation set and using the RMSE, the results suggest that the GP model is the best for the 10- and 15-day ahead forecasts, while the GP-AR model is the best for the 20-day ahead forecast horizon. It should be noted that this conclusion is based on one validation set. Nevertheless, it is consistent with the results summarized in Table 5, Table 6 and Table 8, which indicate that the spatio-temporal GP regression model produces the most accurate forecasts.
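The comparison across horizons can be retraced with a few lines of Python. The RMSE values are transcribed from Table 9; the dictionary layout and variable names are illustrative only.

```python
# RMSE values transcribed from Table 9: model -> horizon (days) -> RMSE at (UNV, UPR, MIN)
table9_rmse = {
    "Linear": {10: (72.85, 80.79, 62.53), 15: (73.86, 74.98, 73.78), 20: (72.95, 69.15, 70.87)},
    "GP": {10: (75.59, 50.80, 53.58), 15: (77.98, 50.32, 63.78), 20: (77.31, 46.10, 62.27)},
    "GP-AR": {10: (77.00, 52.96, 58.17), 15: (76.67, 51.09, 68.20), 20: (76.98, 45.79, 67.03)},
}

# Smallest RMSE over the three stations, per model and horizon
smallest = {
    model: {horizon: min(values) for horizon, values in horizons.items()}
    for model, horizons in table9_rmse.items()
}

# Model with the smallest of those RMSEs at each horizon
best_model = {
    horizon: min(smallest, key=lambda model: smallest[model][horizon])
    for horizon in (10, 15, 20)
}

print(smallest)
print(best_model)
```

This reproduces the values quoted above: the GP model has the smallest RMSE at the 10- and 15-day horizons (50.80 and 50.32) and the GP-AR model at the 20-day horizon (45.79).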

4. Conclusions

This study investigated the spatio-temporal forecasting of global horizontal irradiance data from eight radiometric stations. The data are from SAURAN (https://sauran.ac.za/), accessed on 13 August 2022. This was done through a comparative analysis of two spatio-temporal predictive models, the Gaussian process regression and the autoregressive Gaussian process regression. Variable selection was done using ElasticNet, a hybrid shrinkage method combining Lasso and ridge regressions. The results were compared to the benchmark model, the spatio-temporal linear model. The spatio-temporal GP outperformed the other two models, proving to be the most appropriate model for predicting GHI in South Africa. Most authors (see, for example, [20,31]) applied models based on single-site data sets for predicting solar irradiation. Accurate forecasts of solar power from multiple sites are important to the system operator, as they facilitate large-scale integration of solar power onto the power grid. This modeling approach exploits production information of solar power from neighboring radiometric stations.
Energy forecasting faces many challenges, chief among them producing accurate forecasts. To improve the forecasts, they were standardized, which improved the models' forecasting accuracy. The forecasts from the individual models were then combined using quantile regression neural networks and additive quantile regression models, which resulted in a significant improvement in the forecast accuracy. This modeling framework could be useful to power utility companies in making informed decisions when planning power grid management.
This study has several possible future research directions. The modeling done in the current paper involved stations in geographically sparse areas. Future research could use more stations and cluster them before the spatio-temporal forecasting of solar power. An in-depth analysis of the extremal dependence of GHI in each cluster could help develop models with high predictive capabilities. Another interesting research direction would be to use large data sets from stations over different periods and compare the results with those from gridded data sets.

Author Contributions

Conceptualization, E.C. and C.S.; methodology, E.C. and C.S.; software, E.C.; validation, E.C. and C.S.; formal analysis, E.C.; investigation, E.C.; data curation, E.C. and C.S.; writing—original draft preparation, E.C.; writing—review and editing, E.C., C.S. and A.B.; visualisation, E.C. and C.S.; supervision, C.S. and A.B.; project administration, C.S. and A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study are from Southern African Universities Radiometric Network (SAURAN), website (https://sauran.ac.za/, accessed on 13 August 2022). The analytic data can be downloaded from https://github.com/csigauke/Data-for-the-article-Spatio-temporal-forecasting-of-global-horizontal-irradiance-using-Bayesian-inf.

Acknowledgments

The authors acknowledge SAURAN, Southern African Universities Radiometric Network, for providing the data.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
GHI: Global Horizontal Irradiance
GP: Gaussian Process
GP Spatial: Gaussian Process Spatial Regression
GP-AR Spatial: Gaussian Process Autoregressive Spatial Model
LSTR: Linear Spatial Temporal Regression
ARX: Autoregressive with Exogenous Inputs
NN: Neural Network
RRF: Random Regression Forest
RT: Regression Trees
ANN: Artificial Neural Network
SVR: Support Vector Regression
ARIMAX: Autoregressive Integrated Moving Average with Exogenous Inputs
DNN: Deep Neural Network
MLP: Multilayer Perceptron
MGGP: Multigene Genetic Programming
RMSE: Root Mean Square Error
MAE: Mean Absolute Error
CRPS: Continuous Ranked Probability Score
CP: Prediction Interval Coverage
SAURAN: Southern African Universities Radiometric Network
MDPI: Multidisciplinary Digital Publishing Institute
CSIR: CSIR Energy Centre
CUT: Central University of Technology
UFH: University of Fort Hare
UNV: University of Venda
UNZ: University of Zululand
UPR: University of Pretoria
MIN: CRSES Mintek
NUST: Namibian University of Science and Technology
QGAM: Quantile Generalized Additive Model
MCQRNN: Monotone Composite Quantile Regression Neural Networks

Appendix A

Figure A1 shows the solar resource map for South Africa (SA). It shows the solar radiation each area (region) in SA receives. The largest amount of GHI is in the western part of South Africa.
Figure A1. Solar heat map. Source: https://solargis.com/maps-and-gis-data/download/south-africa, accessed on 8 July 2022.

References

  1. Hong, T.; Shahidehpour, M. Load Forecasting Case Study; US Department of Energy: Washington, DC, USA, 2015.
  2. Andre, M.; Dabo-Niang, S.; Soubdhan, T.; Ould-Baba, H. Predictive spatio-temporal model for spatially sparse global solar radiation data. Energy 2016, 111, 599–608.
  3. Liu, Y.; Qin, H.; Zhang, Z.; Pei, S.; Wang, C.; Yu, X.; Jiang, Z.; Zhou, J. Ensemble temporal forecasting of solar irradiation using variational Bayesian convolutional gate recurrent unit network. Appl. Energy 2019, 253, 113596.
  4. Yang, D. Ultra-fast pre-selection in lasso-type spatio-temporal solar forecasting problems. Sol. Energy 2018, 176, 788–796.
  5. Eschenbach, A.; Yepes, G.; Tenllado, C.; Gómez-Pérez, J.I.; Piñuel, L.; Zarzalejo, L.F.; Wilbert, S. Spatio-Temporal Resolution of Irradiance Samples in Machine Learning Approaches for Irradiance Forecasting. IEEE Access 2020, 8, 51518–51531.
  6. Kim, B.; Suh, D. A Hybrid Spatio-Temporal Prediction Model for Solar Photovoltaic Generation Using Numerical Weather Data and Satellite Images. Remote Sens. 2020, 12, 3706.
  7. Zhang, J.; Ju, Y.; Mu, B.; Zhong, R.; Chen, T. An efficient implementation for spatial–temporal Gaussian process regression and its applications. Automatica 2023, 147, 110679.
  8. Todescato, M.; Carron, A.; Carli, R.; Pillonetto, G.; Schenato, L. Efficient spatio-temporal Gaussian regression via Kalman filtering. Automatica 2022, 118, 109032.
  9. Hamelijnck, O.; Wilkinson, W.; Loppi, N.; Solin, A.; Damoulas, T. Spatio-temporal variational Gaussian processes. Adv. Neural Inf. Process. Syst. 2021, 34, 23621–23633.
  10. Agoua, X.G.; Girard, R.; Kariniotakis, G. Probabilistic models for spatio-temporal photovoltaic power forecasting. IEEE Trans. Sustain. Energy 2018, 10, 780–789.
  11. Banerjee, S.; Gelfand, A.E.; Finley, A.O.; Sang, H. Gaussian predictive process models for large spatial data sets. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2008, 70, 825–848.
  12. Luttinen, J.; Ilin, A. Variational Gaussian-process factor analysis for modeling spatio-temporal data. Adv. Neural Inf. Process. Syst. 2009, 22. Available online: https://proceedings.neurips.cc/paper/2009/file/4a47d2983c8bd392b120b627e0e1cab4-Paper.pdf (accessed on 19 September 2022).
  13. Tomizawa, Y.; Yoshida, I. Benchmarking of Gaussian Process Regression with Multiple Random Fields for Spatial Variability Estimation. ASCE-ASME J. Risk Uncertain. Eng. Syst. Part A Civ. Eng. 2022, 8, 04022052.
  14. Comber, A.; Harris, P.; Brunsdon, C. Spatially Varying Coefficient Regression with GAM Gaussian Process splines: GAM (e)-on. AGILE GISci. Ser. 2022, 3, 1–6.
  15. Najibi, F.; Apostolopoulou, D.; Alonso, E. Enhanced performance Gaussian process regression for probabilistic short-term solar output forecast. Int. J. Electr. Power Energy Syst. 2021, 130, 106916.
  16. Najibi, F.; Apostolopoulou, D.; Alonso, E. Clustering Sensitivity Analysis for Gaussian Process Regression Based Solar Output Forecast. IEEE Madr. PowerTech 2021, 1–6.
  17. de Paiva, G.M.; Pimentel, S.P.; Alvarenga, B.P.; Marra, E.G.; Mussetta, M.; Leva, S. Multiple Site Intraday Solar Irradiance Forecasting by Machine Learning Algorithms: MGGP and MLP Neural Networks. Energies 2020, 13, 3005.
  18. Wang, F.; Chen, P.; Zhen, Z.; Yin, R.; Cao, C.; Zhang, Y.; Duic, N. Dynamic spatio-temporal correlation and hierarchical directed graph structure based ultra-short-term wind farm cluster power forecasting method. Appl. Energy 2022, 323, 119579.
  19. Gill, J.; Rubiera, J.; Martin, C.; Cacic, I.; Mylne, K.; Dehui, C.; Jiafeng, G.; Xu, T.; Yamaguchi, M.; Foamouhoue, K.; et al. Guidelines on Communicating for Uncertainty; World Meteorological Organisation: Geneva, Switzerland, 2008; Available online: https://public.wmo.int/en (accessed on 6 July 2022).
  20. Chandiwana, E.; Sigauke, C.; Bere, A. Twenty-four-hour ahead probabilistic global horizontal irradiance forecasting using Gaussian process regression. Algorithms 2021, 14, 177.
  21. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2005, 67, 301–320.
  22. Williams, C.K.I.; Rasmussen, C.E. Gaussian processes for machine learning. Cambridge 2006, 2, 69–106.
  23. Sahu, S.K. Bayesian Modeling of Spatio-Temporal Data with R, 1st ed.; CRC Press/Taylor and Francis: New York, NY, USA, 2022.
  24. Zou, H.; Yuan, M. Composite quantile regression and the oracle model selection theory. Ann. Stat. 2008, 36, 1108–1126.
  25. Jin, J.; Zhao, Z. Composite Quantile Regression Neural Network for Massive Datasets. Math. Probl. Eng. 2021, 1–10.
  26. Fasiolo, M.; Wood, S.N.; Zaffran, M.; Nedellec, R.; Goude, Y. QGAM: Bayesian Nonparametric Quantile Regression Modeling in R. J. Stat. Softw. 2021, 100, 1–31.
  27. Diebold, F.X.; Mariano, R.S. Comparing predictive accuracy. J. Bus. Econ. Stat. 2002, 20, 134–144.
  28. Triacca, U. Comparing Predictive Accuracy of Two Forecasts. 2018. Available online: http://www.phdeconomics.sssup.it/documents/Lesson19.pdf (accessed on 7 September 2022).
  29. Giacomini, R.; White, H. Tests of conditional predictive ability. Econometrica 2006, 74, 1545–1578.
  30. Hastie, T.; Qian, J.; Tay, K. An Introduction to Glmnet. Available online: https://cran.r-project.org/web/packages/glmnet/vignettes/glmnet.pdf (accessed on 23 August 2022).
  31. Sigauke, C.; Ravele, T.; Jhamba, L. Extremal Dependence Modelling of Global Horizontal Irradiance with Temperature and Humidity: An Application Using South African Data. Energies 2022, 15, 5965.
Figure 1. Flow chart of the structure of the paper.
Figure 2. Map showing radiometric stations. Source: Author's creation.
Figure 3. Variogram cloud (left panel) and an empirical variogram (right panel) for the GHI at the eight radiometric stations. The distance is measured in kilometers.
Figure 4. Time series plot of the GHI data at the eight radiometric stations.
Figure 5. Box plots of GHI at the eight radiometric stations.
Figure 6. Histograms (diagonal), pairwise scatter plots (bottom left), and pairwise Kendall's rank correlation coefficients (top right) of GHI for all the eight radiometric stations.
Figure 7. Variables selected using the elastic net.
Figure 8. Relative importance for GHI.
Figure 9. Predictions at CUT, UNV, and UPR radiometric stations.
Figure 10. Histograms (diagonal), pairwise scatter plots (bottom left), and pairwise Kendall's rank correlation coefficients (top right) of the prediction errors for stations 2, 4 and 6, which are CUT, UNV and UPR.
Figure 11. Plots of GHI with forecasts from (a) GP, (b) GP adjusted, (c) AR, (d) Linear models and combined forecasts using (e) MCQRNN and (f) QGAM.
Table 1. Summary of some previous studies on temporal dependence modeling of GHI.
| Ref. | Data | Models | Main Findings |
|---|---|---|---|
| [2] | Solar power data | Temporal vector autoregressive model | Results show that the significant best order was equal to 1. |
| [4] | Solar irradiation data | Ultra fast pre-selection algorithm | The method produced results superior to those that adopted classical methods. |
| [3] | Solar irradiation data | Variational Bayesian convolutional gate recurrent unit network | Results of the study provided efficient uncertainty estimation of solar irradiation prediction. |
| [6] | Numerical weather data and satellite-based images | Spatial, temporal prediction using ARIMAX, SVR, ANN, and DNN | The results based on spatial data produced results that outperformed those based on numerical data. |
| [7] | Temperature data | Gaussian Process Analysis, Spatial-Temporal Analysis, Kalman Filter and smoother | There was a confirmation of improvement of model predictions performance through spatial analysis. |
| [9] | Air quality data | Variational GP, Markov kernel analysis, Spatial-Temporal Analysis | The results produced showed a great improvement as a result of combining variational inference and spatial–temporal filtering. |
| [10] | Photovoltaic plants data | Lasso, Quantile Regression, Spatial Analysis, and Kernel Density Function | The model produced results that were superior to those of KDM. |
| [11] | Point referenced biomass data | Gaussian Process Regression, Hierarchical modelling, Markov chain, Monte Carlo methods | Findings addressed the problem of fitting hierarchical spatial modeling on large datasets. |
| [12] | Sea surface temperature data | Gaussian Process Regression, Factor Analysis, Principal Component Analysis | Findings proved that the performance of GPR improved as a result of combining it with factor analysis. |
| [13] | Real ground measure dataset | Gaussian Process Regression, Gaussian random fields Analysis, Markov Analysis | The results showed that GPR with random fields using a Whittle–Matérn kernel produced the best results. |
| [14] | Social economic data | Gaussian Process Regression, Spatial Analysis, Splines Regression | Predictions were improved by making use of GAMs model that accommodated spatial heterogeneity. |
| [15] | Photovoltaic data | Gaussian Process Regression, feature selection, k-means clustering | The prediction accuracy of the GPR model improves as a result of the k-means clustering. |
| [16] | Photovoltaic data | Gaussian Process Regression, k-means clustering, Elbow and Gap method | Optimal clusters used are four, and forecast error was reduced by optimizing the clusters. |
| [18] | Wind | Graph structured spatial-temporal analysis | The results indicated that the method outperformed the benchmark models. |
| [17] | Solar irradiance data | Multigene Genetic Programming (MGGP) and Multilayer perceptron | Results showed that the forecast horizon, location, and error metric accuracy affected the model. MGGP gave more accurate results. |
| [5] | Solar irradiance data | ARX, NN, RRF, and RT | Results showed that the short term is more effective on data from a dense enough network. |
Table 2. Model comparisons of the proposed techniques.
| Models | Strengths | Weaknesses |
|---|---|---|
| Spatial Linear | 1. It is easy to implement, train and interpret. 2. The algorithm is less complex than other models used. 3. Uncertainty is captured directly. | 1. It is sensitive to outliers. |
| Spatial GP | 1. It uses kernel functions, making the model computationally cheaper, giving minimum loss. 2. Can handle complex datasets. 3. The kernel function uses an inner product to combine features from different locations. | 1. Spatial GP is computationally expensive. |
| Spatial AR | 1. It allows the investigation of the association of individual independent variables, adjusting for the non-independent variables. 2. The model can determine if randomness is lacking in the spatial function. 3. It can predict patterns that are related to recurring datasets. | 1. Robustness is limited against non-stationarity and additive noise. |
Table 3. (a) Longitude, latitude and elevation. (b) Distance (km) matrix for the eight radiometric stations.
(a)
| Station | Longitude | Latitude | Elevation (m) |
|---|---|---|---|
| CSIR | 28.279 | −25.747 | 1400 |
| CUT | 26.216 | −29.121 | 1397 |
| UNZ | 31.852 | −28.853 | 90 |
| UNV | 30.424 | −23.131 | 628 |
| UFH | 26.845 | −32.785 | 540 |
| UPR | 28.229 | −25.753 | 1410 |
| MIN | 27.978 | −26.089 | 1521 |
| NUST | 17.075 | −22.565 | 1683 |
(b)
| | CSIR | CUT | UNZ | UNV | UFH | UPR | MIN | NUST |
|---|---|---|---|---|---|---|---|---|
| CSIR | 0 | 0.42 | 0.34 | 0.11 | 0.08 | 0.9 | 0.48 | 0.1 |
| CUT | 456 | 0 | 0.14 | 0.04 | 0.01 | 0.41 | 0.3 | 0.13 |
| UNZ | 664 | 779 | 0 | 0.01 | −0.03 | 0.30 | 0.24 | −0.02 |
| UNV | 441 | 898 | 907 | 0 | −0.08 | 0.1 | 0.14 | 0.01 |
| UFH | 986 | 527 | 878 | 1422 | 0 | 0.11 | 0.00 | −0.23 |
| UPR | 6.9 | 460 | 662 | 440 | 990 | 0 | 0.51 | 0.06 |
| MIN | 568 | 410 | 641 | 493 | 940 | 54.7 | 0 | 0.06 |
| NUST | 1651 | 1520 | 2296 | 2087 | 1796 | 1649 | 1611 | 0 |
Table 4. Summary statistics of GHI for the eight stations.
| Station | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
| CSIR | 37.53 | 187.16 | 267.44 | 250.35 | 331.92 | 386.41 | −0.5382635 | 2.395768 |
| CUT | 43.45 | 257.17 | 318.27 | 301.15 | 365.21 | 504.55 | −0.719209 | 3.5455 |
| UFH | 50.86 | 204.34 | 272.86 | 260.89 | 339.02 | 389.10 | −0.551794 | 2.503086 |
| UNV | 49.89 | 178.34 | 245.59 | 233.37 | 302.89 | 356.90 | −0.5387249 | 2.228135 |
| UNZ | 62.69 | 167.36 | 269.01 | 245.12 | 321.84 | 350.45 | −0.4954172 | 1.928348 |
| UPR | 31.16 | 191.20 | 254.57 | 246.36 | 332.57 | 379.90 | −0.5952533 | 2.499935 |
| MIN | 60.22 | 208.79 | 289.37 | 259.85 | 336.06 | 377.33 | −0.7116837 | 2.564314 |
| NUST | 148.6 | 264.9 | 323.8 | 317.6 | 381.1 | 397.6 | −0.6146715 | 2.306814 |
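Table 5. Overall model evaluation for the three models.
| Validation set | Model | RMSE | MAE | CRPS | CP (%) |
|---|---|---|---|---|---|
| 1 (CSIR, UNZ, NUST) | M1 | 162.497 | 113.547 | 46.211 | 74.731 |
| 1 (CSIR, UNZ, NUST) | M2 | 61.893 | 49.113 | 68.769 | 100 |
| 1 (CSIR, UNZ, NUST) | M3 | 64.105 | 50.308 | 65.013 | 100 |
| 2 (CUT, UNV, UPR) | M1 | 74.585 | 60.518 | 41.410 | 91.935 |
| 2 (CUT, UNV, UPR) | M2 | 67.675 | 53.149 | 59.323 | 99.462 |
| 2 (CUT, UNV, UPR) | M3 | 68.156 | 53.302 | 58.813 | 99.462 |
| 3 (UFH, UNV, MIN) | M1 | 82.741 | 68.085 | 37.566 | 90.323 |
| 3 (UFH, UNV, MIN) | M2 | 81.297 | 63.439 | 54.311 | 96.774 |
| 3 (UFH, UNV, MIN) | M3 | 83.740 | 63.437 | 63.496 | 98.925 |
| 4 (UPR, MIN, NUST) | M1 | 152.961 | 107.354 | 45.989 | 76.344 |
| 4 (UPR, MIN, NUST) | M2 | 63.905 | 53.028 | 67.072 | 100 |
| 4 (UPR, MIN, NUST) | M3 | 64.748 | 53.007 | 62.307 | 100 |
| 5 (UNZ, UPR, MIN) | M1 | 74.785 | 62.499 | 40.558 | 94.624 |
| 5 (UNZ, UPR, MIN) | M2 | 70.185 | 55.993 | 69.137 | 100 |
| 5 (UNZ, UPR, MIN) | M3 | 73.341 | 57.245 | 59.016 | 98.924 |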
Table 6. Model comparisons (M1 (Linear), M2 (GP), M3 (GP-AR)) using validation set CSIR, UNZ, and NUST.
Diebold–Mariano Test
| Null Hypothesis | Test Statistic | p-Value | Result |
|---|---|---|---|
| M2 = M3 | −6.841 | 1.116 × 10^−10 | Not equally accurate |
| M2 = M1 | 6.936 | 6.546 × 10^−11 | Not equally accurate |
| M3 = M1 | 1.913 | 0.0574 | Not equally accurate |
Giacomini–White Test
| Null Hypothesis | Test Statistic | p-Value | Result |
|---|---|---|---|
| M2 = M3 | 3.606 | 0.058 | Sign of mean loss is (-). M2 dominates M3 |
| M2 = M1 | 38.387 | 5.802 × 10^−10 | Sign of mean loss is (-). M2 dominates M1 |
| M3 = M1 | 37.550 | 8.909 × 10^−10 | Sign of mean loss is (-). M3 dominates M1 |
Table 7. Standardized based models evaluation.
| Model | MAE | RMSE | Skill Score |
|---|---|---|---|
| predicted4 (UNV) | 34.365 | 49.682 | |
| predicted4.adjusted | 33.738 | 49.191 | 0.988 |
| predicted6 (UPR) | 21.435 | 31.851 | |
| predicted6.adjusted | 20.766 | 31.217 | 1.991 |
| predicted7 (MIN) | 39.962 | 56.482 | |
| predicted7.adjusted | 39.714 | 56.544 | 0.110 |
Table 8. Accuracy evaluations (Validation set (467)).
| Model | RMSE | MAE |
|---|---|---|
| GP | 67.01 | 52.99 |
| Standardised GP | 66.53 | 52.52 |
| AR | 67.70 | 52.70 |
| Linear | 74.53 | 61.49 |
| MCQRNN | 51.53 | 37.97 |
| QGAM | 52.39 | 38.85 |
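Table 9. Accuracy evaluations for three forecast horizons using validation set UNV, UPR and MIN.
Linear Model
| Forecast Horizon | RMSE (UNV) | RMSE (UPR) | RMSE (MIN) | MAE (UNV) | MAE (UPR) | MAE (MIN) |
|---|---|---|---|---|---|---|
| 10-day | 72.85 | 80.79 | 62.53 | 54.26 | 71.78 | 57.85 |
| 15-day | 73.86 | 74.98 | 73.78 | 53.59 | 66.70 | 64.60 |
| 20-day | 72.95 | 69.15 | 70.87 | 54.95 | 60.44 | 59.94 |
GP Model
| Forecast Horizon | RMSE (UNV) | RMSE (UPR) | RMSE (MIN) | MAE (UNV) | MAE (UPR) | MAE (MIN) |
|---|---|---|---|---|---|---|
| 10-day | 75.59 | 50.80 | 53.58 | 58.76 | 44.44 | 44.81 |
| 15-day | 77.98 | 50.32 | 63.78 | 57.59 | 45.01 | 54.12 |
| 20-day | 77.31 | 46.10 | 62.27 | 59.20 | 39.59 | 54.06 |
GP-AR Model
| Forecast Horizon | RMSE (UNV) | RMSE (UPR) | RMSE (MIN) | MAE (UNV) | MAE (UPR) | MAE (MIN) |
|---|---|---|---|---|---|---|
| 10-day | 77.00 | 52.96 | 58.17 | 59.66 | 45.88 | 45.11 |
| 15-day | 76.67 | 51.09 | 68.20 | 58.26 | 44.97 | 55.44 |
| 20-day | 76.98 | 45.79 | 67.03 | 59.25 | 38.31 | 56.51 |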
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Sigauke, C.; Chandiwana, E.; Bere, A. Spatio-Temporal Forecasting of Global Horizontal Irradiance Using Bayesian Inference. Appl. Sci. 2023, 13, 201. https://doi.org/10.3390/app13010201
