Domain Hybrid Day-Ahead Solar Radiation Forecasting Scheme

Park, Jinwoong; Park, Sungwoo; Shim, Jonghwa; Hwang, Eenjun

doi:10.3390/rs15061622

School of Electrical Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, Republic of Korea

Remote Sens. 2023, 15(6), 1622; https://doi.org/10.3390/rs15061622

Submission received: 8 February 2023 / Revised: 9 March 2023 / Accepted: 15 March 2023 / Published: 17 March 2023

(This article belongs to the Special Issue New Challenges in Solar Radiation, Modeling and Remote Sensing)

Abstract

Energy procurement from renewable energy sources has increased in recent years. Solar power generation has a high penetration rate among these sources, so site-level solar radiation forecasting is attracting considerable attention as a means of efficient operation. Various approaches have been proposed to forecast solar radiation accurately. Recently, hybrid models have been proposed that improve performance by forecasting in the frequency domain using past solar radiation. Because solar radiation data exhibit recurring patterns, forecasting in the frequency domain can be effective. However, forecasting performance deteriorates on days when the weather changes suddenly. In this paper, we propose a domain hybrid forecasting model that can respond to weather changes and exhibits improved performance. The proposed model consists of two stages. In the first stage, forecasting in the frequency domain is performed using wavelet transform, complete ensemble empirical mode decomposition with adaptive noise, and a multilayer perceptron, while forecasting in the sequence domain is performed using a light gradient boosting machine. In the second stage, a multilayer perceptron-based domain hybrid model is constructed using the first-stage forecast values as inputs. Compared with the frequency-domain model, our proposed model improves the normalized root-mean-square error by up to 36.38%.

Keywords:

smart grid; renewable energy sources; solar radiation forecasting; wavelet transform; complete ensemble empirical mode decomposition with adaptive noise

1. Introduction

In recent years, renewable energy generation from sources such as solar and wind has become a crucial component of electrical energy production. It reduces carbon emissions and offers an alternative to rapidly depleting fossil fuels [1]. Photovoltaics accounted for about 45% of global renewable energy capacity additions in 2020 and have a high penetration rate among renewable energy sources [2,3]. However, photovoltaic power depends on solar radiation, which cannot be controlled, and this complicates energy management planning. In addition, the variability of photovoltaic output limits how much the supply side of the power grid can rely on it [3]. Therefore, to integrate photovoltaic power stably into the power grid, it is essential to forecast solar radiation accurately, because solar radiation has the most significant impact on photovoltaic power generation [4].

Solar radiation forecasting models based on various methods have been proposed. For example, models based on statistical methods include the autoregressive integrated moving average (ARIMA) [5], multiple linear regression (MLR) [6], and Holt–Winters [7] models. These models perform well when the relationship between inputs and outputs is linear, but their forecasting performance deteriorates when that relationship is nonlinear [8,9]. Artificial intelligence (AI)-based solar radiation forecasting models such as support vector regression (SVR) [10] and neural networks (NNs) [11] have been proposed to address the performance degradation caused by nonlinear input–output relationships. However, although AI-based forecasting models perform well on nonlinear data, their forecasting performance is strongly affected by the number of input variables and the amount of input data. To compensate for this dependence, hybrid forecasting models in the frequency domain have been proposed [12]. These models use preprocessing methods such as the Fourier transform (FT) and wavelet transform (WT) to transform and decompose the data. Such hybrid models improved solar radiation forecasting performance by decomposing the original solar radiation data into components that are easier to model, which suits nonstationary data carrying a large amount of information [13,14]. However, because this approach uses only past solar radiation data, it has a limited ability to cope with changes in solar radiation caused by exogenous variables such as air temperature and relative humidity, and it cannot respond to rapid weather changes [15].

To overcome the limitations of existing hybrid forecasting models, in this paper, we propose a domain hybrid solar radiation forecasting model. It combines forecasting in the sequence domain using exogenous variables with forecasting in the frequency domain using past solar radiation. The proposed method consists of two stages, and each stage uses algorithms with relatively short training times and high accuracy [16]. In the first stage, solar radiation forecasting is performed in the sequence and frequency domains using exogenous variables and past solar radiation data as inputs, respectively. The sequence-domain forecasting model is constructed using the light gradient boosting machine (LightGBM) [17] with time series cross-validation (TSCV). Because TSCV requires the model to be retrained repeatedly, we built the sequence-domain model on LightGBM, which trains quickly and performs well. The frequency-domain forecasting model uses WT [18] and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) [19] to transform past solar radiation data into the frequency domain and decompose the resulting signal. We used CEEMDAN to avoid the mode mixing problem in data decomposition and to minimize reconstruction errors. Then, forecasting models based on the multilayer perceptron (MLP) [20] were constructed, each taking one decomposed solar radiation component as input. In the second stage, an MLP combines the two first-stage forecasts to produce a more accurate domain hybrid day-ahead forecast. Those forecasts capture exogenous factors in the sequence domain and solar radiation patterns in the frequency domain. The contributions of this paper are as follows:

We present a domain hybrid day-ahead solar radiation forecasting model that combines forecasting in the sequence and frequency domains for accurate solar radiation forecasting.
We further improve solar radiation forecasting performance by ensembling the forecasting results of the two domains.
The proposed model performs day-ahead forecasting at 1 h intervals with high accuracy.

This paper is organized as follows: Section 2 introduces several related works. Section 3 presents the overall structure of the proposed domain hybrid day-ahead solar radiation forecasting model. Section 4 illustrates the experiments and their results. Lastly, Section 5 presents the major conclusions of the study.

2. Related Works

Recently, solar radiation forecasting models using AI methods such as SVR [21,22,23] and artificial neural networks (ANNs) [24,25,26] have been proposed to handle the nonlinearity and complex relationships of time series. For instance, Mellit et al. [25] presented an MLP-based method for forecasting day-ahead solar radiation from air temperature values. They validated it using data collected in Trieste, Italy. Yildirim et al. [27] studied solar radiation forecasting using regression analysis and ANNs for four different sites in Turkey. Their model used longitude, sunshine hours, relative humidity, air temperature, and time information as input variables, and the ANN-based model gave the most accurate results. Kaba et al. [28] performed solar radiation forecasting at different sites in Turkey using deep learning algorithms. They used sunshine hours, cloud cover, and daily minimum and maximum temperatures as input variables and analyzed how accuracy changed with different combinations of input variables. Yu et al. [29] proposed a short-term solar radiation forecasting model based on the long short-term memory (LSTM) algorithm. They used relative humidity, cloud type, dew point, solar zenith angle, wind speed, and other variables as inputs. They verified the applicability of the method at three sites in the United States and confirmed that the LSTM-based model performed excellently. He et al. [30] proposed a hybrid probabilistic solar radiation forecasting model that combined LSTM and residual modeling. The LSTM produced a deterministic forecast, which was then used to calculate the residual distribution. The input variables were relative humidity, dew point temperature, cloudiness, wind speed, and time information.
The authors verified that the proposed model outperformed the existing deep learning-based models.

Solar radiation forecasting using the aforementioned exogenous factors as input variables has achieved excellent accuracy in the sequence domain. Nevertheless, the achievable improvement in accuracy is limited when the number of input variables is small. To solve this problem, various forecasting models have been proposed that take past solar radiation data as input variables and transform them into the frequency domain. Huang et al. [12] proposed a frequency-domain solar radiation forecasting model based on the discrete Fourier transform (DFT), principal component analysis (PCA), and an Elman neural network (ENN), and confirmed that it outperformed existing models. Shamshirband et al. [31] proposed a frequency-domain solar radiation forecasting model using WT and a support vector machine (SVM). WT was used to decompose the input solar radiation data, and each decomposed component was used as the input to an individual SVM model. The authors verified that the developed model performed better than other models. Gao et al. [15] proposed a solar radiation forecasting model combining CEEMDAN, convolutional neural networks (CNNs), and LSTM. They showed that using CEEMDAN to decompose the complex signal into several relatively simple signals improves forecasting accuracy for solar radiation, which is a noisy time series. Zhang et al. [32] proposed a model that improves frequency-domain solar radiation forecasting by combining WT, CEEMDAN, improved atom search optimization (IASO), and an outlier robust extreme learning machine (ORELM). The authors showed that denoising the signal with WT and decomposing it appropriately with CEEMDAN improves performance. They also showed that optimizing the model with IASO enhances performance further.
Although these frequency-domain models achieved excellent forecasting performance, they responded poorly to weather changes such as rainy and cloudy days because they did not consider exogenous factors [15]. In addition, because only past solar radiation was considered, their accuracy in forecasting abrupt changes in solar radiation was limited.

In this paper, we present a domain hybrid day-ahead solar radiation forecasting model that combines sequence- and frequency-domain forecasting to compensate for these weaknesses and provide a more robust and superior performance.

3. Methodology

In this section, we describe our domain hybrid day-ahead solar radiation forecasting model. Figure 1 illustrates its overall architecture. In the first stage, day-ahead solar radiation forecasting is performed in the sequence and frequency domains. In the sequence domain, solar radiation is forecast by a LightGBM model trained with TSCV that takes exogenous factors as inputs. In the frequency domain, the solar radiation data are transformed into the frequency domain using WT and then decomposed using CEEMDAN. The decomposed signals are then used to train MLP-based models for solar radiation forecasting. In the second stage, domain hybrid day-ahead solar radiation forecasting is performed by an MLP-based model that takes the sequence- and frequency-domain forecasts as inputs. We used data from January 2016 to December 2018 as the training dataset and data from January 2019 to December 2020 as the test dataset. Details are provided in the following subsections.

3.1. Data Collection

In this study, the solar radiation forecasting model takes date/time information, meteorological data, and past solar radiation data as inputs. The meteorological and solar radiation data were provided by the Korea Meteorological Administration (KMA). We considered three regions in the Republic of Korea, and Table 1 lists the latitude, longitude, and elevation of the three regions selected to evaluate the forecasting performance of the model. Data were collected from 8:00 a.m. to 6:00 p.m. each day over 5 years, from 2016 to 2020, and comprised air temperature, relative humidity, wind speed, and solar radiation [33]. Date and time information was also used as input for forecasting in the sequence domain.
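Because data are collected hourly from 8:00 a.m. to 6:00 p.m., each day contributes 11 values, which matches the 11 input nodes of the frequency-domain MLPs described in Section 3.3.3. The minimal sketch below shows one way to build day-ahead training pairs from such a series. It assumes that each day's 11 values are used to forecast the next day's 11 values. That windowing is an assumption, because the paper does not spell it out, and the function names are hypothetical.

```python
import numpy as np

# 08:00, 09:00, ..., 18:00 inclusive -> 11 hourly observations per day.
HOURS_PER_DAY = 11


def to_daily_matrix(hourly):
    """Reshape a flat hourly series into a (n_days, 11) matrix."""
    hourly = np.asarray(hourly, dtype=float)
    if hourly.size % HOURS_PER_DAY != 0:
        raise ValueError("series length must be a multiple of 11 hourly values")
    return hourly.reshape(-1, HOURS_PER_DAY)


def day_ahead_pairs(hourly):
    """Hypothetical windowing: day d (inputs) -> day d + 1 (targets)."""
    days = to_daily_matrix(hourly)
    return days[:-1], days[1:]


# Usage: three synthetic days of hourly values.
series = np.arange(3 * HOURS_PER_DAY, dtype=float)
X, Y = day_ahead_pairs(series)
print(X.shape, Y.shape)  # (2, 11) (2, 11)
```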

3.2. Solar Radiation Forecasting in the Sequence Domain Using Exogenous Factors

In the sequence domain within the first stage, day-ahead solar radiation forecasting was performed using a LightGBM model trained with TSCV, and its forecast values were fed into the second-stage model. Specifically, the first year of data was used as the initial training data, and TSCV was then performed over the following 4 years. Section 3.2.1 and Section 3.2.2 describe the construction of the sequence-domain models.

3.2.1. LightGBM

LightGBM is a high-performance, decision tree-based gradient boosting algorithm for regression and classification tasks [34]. It reduces training time in two ways. Gradient-based one-side sampling (GOSS) estimates the information gain from only a portion of the data, and exclusive feature bundling (EFB) reduces the number of features [17]. GOSS reduces the number of data points used internally by sampling according to gradient magnitude. Specifically, it retains all data points with large gradients, where the loss function changes rapidly with respect to the model’s predictions and which are therefore poorly fitted. It then randomly samples among the data points with small gradients, where the loss function changes slowly, and up-weights the sampled points to compensate [35]. EFB reduces computation by bundling mutually exclusive features, which rarely take nonzero values at the same time in a sparse feature space, into a single feature. In addition, unlike other boosting algorithms that grow trees depth-wise (level-wise), LightGBM grows trees leaf-wise, always splitting the leaf that most reduces the loss. It therefore tends to train faster and achieve higher accuracy than existing boosting algorithms. LightGBM has been used to forecast renewable energy quantities such as solar radiation and wind speed, where it has proven fast and accurate [17,34]. We therefore built our sequence-domain model on LightGBM so that the repeated training required by TSCV remains fast while the forecasts stay accurate.
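The sampling rule of GOSS can be illustrated in a few lines of NumPy. This is a simplified sketch of the idea, not LightGBM's internal implementation. It keeps the largest-gradient points, samples from the rest, and amplifies the sampled points by (1 − a)/b, where a and b are the kept and sampled fractions. The rates 0.2 and 0.1 are illustrative assumptions, and `goss_sample` is a hypothetical name.

```python
import numpy as np


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Simplified gradient-based one-side sampling (GOSS).

    Keeps the top `top_rate` fraction of points by |gradient|, randomly samples
    `other_rate * n` points from the remaining small-gradient points, and
    up-weights those sampled points by (1 - top_rate) / other_rate so the
    information-gain estimate stays approximately unbiased.
    """
    g = np.abs(np.asarray(gradients, dtype=float))
    n = g.size
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)

    order = np.argsort(-g)          # indices sorted by descending |gradient|
    top_idx = order[:n_top]         # large gradients: always kept
    rest_idx = order[n_top:]        # small gradients: candidates for sampling

    rng = np.random.default_rng(seed)
    sampled_idx = rng.choice(rest_idx, size=n_other, replace=False)

    indices = np.concatenate([top_idx, sampled_idx])
    weights = np.ones(indices.size)
    weights[n_top:] = (1.0 - top_rate) / other_rate
    return indices, weights


grads = np.random.default_rng(1).standard_normal(1000)
idx, w = goss_sample(grads)
print(len(idx), w.sum())  # 300 indices; weights sum to roughly n = 1000
```

Only 30% of the points are used to evaluate splits. Because the small-gradient sample is up-weighted, the weights still sum to about the full dataset size.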

3.2.2. Time-Series Cross-Validation

Typically, data are collected and divided into training and test sets to create a forecasting model. The training set is used to construct the model, and the test set is used to evaluate its performance. With this conventional validation method, when the amount of training data is small, accuracy decreases as the gap between the last training time point and the forecasting point grows [36]. TSCV is useful for improving time series models because it respects the temporal dependencies that often appear in time series data. TSCV uses all data before a forecasting point as training data and sets the next point as the test set. It then repeats this procedure as the forecasting point advances. However, retraining and forecasting at every point via TSCV is time-intensive. To reduce this overhead while retaining the benefits of TSCV, we used monthly TSCV, as shown in Figure 2.
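Monthly TSCV can be sketched with the standard library as an expanding-window split generator. The sketch assumes, following Section 3.2, that the first year (2016) serves only as initial training data. Each later calendar month is then forecast by a model trained on all earlier data. The function name is hypothetical, and in practice a model such as LightGBM would be refit once per fold.

```python
from datetime import date, timedelta


def monthly_tscv_splits(dates, first_test_year):
    """Expanding-window, month-by-month time series cross-validation.

    For each calendar month from `first_test_year` onward, every observation
    dated before that month is used for training and the month itself is the
    test fold.
    """
    test_months = sorted({(d.year, d.month) for d in dates if d.year >= first_test_year})
    for month in test_months:
        train_idx = [i for i, d in enumerate(dates) if (d.year, d.month) < month]
        test_idx = [i for i, d in enumerate(dates) if (d.year, d.month) == month]
        yield train_idx, test_idx


# Usage: daily timestamps from 2016-01-01 through 2017-03-31.
start = date(2016, 1, 1)
days = [start + timedelta(days=k) for k in range((date(2017, 4, 1) - start).days)]
for train_idx, test_idx in monthly_tscv_splits(days, first_test_year=2017):
    print(days[train_idx[-1]], "->", days[test_idx[0]], days[test_idx[-1]])
```

Refitting once per month instead of once per time step cuts the number of model fits by roughly a factor of 30 for daily data. Every forecast is still produced by a model that has seen only past observations.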

3.3. Solar Radiation Forecasting Using Past Solar Radiation Data in the Frequency Domain

In the frequency domain within the first stage, the past solar radiation data were first transformed into the frequency domain using WT, and noise was removed. The signal was then decomposed using CEEMDAN. Day-ahead solar radiation forecasting was performed using MLP-based models, each taking one decomposed signal component as input. The frequency-domain forecasting model uses 1 year of data for training. Section 3.3.1, Section 3.3.2 and Section 3.3.3 describe the construction of the forecasting model.

3.3.1. Wavelet Transform

WT is a mathematical method for expressing a function or signal as a superposition of scaled and translated wavelets. It is a robust data analysis and processing tool used in fields such as image processing, signal analysis, and data compression. WT has several advantages over other signal representation methods such as FT. For example, a wavelet provides a localized time–frequency representation; that is, it captures how the frequency content of a signal varies over time. In addition, WT can be readily applied to many types of signals and data, enabling efficient and effective analysis of data with diverse characteristics. In general, WT decomposes a signal into a series of wavelets, each characterized by a scale and a position in time. The scale of a wavelet corresponds to its frequency content, while its translation determines where in time it is located. Depending on the requirements of the application, the WT representation can emphasize either time or frequency information. The WT process is illustrated in Figure 3.

The first step in the wavelet transform is to select a mother wavelet, a small oscillatory function on which the transform is based. Many types of mother wavelets are available. The choice depends on the characteristics of the signal being analyzed and on the goal of the analysis. In the second step, the mother wavelet is scaled and translated to create a family of wavelets used to represent the signal. Scaling a wavelet corresponds to a change in frequency, and translating it corresponds to a shift in time. In the next step, the signal is decomposed by convolving it with each scaled and translated wavelet, producing a set of coefficients that represent the signal in the wavelet domain. In the final step, the WT representation of the signal is constructed by plotting the wavelet coefficients as functions of scale and time. The result is a two-dimensional representation that can be used to analyze the time-varying frequency content of the signal.
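The paper does not state which mother wavelet, decomposition depth, or denoising rule it used. Purely as an illustration, the sketch below makes three assumptions. It uses the Haar wavelet, a single-level discrete wavelet transform rather than the continuous scale–time representation described above, and soft thresholding of the detail coefficients for denoising. All helper names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt(x):
    """Single-level Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / SQRT2   # low-pass: local averages (coarse scale)
    detail = (even - odd) / SQRT2   # high-pass: local differences (fine scale)
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: perfectly reconstructs the original samples."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / SQRT2
    x[1::2] = (approx - detail) / SQRT2
    return x


def soft_threshold(coeffs, thr):
    """Shrink coefficients toward zero by thr; small ones become exactly zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thr, 0.0)


def haar_denoise(x, thr):
    """Assumed denoising rule: threshold the detail band, keep the approximation."""
    approx, detail = haar_dwt(x)
    return haar_idwt(approx, soft_threshold(detail, thr))


rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 4 * np.pi, 22)) + 0.1 * rng.standard_normal(22)
approx, detail = haar_dwt(x)
x_smooth = haar_denoise(x, thr=0.1)
print(np.allclose(haar_idwt(approx, detail), x))  # True
```

Repeating `haar_dwt` on the approximation coefficients would give a multilevel decomposition. Other mother wavelets change only the filter pair.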

3.3.2. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise

Empirical mode decomposition (EMD) is an algorithm that decomposes a signal into a set of intrinsic mode functions (IMFs), which are oscillatory functions that capture the signal’s underlying time-varying patterns. Although EMD is useful for analyzing nonstationary and nonlinear signals, it suffers from mode mixing [37]. In mode mixing, different oscillation modes appear in one IMF, or the same oscillation mode spreads across several IMFs. Ensemble empirical mode decomposition (EEMD) was proposed to solve the mode mixing problem. It adds Gaussian white noise to the signal before applying EMD and averages the IMFs obtained over many noise realizations. Although EEMD solves the mode mixing problem, it introduces reconstruction errors because the added white noise cannot be completely cancelled in the reconstructed signal. To address this limitation, CEEMDAN was proposed. It adds adaptive white noise at each decomposition stage, which effectively avoids mode mixing and drives the reconstruction error to near zero. The process of performing CEEMDAN on a signal is illustrated in Figure 4.

In the first step of CEEMDAN, copies of the original signal are generated, and adaptive white noise is added to each copy. This noise helps the decomposition converge to a stable and meaningful set of IMFs. In the second step, each noisy copy is decomposed by the EMD algorithm, and the resulting modes are averaged across copies. EMD iteratively identifies the local extrema of the signal and extracts each IMF by subtracting the mean of the upper and lower envelopes defined by these extrema, a process called sifting. The process is repeated until the remaining signal, called the residual, becomes a monotonically increasing or decreasing function.
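The following is a simplified CEEMDAN sketch in the spirit of the original formulation [19], written with NumPy and SciPy. It illustrates the procedure and is not the authors' implementation. Several choices are simplifying assumptions:

- sifting uses a fixed number of iterations instead of a stopping criterion;
- both envelopes are pinned to the end points of the signal;
- the noise amplitude is set to `eps` times the standard deviation of the current residual.

Dedicated EMD libraries handle boundary effects and stopping rules more carefully. The property the sketch does guarantee is the one stressed above: the IMFs plus the residual reconstruct the input exactly.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def envelope_mean(x):
    """Mean of upper/lower cubic-spline envelopes, or None if x has too few extrema."""
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(x.size)
    last = x.size - 1
    # Simplification: pin both envelopes to the end points of the signal.
    upper_idx = np.r_[0, maxima, last]
    lower_idx = np.r_[0, minima, last]
    upper = CubicSpline(upper_idx, x[upper_idx])(t)
    lower = CubicSpline(lower_idx, x[lower_idx])(t)
    return (upper + lower) / 2.0


def first_imf(x, n_sift=10):
    """E_1(x): first IMF via a fixed number of sifting steps (zeros if x cannot oscillate)."""
    h = np.asarray(x, dtype=float).copy()
    if envelope_mean(h) is None:
        return np.zeros_like(h)
    for _ in range(n_sift):
        m = envelope_mean(h)
        if m is None:
            break
        h = h - m
    return h


def emd(x, max_imfs=6):
    """Plain EMD: repeatedly extract the first IMF from the running residual."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if envelope_mean(residual) is None:
            break
        imf = first_imf(residual)
        imfs.append(imf)
        residual = residual - imf
    return imfs, residual


def ceemdan(x, n_ensemble=20, eps=0.2, max_imfs=6, seed=0):
    """Simplified CEEMDAN: returns (imfs, residual) with sum(imfs) + residual == x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    noises = rng.standard_normal((n_ensemble, x.size))
    noise_modes = [emd(w, max_imfs)[0] for w in noises]  # EMD modes of each noise realization

    residual = x.copy()
    imfs = []
    for k in range(max_imfs):
        if envelope_mean(residual) is None:  # residual is (close to) monotonic: stop
            break
        beta = eps * residual.std()
        trials = []
        for w, modes in zip(noises, noise_modes):
            # Stage 0 perturbs with raw noise; stage k >= 1 uses the k-th EMD mode of the noise.
            if k == 0:
                noise_term = w
            elif k - 1 < len(modes):
                noise_term = modes[k - 1]
            else:
                noise_term = np.zeros_like(residual)
            trials.append(first_imf(residual + beta * noise_term))
        imf = np.mean(trials, axis=0)  # ensemble average of the k-th mode
        imfs.append(imf)
        residual = residual - imf
    return imfs, residual


# Usage: 20 synthetic "days" of an 11-point bell-shaped profile plus noise.
rng = np.random.default_rng(1)
profile = np.sin(np.pi * (np.arange(11) + 0.5) / 11.0)
signal = np.tile(profile, 20) + 0.1 * rng.standard_normal(220)
imfs, residual = ceemdan(signal)
print(len(imfs), np.allclose(np.sum(imfs, axis=0) + residual, signal))
```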

3.3.3. Multilayer Perceptron

An ANN is an AI model composed of many interconnected nodes [38], and the perceptron is one type of ANN. Equation (1) expresses a single-neuron perceptron whose single output is connected to all inputs.

y = f(z), \quad z = \sum_{i=0}^{n} w_{i} x_{i}, \tag{1}

where x_{i} is the i-th input value (with x_{0} = 1, so that w_{0} acts as the bias term), y is the output value, w_{i} is the weight of the i-th input, and n is the number of input variables. The form of the output depends on the activation function f [39]. An MLP is one of the most basic and widely used types of ANN. It consists of an input layer, one or more hidden layers, and an output layer. MLPs overcome the limitation of a single perceptron, which can solve only linearly separable problems [40]. By adding hidden layers between the input and output layers, they can handle problems that are not linearly separable. The hidden layers apply nonlinear transformations to the inputs, allowing the network to learn complex representations of the data [41]. For day-ahead solar radiation forecasting in the frequency domain, a separate model is constructed for each decomposed signal. Each of these MLP networks consists of an input layer with 11 nodes, five hidden layers of eight nodes each, and an output layer with one node. Figure 5 illustrates an example of forecasting in the frequency domain.

Too few connections leave the network with insufficient adjustable parameters, whereas excessive connectivity can cause the network to overfit the training data [42]. Therefore, the number of hidden layers and nodes must be chosen to suit the data. The MLP is trained using backpropagation. Backpropagation propagates the error backward through the network to compute its gradient with respect to each parameter, and the optimizer then uses these gradients to update the parameters and minimize the error. In this study, we used adaptive moment estimation (Adam) [43] as the optimization method and the scaled exponential linear unit (SELU) [44] as the activation function. We constructed MLP-based forecasting models that take the IMFs and residual signals obtained through WT and CEEMDAN as inputs and forecast each signal separately. The day-ahead solar radiation forecast in the frequency domain was then obtained by reconstructing the signal from these component forecasts.
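Adam's update rule can be illustrated on a small least-squares problem. Here the gradient of the mean squared error is computed analytically and stands in for the gradients that backpropagation would supply in an MLP. β1 = 0.9, β2 = 0.999, and ε = 1e-8 are the defaults proposed in [43]. The learning rate of 0.01 and the step count are illustrative assumptions; the paper's second-stage model uses 0.0001. The function name is hypothetical.

```python
import numpy as np


def adam_fit_linear(X, y, lr=0.01, steps=5000, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize the mean squared error of X @ w against y with Adam."""
    w = np.zeros(X.shape[1])
    m = np.zeros_like(w)   # first-moment (mean) estimate of the gradient
    v = np.zeros_like(w)   # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        residual = X @ w - y
        grad = 2.0 * X.T @ residual / len(y)     # gradient of the MSE
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        m_hat = m / (1.0 - beta1 ** t)           # bias correction for zero init
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true
w_hat = adam_fit_linear(X, y)
print(np.round(w_hat, 3))
```

Dividing by the running root-mean-square of the gradient gives each parameter its own effective step size. This is why Adam is less sensitive to gradient scale than plain gradient descent.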

3.4. Domain Hybrid Day-Ahead Solar Radiation Forecasting

In the second stage, the domain hybrid solar radiation forecasting model, constructed using the MLP algorithm, takes the two forecast values from the sequence and frequency domains as inputs. We used 4 years of first-stage forecasts from the two domains as the training and test sets of this model. The model had seven hidden layers of 11 nodes each, and Adam and SELU were used as the optimization method and activation function, respectively. The learning rate and the number of epochs were set to 0.0001 and 250, respectively. The network structure of the domain hybrid day-ahead solar radiation forecasting model is depicted in Figure 6.
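Conceptually, the second stage is a stacking step: the two first-stage forecasts become the features of a meta-model. To keep the example short and dependency-free, the sketch below substitutes an ordinary least-squares combiner for the paper's seven-layer MLP. It is therefore not the proposed model, and the function names and synthetic data are hypothetical. The chronological split, fitting on earlier first-stage forecasts and evaluating on later ones, mirrors how the second-stage training and test sets are drawn from the first-stage forecasts.

```python
import numpy as np


def fit_linear_stacker(seq_pred, freq_pred, actual):
    """Least-squares weights for actual ~ w_seq * seq + w_freq * freq + bias."""
    X = np.column_stack([seq_pred, freq_pred, np.ones(len(actual))])
    coef, *_ = np.linalg.lstsq(X, np.asarray(actual, dtype=float), rcond=None)
    return coef


def predict_linear_stacker(coef, seq_pred, freq_pred):
    return coef[0] * np.asarray(seq_pred) + coef[1] * np.asarray(freq_pred) + coef[2]


# Usage with synthetic first-stage forecasts (not data from the paper).
rng = np.random.default_rng(0)
seq = rng.uniform(0.0, 1.0, 400)
freq = rng.uniform(0.0, 1.0, 400)
truth = 0.6 * seq + 0.4 * freq + 0.05
split = 200   # earlier half for fitting, later half for testing
coef = fit_linear_stacker(seq[:split], freq[:split], truth[:split])
hybrid = predict_linear_stacker(coef, seq[split:], freq[split:])
print(np.round(coef, 3))
```

Replacing the linear combiner with an MLP lets the weighting of the two domains vary nonlinearly with the forecast values themselves. For example, the meta-model can trust the sequence-domain forecast more when it signals an abrupt drop.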

By considering solar radiation patterns in the frequency domain and meteorological factors in the sequence domain, the proposed model aims to achieve better forecasting performance than forecasting in either domain alone.

4. Results and Discussion

In this section, we describe the data analysis and present the results of comparative experiments with the proposed domain hybrid solar radiation forecasting model.

4.1. Data Analysis

We used weather and solar radiation data collected at 1 h intervals from three regions in Korea. First, the solar radiation characteristics of the three sites were investigated using box plots and statistical analyses, as illustrated in Figure 7 and Table 2. Table 2 lists statistics of the solar radiation data for each region, including skewness, kurtosis, standard deviation, and maximum and minimum values.

To give all input variables the same degree of importance, the input data were preprocessed by min–max normalization, as defined in Equation (2).

x_{\mathrm{norm}} = \frac{x - x_{\mathrm{min}}}{x_{\mathrm{max}} - x_{\mathrm{min}}}, \tag{2}

where x represents the original data, and x_{\mathrm{min}} and x_{\mathrm{max}} represent the minimum and maximum values of the original data, respectively. As a result, all values are scaled to the range between 0 and 1.
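A minimal NumPy version of Equation (2) follows, with an inverse transform that maps normalized forecasts back to physical units. The sketch computes x_min and x_max on the training period only. That is an assumption made here to avoid leaking test-period statistics, and with this choice, test values may fall slightly outside [0, 1]. The function names are hypothetical.

```python
import numpy as np


def fit_minmax(train):
    """Per-column minimum and maximum of the training data."""
    train = np.asarray(train, dtype=float)
    return train.min(axis=0), train.max(axis=0)


def minmax_transform(x, x_min, x_max):
    """Equation (2); a constant column (x_max == x_min) would divide by zero."""
    return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)


def minmax_inverse(x_norm, x_min, x_max):
    """Map normalized values (e.g., forecasts) back to the original scale."""
    return np.asarray(x_norm, dtype=float) * (x_max - x_min) + x_min


train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
lo, hi = fit_minmax(train)
scaled = minmax_transform(train, lo, hi)
print(scaled)
```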

To evaluate the forecasting performance of the proposed model, we used three metrics: mean absolute error (MAE), root-mean-square error (RMSE), and normalized root-mean-square error (NRMSE), represented by Equations (3)–(5).

\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| A_{t} - F_{t} \right|, \tag{3}

\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} \left( F_{t} - A_{t} \right)^{2}}{n}}, \tag{4}

\mathrm{NRMSE} = \frac{\sqrt{\frac{\sum_{t=1}^{n} \left( F_{t} - A_{t} \right)^{2}}{n}}}{A_{\mathrm{max}} - A_{\mathrm{min}}} \times 100, \tag{5}

where A_{t} and F_{t} represent the actual and forecast values at time t, respectively, n indicates the number of observations, and A_{\mathrm{max}} and A_{\mathrm{min}} represent the maximum and minimum of the actual values, respectively.
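Equations (3)–(5) translate directly into NumPy. The toy arrays in the usage line are chosen so the results can be checked by hand: a single error of 2 over four observations, with an actual range of 3.

```python
import numpy as np


def mae(actual, forecast):
    """Equation (3): mean absolute error."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(a - f)))


def rmse(actual, forecast):
    """Equation (4): root-mean-square error."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((f - a) ** 2)))


def nrmse(actual, forecast):
    """Equation (5): RMSE normalized by the range of the actual values, in percent."""
    a = np.asarray(actual, dtype=float)
    return rmse(a, forecast) / (a.max() - a.min()) * 100.0


A = [0.0, 1.0, 2.0, 3.0]
F = [0.0, 1.0, 2.0, 5.0]
print(mae(A, F), rmse(A, F), round(nrmse(A, F), 2))  # 0.5 1.0 33.33
```

Because NRMSE divides by the range of the actual values, it allows errors to be compared across the three regions even though their solar radiation levels differ.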

The experiments were performed on Windows 10 with an Intel(R) Core(TM) i7-9700K CPU, 32 GB of Samsung DDR4 memory, and an NVIDIA GeForce RTX 2080 SUPER graphics card. Python 3.9 was used to perform day-ahead solar radiation forecasting with the proposed model. The LightGBM-based forecasting model in the sequence domain was constructed using scikit-learn (v1.1.3), and its hyperparameters were tuned using grid search [45]. The frequency-domain forecasts and the domain hybrid forecasts were performed using PyTorch 1.12.1 [46]. The corresponding experiments and results are described below.

4.2. Experimental Results

Extensive experiments were conducted with various solar radiation forecasting models to evaluate the performance of the proposed domain hybrid day-ahead solar radiation forecasting model. As mentioned above, 3 years of data were used for training and 2 years for testing for each of the three regions in Korea. The training and test periods used for each model are shown in Table 3, and Table 4 lists the input variables used for each model.

4.2.1. Comparative Experiment

To verify the performance of the proposed forecasting model, we compared it with forecasting models based on various AI algorithms and with a state-of-the-art model, WT-CEEMDAN-IASO-ORELM [32]. The experimental results are presented in Table 5, Table 6 and Table 7.

Values in bold in each table represent the best accuracy for each metric. In the comparison experiment, all models except WT-CEEMDAN-IASO-ORELM [32] and the proposed model used the sequence-domain input variables. These sequence-domain forecasting models consider only exogenous factors such as time information, air temperature, relative humidity, and wind speed. Their performance is therefore fundamentally limited by, and highly sensitive to, the number of input variables. The results of the state-of-the-art WT-CEEMDAN-IASO-ORELM model confirmed that frequency-domain forecasting can outperform sequence-domain forecasting based on deep learning and ensemble learning. Our proposed model achieved the highest forecasting performance in all regions and for all evaluation metrics. This confirms that the domain hybrid model, which combines sequence- and frequency-domain forecasting, outperformed both the sequence-domain and the frequency-domain forecasting models. Additional comparisons of average performance are illustrated in Figure 8, Figure 9 and Figure 10.

In the figures, the models are sorted by performance. The figures show that the proposed model achieved the best performance for all evaluation metrics. The WT-CEEMDAN-IASO-ORELM model showed the second-best performance, and among the sequence-domain forecasting models, the ensemble learning-based models performed well.

4.2.2. Ablation Study

Ablation studies were performed to verify the effectiveness of our proposed model. The configuration of each ablation case is shown in Table 8, and the results of the ablation studies are presented in Figure 11.

We performed experiments on the datasets of the three regions and evaluated each ablation case with three evaluation metrics. The proposed model showed the best forecasting performance in all three regions, with an NRMSE of 7–8%. In addition, it improved performance by up to 6% compared with sequence-domain forecasting and by about 3% compared with frequency-domain forecasting. The comparison with Case 3 confirmed that our domain hybrid model was more accurate than a model that takes the frequency-domain forecast values and exogenous variables as inputs. Furthermore, the comparison with Case 4 showed that forecasting performance deteriorated when exogenous variables were used as additional inputs to the domain hybrid model. As with NRMSE, the proposed model showed the best forecasting performance for all regions in terms of RMSE. Its forecasting performance was also stable across regions, with only small differences in error between them. Lastly, the MAE results reconfirmed that the proposed model performed best in all regions. These results indicate that the proposed model achieved the best forecasting performance for all three evaluation metrics in all regions and that its performance was stable regardless of the evaluation region.

By contrast, the performance of the sequence-domain forecasting model is limited because it considers only time information and three exogenous factors. Likewise, frequency-domain hybrid forecasting models such as WT-CEEMDAN-IASO-ORELM consider only solar radiation and not exogenous factors, which limits further improvement in forecasting performance. Our domain hybrid forecasting model performs better and more stably than existing single-domain forecasting models. It combines the solar radiation patterns captured by frequency-domain forecasting with the exogenous factor information captured by sequence-domain forecasting.

5. Conclusions

In this paper, we proposed a domain hybrid day-ahead solar radiation forecasting model that combines sequence-domain forecasting using exogenous data with frequency-domain forecasting using solar radiation. We performed extensive experiments comparing it with state-of-the-art and other popular forecasting models for three regions in South Korea. The proposed model achieved an NRMSE of 7–8% and the best performance in all three regions. In addition, it achieved up to a 6% performance improvement over single-domain forecasting. In future work, we plan to develop a photovoltaic power generation forecasting method linked to solar radiation forecasting. We also plan to develop an economical energy operation scheduling method based on electricity load forecasting and photovoltaic power generation forecasting.

Author Contributions

Conceptualization, J.P.; methodology, J.P.; software, J.P. and J.S.; validation, J.P.; formal analysis, S.P.; investigation, J.S.; data curation, J.P. and S.P.; writing—original draft preparation, J.P.; writing—review and editing, E.H.; visualization, S.P.; supervision, E.H.; project administration, E.H.; funding acquisition, E.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Energy Cloud R&D Program (grant number: 2019M3F2A1073184) through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ameur, A.; Berrada, A.; Loudiyi, K.; Aggour, M. Forecast modeling and performance assessment of solar PV systems. J. Clean. Prod. 2020, 267, 122167.
2. Krishnan, N.; Kumar, K.R.; Inda, C.S. How solar radiation forecasting impacts the utilization of solar energy: A critical review. J. Clean. Prod. 2023, 388, 135860.
3. de Freitas Viscondi, G.; Alves-Souza, S.N. A Systematic Literature Review on big data for solar photovoltaic electricity generation forecasting. Sustain. Energy Technol. Assess. 2019, 31, 54–63.
4. Espinar, B.; Aznarte, J.-L.; Girard, R.; Moussa, A.M.; Kariniotakis, G. Photovoltaic Forecasting: A state of the art. In Proceedings of the 5th European PV-Hybrid and Mini-Grid Conference, Tarragona, Spain, 29–30 April 2010; pp. 250–255, ISBN 978-3-941785-15-1.
5. Colak, I.; Yesilbudak, M.; Genc, N.; Bayindir, R. Multi-period prediction of solar radiation using ARMA and ARIMA models. In Proceedings of the 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 9–11 December 2015; pp. 1045–1049.
6. Bilgili, M.; Ozgoren, M. Daily total global solar radiation modeling from several meteorological data. Meteorol. Atmos. Phys. 2011, 112, 125–138.
7. Martín, A.; Trapero, J.R. Recursive Estimation Methods to Forecast Short-Term Solar Irradiation. In Environment, Energy and Climate Change II: Energies from New Resources and the Climate Change; Springer: Berlin/Heidelberg, Germany, 2016; pp. 17–32.
8. Zhang, Z.; Hong, W.-C.; Li, J. Electric Load Forecasting by Hybrid Self-Recurrent Support Vector Regression Model with Variational Mode Decomposition and Improved Cuckoo Search Algorithm. IEEE Access 2020, 8, 14642–14658.
9. Kelo, S.; Dudul, S. A wavelet Elman neural network for short-term electrical load prediction under the influence of temperature. Int. J. Electr. Power Energy Syst. 2012, 43, 1063–1071.
10. Mohammadi, K.; Shamshirband, S.; Danesh, A.S.; Zamani, M.; Sudheer, C. Retracted Article: Horizontal global solar radiation estimation using hybrid SVM-firefly and SVM-wavelet algorithms: A case study. Nat. Hazards 2020, 102, 1613–1614.
11. Gutierrez-Corea, F.-V.; Manso-Callejo, M.-A.; Moreno-Regidor, M.-P.; Manrique-Sancho, M.-T. Forecasting short-term solar irradiance based on artificial neural networks and data from neighboring meteorological stations. Sol. Energy 2016, 134, 119–131.
12. Huang, X.; Shi, J.; Gao, B.; Tai, Y.; Chen, Z.; Zhang, J. Forecasting Hourly Solar Irradiance Using Hybrid Wavelet Transformation and Elman Model in Smart Grid. IEEE Access 2019, 7, 139909–139923.
13. Peng, S.; Chen, R.; Yu, B.; Xiang, M.; Lin, X.; Liu, E. Daily natural gas load forecasting based on the combination of long short term memory, local mean decomposition, and wavelet threshold denoising algorithm. J. Nat. Gas Sci. Eng. 2021, 95, 104175.
14. Sharma, V.; Yang, D.; Walsh, W.; Reindl, T. Short term solar irradiance forecasting using a mixed wavelet neural network. Renew. Energy 2016, 90, 481–492.
15. Gao, B.; Huang, X.; Shi, J.; Tai, Y.; Zhang, J. Hourly forecasting of solar irradiance based on CEEMDAN and multi-strategy CNN-LSTM neural networks. Renew. Energy 2020, 162, 1665–1683.
16. Massaoudi, M.; Refaat, S.S.; Chihi, I.; Trabelsi, M.; Oueslati, F.S.; Abu-Rub, H. A novel stacked generalization ensemble-based hybrid LGBM-XGB-MLP model for Short-Term Load Forecasting. Energy 2021, 214, 118874.
17. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2017; Volume 30.
18. Zhang, D. Wavelet transform. In Fundamentals of Image Data Mining: Analysis, Features, Classification and Retrieval; Springer: Berlin/Heidelberg, Germany, 2019; pp. 35–44.
19. Torres, M.E.; Colominas, M.A.; Schlotthauer, G.; Flandrin, P. A complete ensemble empirical mode decomposition with adaptive noise. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; pp. 4144–4147.
20. Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009; Volume 2.
21. Olatomiwa, L.; Mekhilef, S.; Shamshirband, S.; Mohammadi, K.; Petković, D.; Sudheer, C. A support vector machine–firefly algorithm-based model for global solar radiation prediction. Sol. Energy 2015, 115, 632–644.
22. Jiang, H.; Dong, Y. A nonlinear support vector machine model with hard penalty function based on glowworm swarm optimization for forecasting daily global solar radiation. Energy Convers. Manag. 2016, 126, 991–1002.
23. Guermoui, M.; Abdelaziz, R.; Gairaa, K.; Djemoui, L.; Benkaciali, S. New temperature-based predicting model for global solar radiation using support vector regression. Int. J. Ambient Energy 2022, 43, 1397–1407.
24. Sfetsos, A.; Coonick, A. Univariate and multivariate forecasting of hourly solar radiation with artificial intelligence techniques. Sol. Energy 2000, 68, 169–178.
25. Mellit, A.; Pavan, A.M. A 24-h forecast of solar irradiance using artificial neural network: Application for performance prediction of a grid-connected PV plant at Trieste, Italy. Sol. Energy 2010, 84, 807–821.
26. Yacef, R.; Benghanem, M.; Mellit, A. Prediction of daily global solar irradiation data using Bayesian neural network: A comparative study. Renew. Energy 2012, 48, 146–154.
27. Yıldırım, H.B.; Çelik, Ö.; Teke, A.; Barutçu, B. Estimating daily Global solar radiation with graphical user interface in Eastern Mediterranean region of Turkey. Renew. Sustain. Energy Rev. 2018, 82, 1528–1537.
28. Kaba, K.; Sarıgül, M.; Avcı, M.; Kandırmaz, H.M. Estimation of daily global solar radiation using deep learning model. Energy 2018, 162, 126–135.
29. Yu, Y.; Cao, J.; Zhu, J. An LSTM Short-Term Solar Irradiance Forecasting Under Complicated Weather Conditions. IEEE Access 2019, 7, 145651–145666.
30. He, H.; Lu, N.; Jie, Y.; Chen, B.; Jiao, R. Probabilistic solar irradiance forecasting via a deep learning-based hybrid approach. IEEJ Trans. Electr. Electron. Eng. 2020, 15, 1604–1612.
31. Shamshirband, S.; Mohammadi, K.; Khorasanizadeh, H.; Yee, L.; Lee, M.; Petković, D.; Zalnezhad, E. Estimating the diffuse solar radiation using a coupled support vector machine–wavelet transform model. Renew. Sustain. Energy Rev. 2016, 56, 428–435.
32. Zhang, C.; Hua, L.; Ji, C.; Nazir, M.S.; Peng, T. An evolutionary robust solar radiation prediction model based on WT-CEEMDAN and IASO-optimized outlier robust extreme learning machine. Appl. Energy 2022, 322, 119518.
33. Kim, J.; Moon, J.; Hwang, E.; Kang, P. Recurrent inception convolution neural network for multi short-term load forecasting. Energy Build. 2019, 194, 328–341.
34. Park, J.; Moon, J.; Jung, S.; Hwang, E. Multistep-Ahead Solar Radiation Forecasting Scheme Based on the Light Gradient Boosting Machine: A Case Study of Jeju Island. Remote Sens. 2020, 12, 2271.
35. Zhang, Y.; Yu, W.; Li, Z.; Raza, S.; Cao, H. Detecting Ethereum Ponzi Schemes Based on Improved LightGBM Algorithm. IEEE Trans. Comput. Soc. Syst. 2021, 9, 624–637.
36. De Livera, A.M.; Hyndman, R.J.; Snyder, R.D. Forecasting Time Series with Complex Seasonal Patterns Using Exponential Smoothing. J. Am. Stat. Assoc. 2011, 106, 1513–1527.
37. Ho, R.; Hung, K. A Comparative Investigation of Mode Mixing in EEG Decomposition Using EMD, EEMD and M-EMD. In Proceedings of the 2020 IEEE 10th Symposium on Computer Applications & Industrial Electronics (ISCAIE), Penang, Malaysia, 18–19 April 2020; pp. 203–210.
38. Gulli, A.; Kapoor, A.; Pal, S. Deep Learning with TensorFlow 2 and Keras: Regression, ConvNets, GANs, RNNs, NLP, and More with TensorFlow 2 and the Keras API; Packt Publishing Ltd.: Birmingham, UK, 2019.
39. Camacho Olmedo, M.T.; Paegelow, M.; Mas, J.F.; Escobar, F. Geomatic Approaches for Modeling Land Change Scenarios. An Introduction; Springer: Berlin/Heidelberg, Germany, 2018.
40. Géron, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2022.
41. Ramchoun, H.; Janati Idrissi, M.A.; Ghanou, Y.; Ettaouil, M. Multilayer Perceptron: Architecture Optimization and Training. Int. J. Interact. Multimed. Artif. Intell. 2016, 4, 26.
42. Lins, A.; Ludermir, T.B. Hybrid optimization algorithm for the definition of MLP neural network architectures and weights. In Proceedings of the Fifth International Conference on Hybrid Intelligent Systems (HIS’05), Rio de Janeiro, Brazil, 6–9 November 2005; p. 6.
43. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
44. Klambauer, G.; Unterthiner, T.; Mayr, A.; Hochreiter, S. Self-Normalizing Neural Networks. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2017; Volume 30.
45. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
46. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2019; Volume 32.

Figure 1. Architecture of the proposed domain hybrid day-ahead solar radiation forecasting model.

Figure 2. Example of monthly time series cross-validation.

Figure 3. Overall procedure of wavelet transform.

Figure 4. CEEMDAN signal decomposition process.

Figure 5. Structure of the day-ahead solar radiation forecasting model in the frequency domain.

Figure 6. Structure of the domain hybrid day-ahead solar radiation forecasting model.

Figure 7. Box plots by region (MJ/m²).

Figure 8. Average NRMSE comparison of solar radiation forecasting models (%).

Figure 9. Average RMSE comparison of solar radiation forecasting models (MJ/m²).

Figure 10. Average MAE comparison of solar radiation forecasting models (MJ/m²).

Figure 11. Ablation study result.

Table 1. Latitudes, longitudes, and elevations of the data collection regions.

Region Name	Latitude (°N)	Longitude (°E)	Elevation (m)
Gosan	33.29382	126.16283	71.39
Jeju	33.51411	126.52969	20.79
Gangneung	37.75147	128.89099	27.12

Table 2. Statistical analysis of the solar radiation data by region (MJ/m²).

Statistic	Gosan (training)	Gosan (test)	Jeju (training)	Jeju (test)	Gangneung (training)	Gangneung (test)
Mean	0.873	1.17	1.216	1.229	1.31	1.313
Standard error	0.007	0.008	0.011	0.009	0.011	0.009
Median	0.71	0.95	0.96	0.97	1.12	1.15
Mode	0	0	0	0	0	0
Standard deviation	0.699	0.942	0.996	0.997	0.993	0.991
Sample variance	0.488	0.887	0.992	0.995	0.987	0.983
Kurtosis	−0.724	−0.857	−0.792	−0.813	−0.76	−0.795
Skewness	0.618	0.571	0.626	0.606	0.54	0.514
Range	2.8	3.55	3.72	3.69	4.17	3.99
Minimum	0	0	0	0	0	0
Maximum	2.8	3.55	3.72	3.69	4.17	3.99
Sum	7012.45	14,095.12	9782.53	14,827.55	10,538.5	15,836.58
Count	8041	12,056	8041	12,056	8041	12,056
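Table 2 reports the usual descriptive statistics for each split. The sketch below shows how each row can be computed with Python's standard `statistics` module. It assumes sample (n − 1) definitions and the bias-adjusted skewness and excess-kurtosis formulas used by common spreadsheet tools; the section does not say which formulas were used. The excess convention is at least implied by the negative kurtosis values, since non-excess (Pearson) kurtosis can never fall below 1. The function name `describe` is illustrative.

```python
import math
import statistics
from typing import Dict, Sequence


def describe(x: Sequence[float]) -> Dict[str, float]:
    """Descriptive statistics matching the rows of Table 2 (assumed formulas)."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations for sample kurtosis")
    mean = statistics.mean(x)
    sd = statistics.stdev(x)  # sample standard deviation (divides by n - 1)
    if sd == 0:
        raise ValueError("constant data: skewness and kurtosis are undefined")
    z = [(v - mean) / sd for v in x]
    # Bias-adjusted sample skewness.
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    # Bias-adjusted sample excess kurtosis (0 for a normal distribution).
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard error": sd / math.sqrt(n),
        "Median": statistics.median(x),
        "Mode": statistics.mode(x),  # first mode on ties (Python 3.8+)
        "Standard deviation": sd,
        "Sample variance": statistics.variance(x),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(x) - min(x),
        "Minimum": min(x),
        "Maximum": max(x),
        "Sum": sum(x),
        "Count": n,
    }
```

A mode and minimum of 0 in every column is what one would expect if night-time hours with zero radiation are included in the data, although the section does not state this explicitly.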

Table 3. Period of data used for the training and testing of the forecasting model.

Model	Training Set Period	Test Set Period
Sequence-domain model	2016.01~2016.12	2017.01~2020.12
Frequency-domain model	2016.01~2016.12	2017.01~2020.12
Domain hybrid model	2017.01~2018.12	2019.01~2020.12
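Table 3 implies a purely chronological split. As the table reads, the hybrid model's training period (2017–2018) falls inside the first-stage models' test period, so the second stage is fitted on forecasts made for data the first-stage models were not trained on. The sketch below expresses these periods as data; the record layout (date, value) and the function names are assumptions for illustration, and the monthly time-series cross-validation used within the training period (Figure 2) is not reproduced.

```python
from datetime import date
from typing import Dict, List, Tuple

YearMonth = Tuple[int, int]
Record = Tuple[date, float]  # (timestamp, solar radiation in MJ/m²)

# (train start, train end, test start, test end), inclusive months, from Table 3.
PERIODS: Dict[str, Tuple[YearMonth, YearMonth, YearMonth, YearMonth]] = {
    "Sequence-domain model": ((2016, 1), (2016, 12), (2017, 1), (2020, 12)),
    "Frequency-domain model": ((2016, 1), (2016, 12), (2017, 1), (2020, 12)),
    "Domain hybrid model": ((2017, 1), (2018, 12), (2019, 1), (2020, 12)),
}


def in_period(d: date, start: YearMonth, end: YearMonth) -> bool:
    """True if d falls in the inclusive month range [start, end]."""
    return start <= (d.year, d.month) <= end


def split_by_period(records: List[Record], model: str) -> Tuple[List[Record], List[Record]]:
    """Chronological train/test split; records outside both periods are dropped."""
    tr_start, tr_end, te_start, te_end = PERIODS[model]
    train = [r for r in records if in_period(r[0], tr_start, tr_end)]
    test = [r for r in records if in_period(r[0], te_start, te_end)]
    return train, test
```

Comparing (year, month) tuples keeps the boundaries inclusive at month granularity, which matches the way Table 3 states its periods.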

Table 4. Input variables used for each model.

Model	Input Variables
Sequence-domain model	Time information, exogenous variables
Frequency-domain model	Past solar radiation
WT-CEEMDAN-IASO-ORELM [32]	Past solar radiation
Our proposed model	Frequency- and sequence-domain forecast values

Table 5. NRMSE results of solar radiation forecasting models in three regions (%).

Model	Gosan	Jeju	Gangneung
Decision tree	17.83	14.94	13.67
Random forest	16.46	13.39	11.72
XGBoost	16.73	13.08	11.52
LightGBM	16.91	13.31	11.56
MLP	16.12	15.78	13.14
Recurrent neural network	15.94	14.89	13.18
Gated recurrent unit	15.82	14.09	12.05
LSTM	15.36	13.97	12.03
WT-CEEMDAN-IASO-ORELM [32]	9.65	9.23	9.65
Our proposed model	8.53	7.51	7.31
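The text reports improvements as "up to 6%" and "about 3%" without stating whether these are percentage-point differences in NRMSE or reductions relative to the baseline. The two conventions give different numbers, as this short calculation on the Table 5 values for the strongest baseline shows. Only the tabulated values are used; the function names are illustrative.

```python
# NRMSE (%) copied from Table 5 for the strongest baseline and the proposed model.
NRMSE = {
    "WT-CEEMDAN-IASO-ORELM": {"Gosan": 9.65, "Jeju": 9.23, "Gangneung": 9.65},
    "Proposed": {"Gosan": 8.53, "Jeju": 7.51, "Gangneung": 7.31},
}


def point_difference(baseline: float, proposed: float) -> float:
    """Absolute NRMSE reduction, in percentage points."""
    return baseline - proposed


def relative_reduction(baseline: float, proposed: float) -> float:
    """NRMSE reduction as a percentage of the baseline NRMSE."""
    return 100.0 * (baseline - proposed) / baseline


for region in ("Gosan", "Jeju", "Gangneung"):
    b = NRMSE["WT-CEEMDAN-IASO-ORELM"][region]
    p = NRMSE["Proposed"][region]
    print(f"{region:<10} {point_difference(b, p):.2f} pp  {relative_reduction(b, p):.1f}% relative")
```

For Gosan, Jeju, and Gangneung this gives reductions of 1.12, 1.72, and 2.34 percentage points, or roughly 11.6%, 18.6%, and 24.2% relative to WT-CEEMDAN-IASO-ORELM.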

Table 6. RMSE results of solar radiation forecasting models in three regions (MJ/m²).

Model	Gosan	Jeju	Gangneung
Decision tree	0.6241	0.5514	0.5222
Random forest	0.5764	0.4941	0.4478
XGBoost	0.5856	0.4828	0.4401
LightGBM	0.5918	0.4913	0.4419
MLP	0.5644	0.5823	0.502
Recurrent neural network	0.558	0.5495	0.5035
Gated recurrent unit	0.5538	0.52	0.4606
LSTM	0.5378	0.5155	0.4596
WT-CEEMDAN-IASO-ORELM [32]	0.3429	0.3406	0.3581
Our proposed model	0.2986	0.2774	0.292
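Because NRMSE is RMSE scaled by a normalizer, dividing each RMSE in Table 6 by the corresponding NRMSE in Table 5 recovers the implied normalizer for that model and region. The sketch below does this for the proposed model and the strongest baseline, using only values copied from the two tables; it is a consistency check, not a statement of how the authors computed NRMSE.

```python
# RMSE (MJ/m²) from Table 6 and NRMSE (%) from Table 5.
RMSE = {
    "WT-CEEMDAN-IASO-ORELM": {"Gosan": 0.3429, "Jeju": 0.3406, "Gangneung": 0.3581},
    "Proposed": {"Gosan": 0.2986, "Jeju": 0.2774, "Gangneung": 0.2920},
}
NRMSE = {
    "WT-CEEMDAN-IASO-ORELM": {"Gosan": 9.65, "Jeju": 9.23, "Gangneung": 9.65},
    "Proposed": {"Gosan": 8.53, "Jeju": 7.51, "Gangneung": 7.31},
}


def implied_normalizer(rmse: float, nrmse_percent: float) -> float:
    """Normalizer N (MJ/m²) such that NRMSE = 100 * RMSE / N."""
    return rmse / (nrmse_percent / 100.0)


for model, by_region in RMSE.items():
    for region, r in by_region.items():
        n = implied_normalizer(r, NRMSE[model][region])
        print(f"{model:<24} {region:<10} {n:.2f}")
```

All six implied normalizers lie between about 3.50 and 3.99 MJ/m², the same scale as the test-set ranges in Table 2 (3.55, 3.69, and 3.99 MJ/m²). They are not identical across models in Gosan and Gangneung, and the tabulated values are rounded, so the exact normalizer cannot be pinned down from this section alone.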

Table 7. MAE results of solar radiation forecasting models in three regions (MJ/m²).

Model	Gosan	Jeju	Gangneung
Decision tree	0.4676	0.4017	0.3495
Random forest	0.4341	0.3565	0.3045
XGBoost	0.446	0.3477	0.3074
LightGBM	0.4498	0.3614	0.3047
MLP	0.4247	0.4582	0.3834
Recurrent neural network	0.4406	0.4267	0.3763
Gated recurrent unit	0.4321	0.3889	0.3241
LSTM	0.4167	0.3855	0.3197
WT-CEEMDAN-IASO-ORELM [32]	0.2703	0.261	0.2893
Our proposed model	0.2171	0.212	0.219

Table 8. Configuration of the ablation studies.

Case No.	Configuration
1	Use the sequence-domain forecast of the proposed model.
2	Use the frequency-domain forecast of the proposed model.
3	Use the frequency-domain forecast together with air temperature, relative humidity, and wind speed as inputs to the second stage of the proposed model.
4	Use the sequence- and frequency-domain forecasts together with air temperature, relative humidity, and wind speed as inputs to the second stage of the proposed model.
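The ablation cases in Table 8 differ only in which columns reach the final forecast. The sketch below encodes them as data, reading Cases 1 and 2 as using one first-stage forecast directly and Cases 3 and 4 as alternative second-stage input sets; the column names and helper functions are hypothetical, since Table 8 describes the inputs in words only.

```python
from typing import Dict, List

Row = Dict[str, float]

# Hypothetical column names for the first-stage forecasts and exogenous variables.
FORECASTS = ["seq_forecast", "freq_forecast"]
WEATHER = ["air_temperature", "relative_humidity", "wind_speed"]

# Cases 1 and 2: a single first-stage forecast is used as the final forecast.
DIRECT_CASES = {"case1": "seq_forecast", "case2": "freq_forecast"}

# Proposed model and Cases 3-4: columns fed to the second-stage model.
SECOND_STAGE_CASES = {
    "proposed": FORECASTS,
    "case3": ["freq_forecast"] + WEATHER,
    "case4": FORECASTS + WEATHER,
}


def direct_forecast(rows: List[Row], case: str) -> List[float]:
    """Final forecasts for Cases 1 and 2 (no second stage)."""
    col = DIRECT_CASES[case]
    return [row[col] for row in rows]


def second_stage_inputs(rows: List[Row], case: str) -> List[List[float]]:
    """Feature matrix for the second stage under the given configuration."""
    cols = SECOND_STAGE_CASES[case]
    return [[row[c] for c in cols] for row in rows]
```

Keeping the configurations as data makes the ablation differences explicit: Case 3 replaces the sequence-domain forecast with the raw exogenous variables, while Case 4 adds them on top of both forecasts.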

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Park, J.; Park, S.; Shim, J.; Hwang, E. Domain Hybrid Day-Ahead Solar Radiation Forecasting Scheme. Remote Sens. 2023, 15, 1622. https://doi.org/10.3390/rs15061622

AMA Style

Park J, Park S, Shim J, Hwang E. Domain Hybrid Day-Ahead Solar Radiation Forecasting Scheme. Remote Sensing. 2023; 15(6):1622. https://doi.org/10.3390/rs15061622

Chicago/Turabian Style

Park, Jinwoong, Sungwoo Park, Jonghwa Shim, and Eenjun Hwang. 2023. "Domain Hybrid Day-Ahead Solar Radiation Forecasting Scheme" Remote Sensing 15, no. 6: 1622. https://doi.org/10.3390/rs15061622

