Evaluation of Scikit-Learn Machine Learning Algorithms for Improving CMA-WSP v2.0 Solar Radiation Prediction

Wang, Dan; Shen, Yanbo; Ye, Dong; Yang, Yanchao; Da, Xuanfang; Mo, Jingyue

doi:10.3390/atmos15080994


by Dan Wang ¹,², Yanbo Shen ²,³,⁴,*, Dong Ye ²,³, Yanchao Yang ¹, Xuanfang Da ²,⁵ and Jingyue Mo ²,³

¹ Shaanxi Provincial Meteorological Service Center, Xi’an 710014, China
² Key Laboratory of Energy Meteorology, China Meteorological Administration, Beijing 100081, China
³ Public Meteorological Service Center, China Meteorological Administration, Beijing 100081, China
⁴ Institute of Desert Meteorology, China Meteorological Administration, Urumqi 830002, China
⁵ Gansu Provincial Meteorological Service Center, Lanzhou 730020, China
* Author to whom correspondence should be addressed.

Atmosphere 2024, 15(8), 994; https://doi.org/10.3390/atmos15080994

Submission received: 4 July 2024 / Revised: 12 August 2024 / Accepted: 14 August 2024 / Published: 19 August 2024

(This article belongs to the Special Issue Solar Irradiance and Wind Forecasting)


Abstract:

This article evaluates the solar radiation forecasts produced by CMA-WSP v2.0 (version 2 of the China Meteorological Administration Wind and Solar Energy Prediction System) and explores the application of machine learning algorithms from the scikit-learn Python library to improve them. The performance of the CMA-WSP v2.0 solar radiation forecast is closely related to the weather conditions and shows notable diurnal fluctuations. The mean absolute percentage error (MAPE) of the CMA-WSP v2.0 is approximately 74% between 11:00 and 13:00, but ranges from 193% to 242% at 07:00–08:00 and 17:00–18:00, which is greater than at other daytime periods. The MAPE is relatively low in sunny and cloudy conditions, with a high probability of an absolute percentage error below 25%, and relatively high in overcast and rainy conditions, with a high probability of an absolute percentage error above 100%. The forecasts tend to underestimate the observed solar radiation in sunny and cloudy conditions and to overestimate it in overcast and rainy conditions. Revising the forecasts with machine learning models (linear regression, decision trees, K-nearest neighbors, random forest regression, adaptive boosting, and gradient boosting regression) can significantly reduce the MAPE of the CMA-WSP v2.0, and the reduction is closely connected to the weather conditions. The K-nearest neighbors, random forest regression, and decision tree models reduce the MAPE in all weather conditions. The K-nearest neighbors model performs best among these models, particularly in rainy conditions, followed by the random forest regression model. The gradient boosting regression model reduces the MAPE of the CMA-WSP v2.0 in all weather conditions except rainy. In contrast, the adaptive boosting and linear regression models have a limited capacity to improve the CMA-WSP v2.0 solar radiation prediction: the adaptive boosting model slightly reduces the MAPE only in sunny conditions, and the linear regression model only in sunny and cloudy conditions. In addition, the selection of input features has a considerable influence on the performance of the machine learning models. Incorporating time series data associated with the diurnal variation of solar radiation as an input feature can further improve the models’ performance.

Keywords:

CMA-WSP; solar radiation forecast; machine learning model; K-nearest neighbor; random forest regression

1. Introduction

In light of the increasingly evident effects of global climate change, there has been a growing consensus among the international community that a green and low-carbon transition is necessary. The replacement of fossil fuels with clean energy sources has therefore become an inevitable trend. These clean energy sources include wind power, solar photovoltaic power, thermal power, nuclear power, hydropower, and so on. Among them, solar photovoltaic power has a number of advantages, including no fuel consumption, no geographical constraints, flexible scale, no pollution, safety, reliability and ease of maintenance. As a result, it has become an important force in promoting the transformation of the energy structure. In addition to reducing fossil fuel consumption and greenhouse gas emissions and mitigating climate change, the large-scale deployment of solar photovoltaic power plants will promote the diversification and cleanliness of the energy structure and improve energy security. The “Global Status Report on Renewable Energy 2021” indicates that solar photovoltaic has become a major contributor, accounting for more than half of the newly installed renewable energy capacity, totaling 175 GWs [1]. This growth highlights the potential of solar photovoltaic in the future of sustainable energy. However, solar energy also faces unique challenges, such as intermittency, variability, and instability [2]. These issues complicate the integration of photovoltaic energy into the complex electrical grid [3]. The accurate forecasting of solar radiation plays an important role in the ability to cope with this task. It can facilitate grid scheduling decisions, optimize the allocation of power resources, and mitigate the risks associated with solar power generation’s volatility [4].

The Numerical Weather Prediction (NWP) model is fundamental to solar resource forecasting [5,6,7,8,9,10,11,12,13]. The NWP models such as the Global Forecast System, the European Centre for Medium-Range Weather Forecasts, and the Weather Research and Forecasting model have played significant roles in the prediction of solar radiation forecasts for the subsequent few hours to days [14,15]. Furthermore, climate models, such as Global Climate Models, Regional Climate Models, Providing Regional Climates for Impacts Studies, can be utilized to anticipate the future trajectory of solar radiation over the coming decades, and even longer periods [16,17]. In recent years, the China Meteorological Administration (CMA) developed a specialized weather forecasting system known as the CMA Wind and Solar Energy Prediction System (CMA-WSP) [18] for the purpose of forecasting wind and solar energy for a period of 0–14 days. In May 2023, this system underwent a major upgrade to version 2.0, now capable of providing a comprehensive range of meteorological forecast elements. These include stratified temperature, humidity, and wind field information within the boundary layer over China, as well as critical data such as surface shortwave radiation, ground pressure, and precipitation. These elements are essential for forecasting wind power and photovoltaic power generation. The forecasts have a temporal resolution of up to 15 min and a spatial resolution accurate to 9 km. CMA-WSP v2.0 incorporates multiple factors, including aerosol radiation effects, cloud radiation effects, terrain slope, and shadow effects, optimizing its radiation forecasts.

The NWP model describes weather processes at different scales through the parameterization of physical processes and can predict solar radiation using complex physical models. Although the NWP models account for major physical processes, uncertainties in initial conditions, complex local terrain, and numerical errors can still lead to inaccurate results [19,20,21]. These inaccuracies may include systematic errors, causing outputs of the NWP model to fall short of the expected precision [22]. To achieve more accurate forecasts, model output statistics methods have been proposed to correct forecasts of the NWP model. The model output statistics methods focus on minimizing a target statistic, such as bias or mean squared error, to enhance the accuracy and reliability of weather forecasts. These methods can be broadly categorized into three main types: physical models, traditional time series models, and machine learning (ML) models. Driven by advancements in high-performance computing, big data mining technologies, and artificial intelligence theories, ML models are now more widely used. The scikit-learn library, a popular Python library for machine learning, provides a rich set of ML models and tools supporting research and applications in weather forecasting. Machine learning models in scikit-learn, such as linear regression (LR) [23], decision trees (DT) [24], K-nearest neighbors (KNN) [25], random forest regression (RFR) [26,27], adaptive boosting (AdaBoost) [28], and gradient boosting regression (GBR) [29], have been extensively used for the validation and optimization of NWP models [30]. These models not only offer easy-to-use interfaces and consistent programming models but also provide efficient performance and scalability, enabling the handling of large datasets and high-dimensional features. However, it is important to note that no single ML model is universally applicable to all situations [31]. 
For example, each solar photovoltaic station has its own geographical location, climatic conditions, equipment, and historical operational record. When an ML model is used for solar radiation prediction, it must therefore be selected so that it accurately captures the principal influencing factors and trends in solar radiation at the site.

Solar radiation at the land surface is closely related to meteorological variables, such as cloud cover, sunshine duration, air temperature, relative humidity, wind speed, and so on [32,33,34,35]. Previous studies have used the data of these meteorological variables as the model input to estimate solar radiation [36,37,38]. Sky cloud cover has the greatest effect on the solar radiation attenuation among these variables. The four IPCC (Intergovernmental Panel on Climate Change) assessment reports (IPCC, 1990, 1992, 1996, 2001) identified cloud radiation parameterization as a significant limitation on the level of simulation currently achievable in climate models [39,40,41,42]. The NWP model is capable of simulating solar radiation with greater precision on days with minimal cloud cover. Nevertheless, it should be noted that simulations of solar radiation may be less accurate on days with extensive cloud cover [43]. It is essential to enhance the precision of solar radiation forecasting under overcast and rainy conditions. This will facilitate the reduction in the impact of power generation on the grid under these weather conditions and enhance the efficiency of solar resource utilization.

The CMA-WSP v2.0 has been applied extensively in meteorological institutions across China. Nevertheless, there is a paucity of comprehensive assessment and investigation into the performance of solar radiation forecasting produced by the CMA-WSP v2.0, particularly in overcast and rainy conditions. This article will focus on evaluations and improvement of solar radiation forecasts generated by the CMA-WSP v2.0. Firstly, the study analyses the solar radiation forecasting performance of the CMA-WSP v2.0 in detail under different weather conditions, including sunny, cloudy, rainy, and overcast, at different moments in the day. Subsequently, six ML models from the Python Scikit-learn library are employed to revise the solar radiation forecast produced by the CMA-WSP v2.0. A comparison is made to evaluate the capacity of the ML models to predict solar radiation under different weather conditions. Finally, the performance of the ML models is enhanced by employing time series associated with the diurnal fluctuations in solar radiation as an input feature. The remainder of the paper is organized as follows: Section 2 and Section 3 introduce the data and methods, respectively. Section 4 and Section 5 analyze the performance of the solar radiation prediction produced by the CMA-WSP v2.0 and the six ML models, respectively. Section 6 provides the conclusion and further discussion.

2. Data

This study uses a dataset of meteorological variables forecasted by the CMA-WSP v2.0, such as longwave and shortwave radiation, cloud cover, albedo, rainfall, snowfall, temperature, humidity, wind, and air temperature (Table 1), together with solar radiation observations collected every 15 min at a photovoltaic power station in Guizhou Province, China. Using the nearest point method, forecast data from the CMA-WSP v2.0 grid point closest to the target photovoltaic power station are extracted as the forecast dataset. The forecast data also have a time step of 15 min, and the forecasts are initialized at 20:00 Beijing time. The dataset covers the periods from 1 January 2022 to 31 December 2022, and from 2 August 2023 to 26 November 2023. Forecasts from 1 January 2022 to 31 December 2022 are historical retrospective products, while forecasts from 1 August 2023 to 31 October 2023 are real-time forecast products. These datasets are provided by the CMA Wind and Solar Power Generation Refinement Meteorological Service Demonstration Project. Additionally, total cloud cover and precipitation observations from 2 August 2023 to 26 November 2023 are collected from a national meteorological station located 13 km from the photovoltaic power station. All times in this paper are in Beijing time.
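The nearest point method can be illustrated with a minimal sketch. It assumes the grid is available as a list of (latitude, longitude) pairs and uses the great-circle (haversine) distance; the function names and all coordinates below are hypothetical and are not taken from the CMA-WSP v2.0 products.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used for the haversine distance


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_grid_point(site_lat, site_lon, grid_points):
    """Return the index of the grid point closest to the site."""
    return min(
        range(len(grid_points)),
        key=lambda k: haversine_km(site_lat, site_lon, *grid_points[k]),
    )


# Hypothetical site and grid coordinates, for illustration only.
site = (26.60, 106.60)
grid = [(26.50, 106.50), (26.58, 106.59), (26.66, 106.68)]
nearest_index = nearest_grid_point(site[0], site[1], grid)
```

The forecast time series at `grid[nearest_index]` would then serve as the forecast dataset for the station.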

Forecasts for the next 24 h are particularly crucial for decision-making in the energy market, so the objective of this paper is to establish a model for predicting solar radiation within 0–24 h. Because there is a time lag between the generation of NWP model products and their operational use, the 28–52 h solar radiation forecast from the CMA-WSP v2.0 is used to produce the 0–24 h forecast. In other words, the 28–52 h forecast initialized at 20:00 on the previous day is used to produce the solar radiation forecast from 00:00 to 23:45 of the next day.
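The lead-time mapping can be checked with simple date arithmetic. The sketch below assumes 96 steps of 15 min starting at a lead time of 28 h; the function name and the example initialization date are hypothetical. Adding 28 h to a 20:00 initialization gives 00:00 two calendar days after the initialization date, which is the "next day" as seen from the day on which the forecast is used.

```python
from datetime import datetime, timedelta


def target_day_valid_times(init_time, first_lead_h=28, n_steps=96, step_min=15):
    """Valid times of the 15 min forecasts that cover one full target day."""
    first_valid = init_time + timedelta(hours=first_lead_h)
    return [first_valid + timedelta(minutes=step_min * k) for k in range(n_steps)]


# Hypothetical initialization at 20:00 Beijing time on 1 August 2023.
init = datetime(2023, 8, 1, 20, 0)
valid_times = target_day_valid_times(init)

# Lead time of the last step in hours (28 h plus 95 steps of 15 min).
last_lead_h = (valid_times[-1] - init).total_seconds() / 3600.0
```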

3. Methods

First, missing values, outliers, and nighttime zero values are removed. Then, the data are normalized using the min-max normalization method, scaling the data to the range [0,1]. Here, a zero value of astronomical solar radiation at 15 min intervals is used as the criterion for nighttime. Astronomical solar radiation is calculated using Equation (1):

$$I_{0} = 3.6 \times 10^{3} \gamma E_{sc} (\sin\delta \sin\varphi + \cos\delta \cos\varphi \cos\omega). \quad (1)$$

In Equation (1), $E_{sc}$ is the solar constant, $\gamma$ is the Sun–Earth distance correction factor, $\delta$ is the solar declination, $\varphi$ is the latitude, and $\omega$ is the solar hour angle.
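A minimal sketch of the nighttime criterion based on Equation (1) is given below. The paper does not state how the correction factor, declination, and hour angle are computed, so the sketch assumes common textbook approximations (a cosine eccentricity correction and Cooper's declination formula) and a solar constant of 1367 W/m². The time argument is local solar time, not Beijing time; the conversion is omitted. The latitude in the usage lines is a hypothetical value.

```python
import math

SOLAR_CONSTANT = 1367.0  # W/m^2, assumed value of E_sc


def astronomical_radiation(day_of_year, solar_hour, latitude_deg):
    """Evaluate Equation (1); negative values (sun below horizon) are set to 0."""
    # Assumed approximation for the Sun-Earth distance correction factor.
    gamma = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    # Assumed approximation for the solar declination (Cooper's formula), in radians.
    delta = math.radians(23.45) * math.sin(2.0 * math.pi * (284 + day_of_year) / 365.0)
    phi = math.radians(latitude_deg)
    # Hour angle: 15 degrees per hour away from solar noon.
    omega = math.radians(15.0 * (solar_hour - 12.0))
    cos_zenith = (math.sin(delta) * math.sin(phi)
                  + math.cos(delta) * math.cos(phi) * math.cos(omega))
    return max(0.0, 3.6e3 * gamma * SOLAR_CONSTANT * cos_zenith)


def is_night(day_of_year, solar_hour, latitude_deg):
    """A sample counts as nighttime when the astronomical radiation is zero."""
    return astronomical_radiation(day_of_year, solar_hour, latitude_deg) == 0.0


# Hypothetical latitude of 26.6 degrees north, day 172 of the year.
noon_value = astronomical_radiation(172, 12.0, 26.6)
midnight_is_night = is_night(172, 0.0, 26.6)
```

Samples flagged by `is_night` would be removed before normalization.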

Secondly, six ML models from the Python scikit-learn package (https://scikit-learn.org/stable/index.html, accessed on 3 July 2024), namely LR, DT, KNN, RFR, AdaBoost, and GBR, are employed to improve the solar radiation forecast produced by the CMA-WSP v2.0. In this study, the observed solar radiation is the prediction target, while the forecasted solar radiation (downward shortwave radiation at the surface) and other relevant meteorological variables (Table 1) from the CMA-WSP v2.0 are considered as potential predictors. The datasets from 1 January 2022 to 31 December 2022 and from 1 August 2023 to 31 October 2023 are used as the training set and test set, respectively. The training set is used to build the prediction models, while the test set is used to evaluate the models’ performance.
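The training step can be sketched with the six scikit-learn regressors. This is an illustration under stated assumptions, not the configuration used in the study: all hyperparameters are scikit-learn defaults, the data are synthetic, only the predictors are min-max scaled, and the scaler is fitted on the training set alone.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeRegressor


def build_models(seed=0):
    """The six regressors with default hyperparameters (an assumption)."""
    return {
        "LR": LinearRegression(),
        "DT": DecisionTreeRegressor(random_state=seed),
        "KNN": KNeighborsRegressor(),
        "RFR": RandomForestRegressor(random_state=seed),
        "AdaBoost": AdaBoostRegressor(random_state=seed),
        "GBR": GradientBoostingRegressor(random_state=seed),
    }


def fit_and_predict(X_train, y_train, X_test, seed=0):
    """Fit each model on min-max scaled predictors and predict the test set."""
    predictions = {}
    for name, model in build_models(seed).items():
        pipeline = make_pipeline(MinMaxScaler(), model)
        pipeline.fit(X_train, y_train)
        predictions[name] = pipeline.predict(X_test)
    return predictions


# Synthetic stand-in data: column 0 plays the role of the forecasted radiation.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1000.0, size=(300, 3))
y_train = 0.8 * X_train[:, 0] + rng.normal(0.0, 50.0, size=300)
X_test = rng.uniform(0.0, 1000.0, size=(40, 3))
predictions = fit_and_predict(X_train, y_train, X_test)
```

Wrapping the scaler and the model in one pipeline keeps the test set out of the scaling statistics, which matters mainly for the distance-based KNN model.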

Finally, the solar radiation forecasts produced by the CMA-WSP v2.0 and the six ML models are tested under different weather conditions. The weather conditions are classified into four categories (sunny, cloudy, overcast, and rainy; Table 2) according to hourly cloud cover and precipitation observations from the national meteorological station situated 13 km from the photovoltaic power station. Each hourly classification is applied to all 15 min samples within that hour. For example, if 13:00 is observed to be overcast, the samples at 13:15, 13:30, and 13:45 are also classified as overcast.
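Assigning an hourly weather class to the 15 min samples can be sketched as follows. The classification thresholds of Table 2 are not reproduced here; the sketch assumes the hourly labels are already available in a dictionary keyed by the start of the hour. The function name and example labels are hypothetical.

```python
from datetime import datetime


def label_samples(sample_times, hourly_labels):
    """Attach to each 15 min sample the weather class observed for its hour."""
    labelled = []
    for t in sample_times:
        hour_start = t.replace(minute=0, second=0, microsecond=0)
        # Samples whose hour has no observation get None.
        labelled.append((t, hourly_labels.get(hour_start)))
    return labelled


# Hypothetical hourly observations and 15 min sample times.
hourly_labels = {
    datetime(2023, 8, 3, 13, 0): "overcast",
    datetime(2023, 8, 3, 14, 0): "rainy",
}
samples = [datetime(2023, 8, 3, 13, m) for m in (0, 15, 30, 45)]
samples.append(datetime(2023, 8, 3, 14, 0))
labelled = label_samples(samples, hourly_labels)
```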

The forecast accuracy is evaluated by the bias error (BE), absolute percentage error (APE), mean absolute percentage error (MAPE), and Pearson’s correlation coefficient (R). They are defined as

$$BE_{i} = S_{fore,i} - S_{obs,i}, \quad (2)$$

$$APE_{i} = \left| \frac{S_{fore,i} - S_{obs,i}}{S_{obs,i}} \right|, \quad (3)$$

$$MAPE = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{S_{fore,i} - S_{obs,i}}{S_{obs,i}} \right|, \quad (4)$$

$$R = \frac{\sum_{i=1}^{N} (S_{obs,i} - \bar{S}_{obs}) (S_{fore,i} - \bar{S}_{fore})}{\sqrt{\sum_{i=1}^{N} (S_{obs,i} - \bar{S}_{obs})^{2}} \sqrt{\sum_{i=1}^{N} (S_{fore,i} - \bar{S}_{fore})^{2}}}, \quad -1 \le R \le 1, \quad (5)$$

where $S_{obs,i}$ and $S_{fore,i}$ are the $i$-th observed and forecasted solar radiation, respectively, $\bar{S}_{obs}$ and $\bar{S}_{fore}$ are the means of the observed and forecasted solar radiation, respectively, and $N$ is the total number of samples. The APE and MAPE are dimensionless metrics that capture the error relative to the magnitude of the observation. Thus, they allow for a fairer assessment of the model’s forecast accuracy at different locations or at different times. The BE indicates whether the model tends to underestimate or overestimate the observed values.
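Equations (2)–(5) translate directly into code. The sketch below uses plain Python lists and returns the APE and MAPE as fractions (multiply by 100 for the percentages quoted in the text). It assumes that zero observations have already been removed, as the nighttime filtering implies, and raises an error otherwise. The function names and sample values are illustrative.

```python
import math


def bias_errors(fore, obs):
    """Equation (2): forecast minus observation for each sample."""
    return [f - o for f, o in zip(fore, obs)]


def absolute_percentage_errors(fore, obs):
    """Equation (3), as fractions; observations must be non-zero."""
    if any(o == 0 for o in obs):
        raise ValueError("APE is undefined for zero observations")
    return [abs((f - o) / o) for f, o in zip(fore, obs)]


def mape(fore, obs):
    """Equation (4): mean of the absolute percentage errors."""
    apes = absolute_percentage_errors(fore, obs)
    return sum(apes) / len(apes)


def pearson_r(fore, obs):
    """Equation (5): Pearson correlation between forecasts and observations."""
    mean_f = sum(fore) / len(fore)
    mean_o = sum(obs) / len(obs)
    cov = sum((o - mean_o) * (f - mean_f) for f, o in zip(fore, obs))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_f = sum((f - mean_f) ** 2 for f in fore)
    return cov / (math.sqrt(var_o) * math.sqrt(var_f))


# Illustrative values in W/m^2.
obs = [100.0, 200.0, 400.0]
fore = [110.0, 180.0, 400.0]
be = bias_errors(fore, obs)
score = mape(fore, obs)
r = pearson_r(fore, obs)
```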

The difference in an index, such as the MAPE, APE, or R, between model 1 and model 2 is calculated by

$$D_{index} = index_{1} - index_{2}, \quad (6)$$

where $D_{index} > 0$ ($D_{index} < 0$) indicates that $index_{1}$ is greater (less) than $index_{2}$.

Moreover, APEs and BEs are calculated using forecasts and observations obtained at 15 min intervals. The occurrence percentage of APEs is calculated within four error ranges: those below 25%, between 25% and 75%, between 75% and 100%, and above 100%. The APE below 25% (above 100%) is taken to indicate a favorable (unfavorable) forecast, while an APE between 25% and 100% is regarded as a neutral forecast. Furthermore, the occurrence percentage of BEs is calculated within two error ranges: those below 0 W/m² and above 0 W/m². The percentage of BEs below 0 W/m² (above 0 W/m²) greater than 50% indicates that the predictions are likely to underestimate (overestimate) the observed value.
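The occurrence percentages described above can be sketched as follows. The text does not say to which range a value lying exactly on a boundary (25%, 75%, or 100%) belongs, so the boundary handling below is an assumption, as are the function names and dictionary keys. APEs are passed as fractions.

```python
def ape_range_percentages(apes):
    """Share (%) of APEs in the four error ranges; boundary handling is assumed."""
    counts = {"below_25": 0, "25_to_75": 0, "75_to_100": 0, "above_100": 0}
    for a in apes:
        if a < 0.25:
            counts["below_25"] += 1
        elif a < 0.75:
            counts["25_to_75"] += 1
        elif a <= 1.0:
            counts["75_to_100"] += 1
        else:
            counts["above_100"] += 1
    return {key: 100.0 * n / len(apes) for key, n in counts.items()}


def be_sign_percentages(bes):
    """Share (%) of bias errors below and above 0 W/m^2."""
    below = 100.0 * sum(1 for b in bes if b < 0) / len(bes)
    above = 100.0 * sum(1 for b in bes if b > 0) / len(bes)
    return below, above


def bias_tendency(bes):
    """More than 50% negative (positive) BEs suggests under- (over-)estimation."""
    below, above = be_sign_percentages(bes)
    if below > 50.0:
        return "underestimate"
    if above > 50.0:
        return "overestimate"
    return "no clear tendency"


# Illustrative values.
shares = ape_range_percentages([0.1, 0.2, 0.5, 0.9, 1.5, 2.0, 0.3, 0.05])
signs = be_sign_percentages([-10.0, 5.0, -3.0, 0.0])
```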

4. Performance Analysis of the CMA-WSP v2.0 Model

The performance of the CMA-WSP v2.0 in forecasting solar radiation is evaluated using solar radiation observations collected between January and December 2022 and between August and November 2023. Firstly, the percentages of the APE in four distinct error ranges are calculated for each month (Figure 1a). The results show that the percentages of the APE below 25% in March, July, and August 2022 are approximately 50%, which is higher than that in the other months. As a result, the MAPE in these three months is low, with a value of 61% in August 2022 (Figure 1b). Nevertheless, the proportion of APE exceeding 100% in February and September 2022 is 31% and 38%, respectively, while the MAPEs are 170% and 220%, which are markedly higher than those observed in the other months. A comparison is made between the predicted and observed solar radiation at one-hour intervals for the months of February, July, August, and September 2022, taking into account the weather conditions at the same time (Figure 2). Here, the hourly solar radiation forecast (or observation) is obtained by accumulating 15 min forecasts (or observation). The results illustrate a notable discrepancy between the predicted and observed solar radiation under overcast and rainy conditions. Nevertheless, the forecast curve for solar radiation is in close alignment with the observed curve under both sunny and cloudy conditions. Consequently, the MAPE is greater in February and September 2022, which is characterized by a higher prevalence of cloudy and rainy hours. Conversely, the MAPE is smaller in July and August 2022, which had fewer cloudy and rainy hours. It can be observed that the performance of solar radiation forecasting by the CMA-WSP v2.0 is influenced by weather conditions.

The performance of the solar radiation forecast generated by the CMA-WSP v2.0 is further investigated under sunny, cloudy, overcast, and rainy conditions. The MAPE is 59%, 68%, 114%, and 403% for sunny, cloudy, overcast, and rainy conditions, respectively (Figure 3a). In the same order, the probability of the APE below 25% is 41%, 34%, 24%, and 18%, and the probability of the APE above 100% is 13%, 10%, 27%, and 42% (Figure 3b). The APEs of the CMA-WSP v2.0 are concentrated in the error ranges below 25%, between 25% and 75%, between 25% and 75%, and above 100% in sunny, cloudy, overcast, and rainy conditions, respectively. This indicates that the MAPE increases with cloud cover, especially when precipitation is present. Figure 3c illustrates the proportions of the BE below and above 0 W/m² under the different weather conditions. The probability of the BE below 0 W/m² is 59% for sunny conditions and 79% for cloudy conditions. Conversely, the probability of the BE above 0 W/m² is 59% under overcast conditions and 68% under rainy conditions. This suggests that the solar radiation predicted by the CMA-WSP v2.0 is more likely to be lower than the observed solar radiation in sunny and cloudy conditions and higher in overcast and rainy conditions. Furthermore, the correlation coefficients (R) between the solar radiation forecast produced by the CMA-WSP v2.0 and the observed values are 0.76, 0.51, 0.61, and 0.54 for sunny, cloudy, overcast, and rainy conditions, in that order, all of which pass the 95% significance test (Figure 3d). There is therefore a significant positive correlation between the observed solar radiation and the solar radiation predicted by the CMA-WSP v2.0, indicating consistent trends in their variation.

Considering obvious diurnal variation of the solar radiation, we further analyze the diurnal variation of forecast errors of the solar radiation generated by CMA-WSP v2.0 in the daytime. The MAPE varies from 74% to 242%, with bigger values occurring at 07:00–08:00 and 17:00–18:00 and small values occurring at 11:00–13:00 (Figure 4a). This is consistent with the variation in the percentage of the APE below 25% and above 100% (Figure 4b). The percentage of APE below 25% ranges from 34% to 37% at 11:00–13:00, but varies from 15% to 25% at 07:00–08:00, while the percentage of APE above 100% ranges from 16% to 17% at 11:00–13:00, but varies from 26% to 43% at 07:00–08:00 and 17:00–18:00. It is clear that the performance of the CMA-WSP v2.0 is better at mid-day than in the early morning and late evening. This may be due to the rapid changes in the intensity of solar radiation and the angle of the sun’s elevation at 07:00–08:00 and 17:00–18:00, which make it difficult for the NWP model to accurately capture such sharp changes, resulting in large errors. Figure 4c further shows the diurnal variation of the percentage of the BE below 0 W/m² and above 0 W/m². It shows that the percentage of the BE below 0 W/m² ranges from 54% to 62% between 10:00 and 16:00, and the percentage of the BE above 0 W/m² ranges from 55% to 83% between 7:00 and 9:00 and between 17:00 and 18:00. This indicates that the solar radiation predicted by CMA-WSP v2.0 tends to be lower than that observed from 10:00 to 16:00 and higher than that observed from 7:00 to 9:00 and from 17:00 to 18:00. The correlation coefficients between the solar radiation predicted by CMA-WSP v2.0 and the observed solar radiation are calculated for all hours from 7:00 to 18:00 under different weather conditions (Figure 4d). The correlation coefficients are positive and pass the 95% significance test for all hours under sunny and overcast conditions. 
However, the positive correlation coefficients for cloudy and rainy conditions do not pass the 95% significance test for some hours; in particular, negative correlation coefficients are observed from 10:00 to 14:00 for rainy conditions. This may be due to uncertainties in the NWP model in predicting the distribution and variability of cloudiness and precipitation, leading to an increase in the errors in predicting solar radiation.
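The hour-by-hour stratification used in this section amounts to grouping the samples by hour before averaging. A minimal sketch, assuming each sample is available as an (hour, forecast, observation) tuple with a non-zero observation; the function name and values are illustrative:

```python
from collections import defaultdict


def mape_by_hour(records):
    """MAPE in percent for each hour of the day."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hour, fore, obs in records:
        sums[hour] += abs((fore - obs) / obs)
        counts[hour] += 1
    return {hour: 100.0 * sums[hour] / counts[hour] for hour in sorted(sums)}


# Illustrative samples: small observations in the early morning inflate the APE.
records = [
    (7, 30.0, 10.0),
    (7, 20.0, 10.0),
    (12, 550.0, 500.0),
    (12, 450.0, 500.0),
]
hourly_mape = mape_by_hour(records)
```

The example also shows why percentage errors tend to be large near sunrise and sunset: the same absolute error is divided by a much smaller observed value.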

5. Performance Analysis of the ML Models

Based on a dataset of solar radiation observed from August 2023 to November 2023, the performance of the solar radiation forecasts generated by the six ML models is analyzed. Considering the significant diurnal variation of the solar radiation forecast errors generated by the CMA-WSP v2.0, the performances of the six ML models are first examined at hourly intervals (Figure 5). The results show that the MAPEs of the solar radiation forecasts produced by the six ML models are all small at noon, but larger at 7:00–8:00 and 17:00–18:00 than at other times, which is similar to the CMA-WSP v2.0. The MAPEs of the RFR, DT, KNN, and GBR models are all lower than that of the CMA-WSP v2.0 at most times from 9:00 to 16:00, which is caused by the phenomenon that the percentage of the APE below 25% for these ML models are all higher than that of the CMA-WSP v2.0 at most times. The MAPE of the KNN model is lower than that of the other ML models, followed by the RFR model. The MAPEs of the AdaBoost and LR models are close to or greater than that of CMA-WSP v2.0 at most times. This is because the percentages of the APE below 25% for the AdaBoost and LR models are not significantly higher than that of the CMA-WSP v2.0, but the percentages of the APE above 100% for them are greater than that of the CMA-WSP v2.0. Therefore, the ability of the AdaBoost and LR to improve CMA-WSP v2.0 in predicting solar radiation is limited. The MAPEs of the RFR and DT models at 7:00–8:00 and 18:00 are also smaller than that of the CMA-WSP v2.0. However, the MAPEs produced by the KNN, GBR, AdaBoost and LR models are abnormally higher than that produced by the CMA-WSP v2.0 at 7:00–8:00 and 17:00–18:00. Meanwhile, the percentages of the APE above 100% for these four ML models are all abnormally higher than that of the CMA-WSP v2.0. This indicates that these four ML methods increase the uncertainty of the solar radiation forecast produced by CMA-WSP v2.0 during this time period. 
Since the performance of the solar radiation forecast produced by four of the six ML models at 07:00–08:00 and 17:00–18:00 is extremely poor, this paper will focus on analyzing the performance of the ML models from 09:00 to 16:00, as this time period is more important for solar photovoltaic power than the time periods between 07:00 and 08:00 and between 17:00 and 18:00.

Figure 6 compares the performance of the six ML models with that of the CMA-WSP v2.0 in predicting solar radiation between 9:00 and 16:00 from August to November 2023. The MAPEs of the RFR, DT, KNN, and GBR models are 57%, 66%, 50%, and 66%, respectively, which are 15, 5, 22, and 5 percentage points lower than that of the CMA-WSP v2.0. This is because the percentages of their APEs below 25%, which are 45%, 43%, 51%, and 37%, respectively, are higher than that of the CMA-WSP v2.0. The MAPEs of the AdaBoost and LR models are 84% and 74%, respectively, which are 12 and 2 percentage points higher than that of the CMA-WSP v2.0. This is consistent with the percentages of the APE below 25% for the AdaBoost and LR models decreasing by 22% and 27%, and the percentages of the APE above 100% increasing by 19% and 17%, in comparison to the CMA-WSP v2.0. In addition, the correlation coefficients between the observed and forecasted solar radiation for the RFR, DT, KNN, GBR, AdaBoost, and LR models are 0.70, 0.53, 0.69, 0.64, 0.53, and 0.57, which are 0.28, 0.10, 0.27, 0.21, 0.11, and 0.15 higher than that for the CMA-WSP v2.0, in that order. Overall, the RFR, DT, KNN, and GBR models are able to improve the performance of the solar radiation forecast produced by the CMA-WSP v2.0. The KNN model performs better than the other ML models, followed by the RFR model. The AdaBoost and LR models have a limited effect in reducing the MAPE of the solar radiation forecast produced by the CMA-WSP v2.0, although they do improve the correlation coefficients between the forecasted and observed solar radiation.

The performance of the ML models in improving the CMA-WSP v2.0 solar radiation forecast is further analyzed under different weather conditions (Figure 7). In sunny hours, the MAPEs of the RFR, DT, KNN, GBR, AdaBoost, and LR models are 35%, 42%, 37%, 39%, 40%, and 40%, respectively, which are 13, 7, 12, 10, 8, and 9 percentage points lower than that of the CMA-WSP v2.0. In cloudy hours, the MAPEs of the RFR, DT, KNN, GBR, AdaBoost, and LR models are 30%, 36%, 26%, 37%, 40%, and 40%, respectively, which are 16, 10, 20, 9, 0, and 6 percentage points lower than that of the CMA-WSP v2.0. In overcast hours, the MAPEs of the RFR, DT, KNN, and GBR models are 66%, 79%, 58%, and 74%, which are 15, 1, 23, and 6 percentage points lower than that of the CMA-WSP v2.0, whereas the MAPEs of the AdaBoost and LR models are 96% and 85%, which are 15 and 4 percentage points higher. In rainy hours, the MAPEs of the RFR, DT, and KNN models are 160%, 171%, and 124%, which are 20, 10, and 56 percentage points lower than that of the CMA-WSP v2.0, whereas the MAPEs of the GBR, AdaBoost, and LR models are 206%, 288%, and 235%, which are 26, 108, and 55 percentage points higher. The KNN, RFR, and DT models therefore reduce the MAPE of the CMA-WSP v2.0 under all weather conditions (Figure 7a). The MAPE reduction achieved by the KNN model increases progressively from sunny to cloudy, overcast, and rainy conditions. In particular, the solar radiation prediction performance of the KNN model is significantly better than that of the CMA-WSP v2.0 and the other ML models under cloudy and rainy conditions. The GBR model improves the solar radiation prediction of the CMA-WSP v2.0 in sunny, cloudy, and overcast conditions. Although the AdaBoost and LR models perform poorly overall, the AdaBoost model still reduces the MAPE of the CMA-WSP v2.0 under sunny conditions, and the LR model does so under sunny and cloudy conditions. The reductions in MAPE for the KNN, RFR, DT, and GBR models relative to the CMA-WSP v2.0 are mainly due to the increase in the percentage of the APE below 25% (Figure 7b). 
In rainy conditions, for all ML models except KNN, the percentage of APEs below 25% is less than 30% and the percentage of APEs above 100% is more than 30%, or even more than 60%. Therefore, the performance of the KNN model in predicting solar radiation in rainy conditions is significantly better than that of the other ML models and the CMA-WSP v2.0. The correlation coefficients between the observed and predicted solar radiation are also calculated for the six ML models (Figure 7c), and they are higher than that of the CMA-WSP v2.0 under sunny, cloudy, and overcast conditions. Under rainy conditions, however, only the correlation coefficients of the KNN, RFR, and GBR models are higher than that of the CMA-WSP v2.0. In addition, the predicted solar radiation values of the ML models are more likely to be higher (lower) than the observed values in sunny and cloudy (cloudy and rainy) conditions (Figure 7d).

To further reduce the MAPE of the CMA-WSP v2.0, two additional improvement schemes, named Scheme 2 and Scheme 3, are designed for the ML models. The method described in Section 3 and analyzed earlier in this section is named Scheme 1. Because there is an obvious diurnal variation in the MAPE of the CMA-WSP v2.0 and of each ML model, Scheme 2 adds the minute of the day as a new input feature to the features of Scheme 1. Specifically, each 15 min forecast of the CMA-WSP v2.0 between 7:00 and 19:00 is assigned a value from 0 to 720 in steps of 15. In Scheme 3, the daytime is divided into three periods with time labels: 07:00–08:00 is label 1, 09:00–16:00 is label 2, and 17:00–18:00 is label 3. This time label is then converted into one-hot encoded features. Solar radiation prediction experiments are performed with the six ML models based on Scheme 2 and Scheme 3 and evaluated against Scheme 1. For each 15 min forecast time, the difference in the APE between the ML model based on Scheme 2 (or Scheme 3) and the same model based on Scheme 1 is calculated. A negative difference indicates that the APE under Scheme 2 (or Scheme 3) is lower than that under Scheme 1. Table 3 shows the percentages of negative differences in the APE between the ML models based on Scheme 2 (or Scheme 3) and those based on Scheme 1 for different times. The percentages of negative differences for the RFR, GBR, AdaBoost, and LR models are higher than 50% at most times, whereas those for the DT and KNN models are lower than 50%. This indicates that the performances of the RFR, GBR, AdaBoost, and LR models based on Scheme 1 are all improved to some extent by applying Scheme 2 or Scheme 3. 
Figure 8 compares the MAPE of the solar radiation forecasts produced by the ML models based on Schemes 1, 2, and 3. The results show that the MAPEs produced by the RFR, GBR, AdaBoost, and LR models based on Schemes 2 and 3 are reduced by 1% to 3% compared to those based on Scheme 1. The reductions in the MAPE for the RFR model based on Schemes 2 and 3 are greater than for the other ML models, with the reduction based on Scheme 2 being larger than that based on Scheme 3. Although the reduction in the MAPE for the KNN model is almost 0%, the MAPE of the KNN model remains the lowest among the ML models. In addition, neither Scheme 2 nor Scheme 3 solves the problem of unusually large errors in the CMA-WSP v2.0 solar radiation forecasts at 07:00–08:00 and 17:00–18:00, or under rainy conditions. It is therefore necessary to experiment with more schemes in the future to improve the performance of ML models in predicting solar radiation.

6. Conclusions and Discussion

6.1. Conclusions

This study focuses on the evaluation and improvement of the solar radiation forecasts generated by the CMA-WSP v2.0. The performance of the CMA-WSP v2.0 is analyzed using observed solar radiation from January to December 2022 and from August to November 2023. Six ML models (RFR, DT, KNN, GBR, AdaBoost, and LR) are used to revise the CMA-WSP v2.0’s solar radiation forecasts from August to November 2023. A comparative evaluation is then made of the performance of the CMA-WSP v2.0 and the six ML models. It is found that the performance of the CMA-WSP v2.0 in predicting solar radiation is reasonably good, especially in sunny and cloudy conditions, and can be further improved by the ML models.

The performance of the solar radiation forecast produced by the CMA-WSP v2.0 is closely related to the weather conditions. In sunny and cloudy conditions, the percentages of the APE below 25% (above 100%) are 41% and 34% (13% and 10%), respectively, which are higher (lower) than those in overcast and rainy conditions. The MAPEs generated by the CMA-WSP v2.0 are 59%, 68%, 114%, and 403% for sunny, cloudy, overcast, and rainy conditions, respectively. The abnormally high MAPE in rainy conditions arises because the percentage of the APE above 100% reaches 42%. There is an obvious diurnal variation in the solar radiation forecast errors of the CMA-WSP v2.0. The MAPE around midday (11:00–13:00) is small, around 74%, while the MAPE at 07:00–08:00 and 17:00–18:00 is large, ranging from 193% to 242%. The CMA-WSP v2.0 is likely to underestimate the observed solar radiation between 11:00 and 16:00, but to overestimate it between 07:00 and 09:00 and between 17:00 and 18:00. The correlation coefficients between the solar radiation predicted by the CMA-WSP v2.0 and the observed solar radiation pass the 95% significance test at every moment in sunny and cloudy conditions, but not at most moments in overcast and rainy conditions.

The ML models (RFR, KNN, GBR, AdaBoost, LR, and DT) perform differently in revising the solar radiation forecasted by the CMA-WSP v2.0. Except at 07:00–08:00 and 17:00–18:00, the RFR, DT, KNN, and GBR models are able to reduce the MAPE of the CMA-WSP v2.0 and improve the correlation coefficients between the observations and the CMA-WSP v2.0’s forecasts at most times. Although AdaBoost and LR also improve the correlation coefficients between the observations and the CMA-WSP v2.0’s forecasts, they do not reduce the MAPE of the CMA-WSP v2.0 well. The ability of the ML models to revise the solar radiation forecast of the CMA-WSP v2.0 is closely related to the weather conditions. The MAPE of the CMA-WSP v2.0 is significantly reduced by the KNN, RFR, and DT models in all weather conditions, with KNN performing best, followed by RFR. The GBR model is able to reduce the MAPE of the CMA-WSP v2.0 in all weather conditions except rainy. The AdaBoost (LR) model is also able to reduce the MAPE of the CMA-WSP v2.0 to some extent in sunny (sunny and cloudy) conditions. In rainy conditions, for the ML models except KNN, the percentage of the APE below 25% is less than 30%, and the percentage of the APE above 100% is more than 30%, in some cases more than 60%, which is higher than that for the CMA-WSP v2.0. Thus, the ML models, with the exception of KNN, increase the uncertainty in the solar radiation forecast of the CMA-WSP v2.0 for rainy conditions. For the ML models, the predicted solar radiation is likely to be lower than the observation in sunny and cloudy conditions, but higher than the observation in overcast and rainy conditions, which is the same as for the CMA-WSP v2.0. In addition, the selection of input features has a considerable influence on the performance of the ML models. Incorporating time information associated with the diurnal variation of solar radiation as an input feature can further improve the performance of the ML models.

6.2. Discussion

In this study, the MAPE and the APE are calculated using the observed and predicted solar radiation at 15 min intervals. This is a strict test of the NWP model, because it requires the model to accurately capture and predict small variations in weather and solar radiation over a short period of time (e.g., within 1 h). Therefore, a different method is introduced to calculate the MAPE and the APE, as described in reference [44]. First, the observed and predicted daily total solar radiation are obtained by summing the 15 min solar radiation values from the observations and the CMA-WSP v2.0 predictions, respectively, over a day. Then, the MAPE and the percentage of the APE below 25% are calculated using the daily total solar radiation from January to December 2022 and from August to November 2023 (Figure 9), and are compared with the performance of the WRF model using ECMWF and GFS as the initial field [44]. The MAPEs of the CMA-WSP v2.0 are lower than those of the WRF model in most months; in particular, the percentages of the APE below 25% exceed 74% for the CMA-WSP v2.0 in March, July, and August 2022. This indicates that the performance of the CMA-WSP v2.0 in predicting solar radiation is good and valuable for operational applications. Furthermore, it is found that the KNN, RFR, and DT models can improve the performance of the CMA-WSP v2.0 under different weather conditions. Among them, the KNN model performs best, especially in cloudy and rainy conditions. It is known that solar radiation simulations from NWP models can be less accurate on days with extensive cloud cover or rain [43]. Therefore, the findings of this study are valuable for improving the solar radiation prediction of the NWP model under cloudy and rainy conditions and for increasing the efficiency of solar resource utilization.

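The difference between the two evaluation methods can be made concrete with a small Python sketch. It is illustrative only: the records are invented, the APE definition |predicted − observed| / |observed| × 100% is assumed, and the helper names `daily_totals` and `mape` are hypothetical.

```python
from collections import defaultdict


def daily_totals(records):
    """Sum 15 min samples into per-day [observed, predicted] totals.

    `records` is an iterable of (day, observed, predicted) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for day, obs, pred in records:
        totals[day][0] += obs
        totals[day][1] += pred
    return dict(totals)


def mape(pairs):
    """Mean absolute percentage error (%) over (observed, predicted) pairs."""
    apes = [abs(p - o) / abs(o) * 100.0 for o, p in pairs if o != 0]
    if not apes:
        raise ValueError("no valid samples")
    return sum(apes) / len(apes)


# Invented samples: (day, observed, predicted)
records = [
    ("2022-03-01", 100.0, 150.0),
    ("2022-03-01", 300.0, 250.0),
    ("2022-03-02", 200.0, 260.0),
    ("2022-03-02", 200.0, 240.0),
]

sample_mape = mape((o, p) for _, o, p in records)   # per-sample evaluation
daily_mape = mape(daily_totals(records).values())   # daily-total evaluation
```

With these numbers the per-sample MAPE is about 29%, while the daily-total MAPE is 12.5%, because over- and underestimates within a day partly cancel when summed. This is why the daily-total evaluation is the less strict of the two.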
This study is an experiment applied to a photovoltaic power plant in Guizhou, China, and the experiment will be extended to other photovoltaic power plants as far as possible. However, it is necessary to consider the economic costs associated with the operational application of ML models, including the cost of data collection; the need for high-performance computing resources, maintenance, and updates of the models; and the implementation of the models in real-world scenarios.

Author Contributions

Conceptualization, D.W. and Y.S.; methodology, D.W., D.Y. and Y.Y.; validation, Y.S.; formal analysis, D.W.; investigation, X.D. and J.M.; data curation, D.Y.; writing of the original draft, D.W. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the regional collaborative innovation project of Xinjiang Uygur Autonomous Region (grant number 2023E01011), “Tianchi Talents” Introduction Plan (2023), Key Research Project of Shaanxi Provincial Department of Science and Technology (grant number 2023-YBSF-235), and Meteorological Science and Technology Innovation Platform of China Meteorological Service Association (grant number CMSA2023MB024).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Due to privacy restrictions, the raw data supporting the conclusions of this article are not public; they will be made available by the authors upon request.

Acknowledgments

We thank the editor and four anonymous reviewers for their constructive comments, which helped to improve the quality of the manuscript. We also thank Ailiyaer Aihaiti from the Institute of Desert Meteorology, China Meteorological Administration, for his valuable suggestions during the revision of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

CMA	China Meteorological Administration
CMA-WSP v2.0	Version 2 of the China Meteorological Administration Wind and Solar Energy Prediction System
IPCC	Intergovernmental Panel on Climate Change
ML	Machine learning
LR	Linear regression
DT	Decision trees
KNN	K-nearest neighbors
RFR	Random forest regression
AdaBoost	Adaptive boosting
GBR	Gradient boosting regression
BE	Bias error
APE	Absolute percentage error
MAPE	Mean absolute percentage error
R	Pearson’s correlation coefficient

References

1. Murdock, H.E.; Gibb, D.; André, T.; Sawin, J.L.; Brown, A.; Ranalder, L.; Collier, U.; Dent, C.; Epp, B.; Hareesh Kumar, C.; et al. Renewables 2021-Global Status Report; UN Environment Programme: Paris, France, 2021.
2. Liu, Y.Q.; Qin, H.; Zhang, Z.D.; Pei, S.Q.; Wang, C.; Yu, X.; Jiang, Z.Q.; Zhou, J.Z. Ensemble Spatiotemporal Forecasting of Solar Irradiation Using Variational Bayesian Convolutional Gate Recurrent Unit Network. Appl. Energy 2019, 253, 113596.
3. Sun, S.L.; Wang, S.Y.; Zhang, G.W.; Zheng, J.L. A Decomposition-Clustering-Ensemble Learning Approach for Solar Radiation Forecasting. Sol. Energy 2018, 163, 189–199.
4. Dong, J.; Olama, M.M.; Kuruganti, T.; Melin, A.M.; Djouadi, S.M.; Zhang, Y.C.; Xue, Y.S. Novel Stochastic Methods to Predict Short-Term Solar Radiation and Photovoltaic Power. Renew. Energy 2020, 145, 333–346.
5. Mathiesen, P.; Kleissl, J. Evaluation of numerical weather prediction for intraday solar forecasting in the continental United States. Sol. Energy 2011, 85, 967–977.
6. Lara-Fanego, V.; Ruiz-Arias, J.A.; Pozo-Vázquez, D.; Santos-Alamillos, F.J.; Tovar-Pescador, J. Evaluation of the WRF model solar irradiance forecasts in Andalusia (southern Spain). Sol. Energy 2012, 86, 2200–2217.
7. Cheng, X.; Liu, R.; Shen, Y.; Zhu, R.; Peng, J.; Yang, Z.; Xu, H. Improved method of solar radiation simulation under cloudy days with LAPS-WRF model system based on satellite data assimilation. Chin. J. Atmos. Sci. 2014, 38, 577–589.
8. Wolff, B.; Kühnert, J.; Lorenz, E.; Kramer, O.; Heinemann, D. Comparing support vector regression for PV power forecasting to a physical modeling approach using measurement, numerical weather prediction, and cloud motion data. Sol. Energy 2016, 135, 197–208.
9. Jimenez, P.A.; Hacker, J.P.; Dudhia, J.; Haupt, S.E.; Ruiz-Arias, J.A.; Gueymard, C.A.; Thompson, G.; Eidhammer, T.; Deng, A. WRF-solar: Description and clear-sky assessment of an augmented NWP model for solar power prediction. Bull. Am. Meteorol. Soc. 2016, 97, 1249–1264.
10. Jiménez, P.A.; Alessandrini, S.; Haupt, S.E.; Deng, A.; Kosovic, B.; Lee, J.A.; Monache, L.D. The role of unresolved clouds on short-range global horizontal irradiance predictability. Mon. Weather Rev. 2016, 144, 3099–3107.
11. Lee, J.A.; Haupt, S.E.; Jiménez, P.A.; Rogers, M.A.; Miller, S.D.; McCandless, T.C. Solar irradiance nowcasting case studies near Sacramento. J. Appl. Meteorol. Climatol. 2017, 56, 85–108.
12. Prasad, A.A.; Kay, M. Assessment of simulated solar irradiance on days of high intermittency using WRF-solar. Energies 2020, 13, 385.
13. Rodríguez-Benítez, F.J.; Arbizu-Barrena, C.; Huertas-Tato, J.; Aler-Mur, R.; Galván-León, I.; Pozo-Vázquez, D. A short-term solar radiation forecasting system for the Iberian Peninsula. Part 1: Models description and performance assessment. Sol. Energy 2020, 195, 396–412.
14. Blaga, R.; Sabadus, A.; Stefu, N.; Dughir, C.; Paulescu, M.; Badescu, V. A current perspective on the accuracy of incoming solar energy forecasting. Progress Energy Combust. Sci. 2019, 70, 119–144.
15. Perez, R.; Kivalov, S.; Schlemmer, J.; Hemker, K.; Renné, D.; Hoff, T.E. Validation of short and medium term operational solar radiation forecasts in the US. Sol. Energy 2010, 84, 2161–2172.
16. Tang, C.; Morel, B.; Wild, M.; Pohl, B.; Abiodun, B.; Lennard, C.; Bessafi, M. Numerical simulation of surface solar radiation over Southern Africa. Part 2: Projections of regional and global climate models. Clim. Dyn. 2019, 53, 2197–2227.
17. Islam, S.u.; Rehman, N.; Sheikh, M.M. Future change in the frequency of warm and cold spells over Pakistan simulated by the PRECIS regional climate model. Clim. Chang. 2009, 94, 35–45.
18. Zhang, M.; Yuan, X.Y.; Zhang, G.; Wang, B.N.; Sun, M.; Haung, L.; Chen, Z.H.; Ge, X.C.; Zhou, X.C. Validation and evaluation of CMA-WSP v2.0 in surface solar radiation forecasting in Jiangsu. Chin. J. Meteorol. Res. Appl. 2024, 45, 17–22.
19. Remund, J.; Perez, R.; Lorenz, E. Comparison of Solar Radiation Forecasts for the USA. In Proceedings of the 23rd European Photovoltaic and Solar Energy Conference and Exhibition, Valencia, Spain, 1–5 September 2008.
20. Espinar, B.; Ramires, L.; Drews, A.; Beyer, H.G.; Zarzalejo, L.F.; Polo, J.; Martin, L. Analysis of different comparison parameters applied to solar radiation data from satellite and German radiometric stations. Sol. Energy 2009, 83, 118–125.
21. Lorenz, E.; Remund, J.; Müller, S.; Traunmüller, W.; Steinmaurer, G.; Ruiz-Arias, J.; Fanego, V.; Ramirez, L.; Romeo, M.; Kurz, C.; et al. Benchmarking of different approaches to forecast solar irradiance. In Proceedings of the 24th European Photovoltaic Solar Energy Conference, Hamburg, Germany, 21–25 September 2009; pp. 4199–4208.
22. Pierro, M.; Bucci, F.; Cornaro, C.; Maggioni, E.; Perotto, A.; Pravettoni, M.; Spada, F. Model output statistics cascade to improve day ahead solar irradiance forecast. Sol. Energy 2015, 117, 99–113.
23. Laory, I.; Trinh, T.N.; Smith, I.F.C.; Brownjohn, J.M.W. Methodologies for predicting natural frequency variation of a suspension bridge. Eng. Struct. 2014, 80, 211–221.
24. Colquhoun, J.R. A decision tree method of forecasting thunderstorms, severe thunderstorms and tornadoes. Wea. Forecast. 1987, 2, 337–345.
25. Seo, B.C. A Data-Driven Approach for Winter Precipitation Classification Using Weather Radar and NWP Data. Atmosphere 2020, 11, 701.
26. Rebala, G.; Ravi, A.; Churiwala, S. Random Forests. In An Introduction to Machine Learning; Springer: Cham, Switzerland, 2019; pp. 77–94.
27. Bakker, K.; Whan, K.; Knap, W.; Schmeits, M. Comparison of Statistical Post-Processing Methods for Probabilistic NWP Forecasts of Solar Radiation. Sol. Energy 2019, 191, 138–150.
28. Buhan, S.; Özkazanç, Y.; Cadirci, I. Wind Pattern Recognition and Reference Wind Mast Data Correlations with NWP for Improved Wind-Electric Power Forecasts. IEEE Trans. Ind. Inform. 2016, 12, 991–1004.
29. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
30. Gala, Y.; Fernández, A.; Díaz, J.; Dorronsoro, J.R. Hybrid machine learning forecasting of solar radiation values. Neurocomputing 2015, 176, 48–59.
31. Yagli, G.M.; Yang, D.; Srinivasan, D. Automatic Hourly Solar Forecasting Using Machine Learning Models. Renew. Sustain. Energy Rev. 2019, 105, 487–498.
32. Yazdani, M.G.; Salam, M.A.; Rahman, Q.M. Investigation of the effect of weather conditions on solar radiation in Brunei Darussalam. Int. J. Sustain. Energy 2016, 35, 982–995.
33. Bakirci, K. Correlations for estimation of daily global solar radiation with hours of bright sunshine in Turkey. Energy 2009, 34, 485–501.
34. Besharat, F.; Dehghan, A.A.; Faghih, A.R. Empirical models for estimating global solar radiation: A review and case study. Renew. Sustain. Energy Rev. 2013, 21, 798–821.
35. Bett, P.E.; Thornton, H.E. The climatological relationships between wind and solar energy supply in Britain. Renew. Energy 2016, 87, 96–110.
36. Zhang, G.; Band, S.S.; Jun, C.; Bateni, S.M.; Chuang, H.M.; Turabieh, H.; Moslehpour, M. Solar radiation estimation in different climates with meteorological variables using Bayesian model averaging and new soft computing models. Energy Rep. 2021, 7, 8973–8996.
37. Ismail, A.H.; Dawi, E.A.; Almokdad, N.; Abdelkader, A.; Salem, O. Estimation and Comparison of the Clearness Index using Mathematical Models-Case study in the United Arab Emirates. Evergreen 2023, 10, 863–869.
38. Ismail, H.; Karim, A.A. Prediction of Global Solar Radiation from Sunrise Duration Using Regression Functions. Kuwait J. Sci. 2022, 49, 1–8.
39. IPCC. Climate Change 1989: The IPCC Scientific Assessment; Houghton, J.T., Jenkins, G.J., Ephraums, I.J., Eds.; Cambridge University Press: Cambridge, UK; New York, NY, USA, 1990; pp. 195–238.
40. IPCC. Climate Change 1991: The IPCC Scientific Assessment; Houghton, J.T., Callander, B.A., Varney, S.K., Eds.; Cambridge University Press: Cambridge, UK; New York, NY, USA, 1992; pp. 69–95.
41. IPCC. Climate Change 1995: The Science of Climate Change; Houghton, J.T., Meira Filho, L.G., Callander, B.A., Eds.; Cambridge University Press: Cambridge, UK; New York, NY, USA, 1996; p. 572.
42. IPCC. Climate Change 2001: The Scientific Basis; Houghton, J.T., Ding, Y., Griggs, D.I., Eds.; Cambridge University Press: Cambridge, UK; New York, NY, USA, 2002; p. 881.
43. Sun, Z.; Liu, J.; Zeng, X.; Liang, H. Parameterization of instantaneous global horizontal irradiance: Cloudy-sky components. J. Geophys. Res. 2012, 117, D14202.
44. He, X.F.; Yuan, C.H.; Yang, Z.B. Performance evaluation of Chinese solar radiation forecast based on three global forecast background fields. Chin. Acta Energiae Solaris Sin. 2016, 37, 897–904.

Figure 1. The percentages of the APE in different ranges (a) and the MAPE (b) for each month from January to December 2022 and from August to November 2023.

Figure 2. Comparison between the forecast (red line) and observed (blue line) solar radiation at one-hour intervals for February (a), July (b), August (c), and September (d) 2022, with the overcast (green dot) and rainy (black dot) conditions at the same time.

Figure 3. The MAPE (a), the percentage of the APE in different ranges (b), the percentage of the BE below and above 0 W/m² (c), and R (d) of the solar radiation predicted by the CMA-WSP v2.0 under different weather conditions (sunny, cloudy, overcast, and rainy) from January to December 2022 and from August to November 2023.

Figure 4. Diurnal variation of the MAPE (a), the percentage of the APE in different ranges (b), the percentage of the BE below and above 0 W/m² (c), and R (d) of the solar radiation generated by the CMA-WSP v2.0 from January to December 2022 and from August to November 2023. The bolded lines in (d) indicate that the correlation coefficients exceed the 95% significance level.

Figure 5. Diurnal variation of the MAPE (a) and the percentage of the APE below 25% (b) and above 100% (c) of the solar radiation generated by the CMA-WSP v2.0 and the six ML models from August to November 2023.

Figure 6. The MAPE (a), the percentage of the APE in different ranges (b), and the R (c) of the solar radiation predicted by the CMA-WSP v2.0 and the six ML models from August 2023 to November 2023. The black and red text in (a) indicates the MAPE and its increase compared to the CMA-WSP v2.0, respectively. Similarly, the black and red text in (c) denotes the R and its increase over the CMA-WSP v2.0, respectively.

Figure 7. The MAPE (a), the percentage of the APE in different ranges (b), the R (c), and the percentage of the BE below and above 0 W/m² (d) of the solar radiation generated by the CMA-WSP v2.0 and the six ML models under different weather conditions from August 2023 to November 2023. The black and red text in (a) indicates the MAPE and its increase compared to the CMA-WSP v2.0, respectively. Similarly, the black and red text in (c) indicates the R and its increase relative to the CMA-WSP v2.0, respectively.

Figure 8. Comparison of the MAPE of the solar radiation forecasts produced by the ML models based on Schemes 1, 2, and 3 between 09:00 and 16:00 from August 2023 to November 2023.

Figure 9. The MAPE (a) and the percentages of the APE in different ranges (b) for each month from January to December 2022 and from August to November 2023. The MAPE and the APE are calculated using the daily total solar radiation.

Table 1. List of meteorological variables forecasted by the CMA-WSP v2.0.

Number	Name	Number	Name
1	Accumulated precipitation	20	Downward shortwave radiation at the surface
2	Accumulated snow and ice	21	Normal shortwave radiation
3	Total cloud cover	22	Direct downward radiation at the surface
4	Low level cloud cover	23	Clear-sky direct downward radiation at the surface
5	Mid level cloud cover	24	Diffuse downward radiation at the surface
6	High level cloud cover	25	Downward longwave radiation at the surface
7	Visibility	26	Surface skin temperature
8	Surface pressure	27	Boundary layer height
9	Topographic height	28	Relative humidity at 2 m
10	Accumulated snows	29	Dew point temperature at 2 m
11	Surface albedo	30	Sea-level pressure
12	Wind speed at 10 m	31	Convective available potential energy
13	Gust wind speed at 10 m	32	Convective inhibition
14	Wind speed at 70 m	33~38	Horizontal wind from 200 to 1000 hPa
15	Wind speed at 80 m	39~44	Vertical wind from 200 to 1000 hPa
16	Wind speed at 100 m	45~50	Vertical wind from 200 to 1000 hPa
17	Wind speed at 120 m	51~55	Temperature from 200 to 1000 hPa
18	Air temperature at 2 m	56~61	Relative humidity from 200 to 1000 hPa
19	Specific humidity at 2 m	62~67	Specific humidity from 200 to 1000 hPa

Note: There are six distinct layers within the pressure range of 200 to 1000 hPa, including 200, 500, 700, 850, 925, and 1000 hPa.

Table 2. Classification standard of weather conditions.

Weather Condition	Cloud Cover (%)	Precipitation (mm)
sunny	[0,30]	0
cloudy	[31,89]	0
overcast	[90,100]	0
rainy	[90,100]	≥0.1
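The classification standard in Table 2 can be written as a small Python function. This is a sketch under stated assumptions, not the authors' implementation: the function name is hypothetical, precipitation below 0.1 mm is treated as dry, non-integer cloud cover between the listed intervals is assigned by simple thresholds, and the combination of precipitation with cloud cover below 90%, which Table 2 does not cover, is returned as "unclassified".

```python
def classify_weather(cloud_cover_pct: float, precipitation_mm: float) -> str:
    """Map cloud cover (%) and precipitation (mm) to a Table 2 category."""
    if not 0 <= cloud_cover_pct <= 100:
        raise ValueError("cloud cover must be between 0 and 100 %")
    if precipitation_mm >= 0.1:
        # Table 2 defines "rainy" only for cloud cover of 90-100 %.
        return "rainy" if cloud_cover_pct >= 90 else "unclassified"
    # Assumption: precipitation below 0.1 mm counts as dry.
    if cloud_cover_pct <= 30:
        return "sunny"
    if cloud_cover_pct < 90:
        return "cloudy"
    return "overcast"


# Example: label a few forecast samples (cloud cover %, precipitation mm)
samples = [(10, 0.0), (60, 0.0), (95, 0.0), (95, 1.2)]
labels = [classify_weather(c, p) for c, p in samples]
```

Grouping the APE values by such a label is what allows the MAPE to be reported separately for sunny, cloudy, overcast, and rainy conditions.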

Table 3. Percentages of the negative differences in APE between the ML models based on Scheme 2 (or Scheme 3) and the ML models based on Scheme 1 for different times. The percentages of the negative differences above 50% are highlighted in red.

Scheme	Time	RFR (%)	DT (%)	KNN (%)	GBR (%)	AdaBoost (%)	LR (%)
2	07:00	52	32	14	45	63	16
	08:00	51	34	16	53	55	32
	09:00	51	36	14	54	51	41
	10:00	51	37	20	48	51	45
	11:00	51	40	24	49	47	51
	12:00	53	39	27	51	51	55
	13:00	55	40	27	53	49	56
	14:00	55	38	26	55	51	56
	15:00	57	38	21	55	47	60
	16:00	56	37	16	51	46	54
	17:00	50	36	13	49	48	58
	18:00	50	31	10	54	50	74
3	07:00	51	25	0	44	45	21
	08:00	50	36	1	51	52	48
	09:00	51	43	7	48	51	54
	10:00	49	43	29	48	50	50
	11:00	50	39	26	54	49	52
	12:00	54	39	15	54	50	51
	13:00	51	39	8	53	48	57
	14:00	52	39	9	56	54	59
	15:00	56	44	24	56	56	61
	16:00	49	44	24	42	46	47
	17:00	48	37	2	49	51	54
	18:00	49	26	0	53	47	21
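Each entry in Table 3 is a share of forecast samples for which the new scheme has a lower APE than Scheme 1. A minimal Python sketch of that statistic follows; the function name is hypothetical and the APE values are invented for illustration.

```python
def pct_negative_differences(ape_new, ape_base):
    """Percentage of samples where ape_new - ape_base is negative."""
    diffs = [new - base for new, base in zip(ape_new, ape_base)]
    if not diffs:
        raise ValueError("no samples")
    return sum(d < 0 for d in diffs) / len(diffs) * 100.0


# Invented APE values (%) of one model at one forecast time
ape_scheme1 = [40.0, 55.0, 20.0, 80.0]
ape_scheme2 = [35.0, 60.0, 18.0, 70.0]
share = pct_negative_differences(ape_scheme2, ape_scheme1)
```

A value above 50% means the new scheme beats Scheme 1 on more than half of the samples at that time. The statistic counts wins only and does not say by how much the APE changes, so it complements rather than replaces a comparison of the MAPE.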

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, D.; Shen, Y.; Ye, D.; Yang, Y.; Da, X.; Mo, J. Evaluation of Scikit-Learn Machine Learning Algorithms for Improving CMA-WSP v2.0 Solar Radiation Prediction. Atmosphere 2024, 15, 994. https://doi.org/10.3390/atmos15080994
