Article

Daily Photovoltaic Power Generation Forecasting Model Based on Random Forest Algorithm for North China in Winter

Department of Economics and Management, North China Electric Power University, Baoding 071003, China
*
Author to whom correspondence should be addressed.
Sustainability 2020, 12(6), 2247; https://doi.org/10.3390/su12062247
Submission received: 17 February 2020 / Revised: 6 March 2020 / Accepted: 12 March 2020 / Published: 13 March 2020
(This article belongs to the Section Energy Sustainability)

Abstract

North China is one of China’s most important socio-economic centers, but its severe air pollution is a major concern. In this region, precisely forecasting daily photovoltaic power generation in winter is essential to improving the equipment utilization rate and mitigating the effects of the power system on the environment. Considering the climatic characteristics of North China, the winter days are divided into three classifications. A forecasting model based on the random forest algorithm is then designed for each classification. To evaluate its performance, the proposed model and three other methods are separately used to forecast the daily power generation at the Zhonghe PV station, which is located in the center of North China. Empirical results show that, because of its ability to reduce the risk of overfitting by balancing decision trees, the proposed model obtains mean absolute percentage errors as low as 2.83% and 3.89% for clear and cloudy days, respectively. For days on which weather conditions are unusual, forecasting errors are relatively large. On these days, enlarging training samples, performing subdivision, and imposing manual intervention can improve the forecasting precision. Generally, the proposed model is better than the other three methods for nearly all error evaluation indicators in each classification.

1. Introduction

North China, whose geographical boundary is shown in Figure 2 [1], is one of China’s most important socio-economic centers. In recent years, this region has been home to over one quarter of China’s population and has produced a similar share of the country’s GDP [2].
In this region, vast amounts of electric power are consumed to meet the requirements of socio-economic development. In 2017, electricity consumption reached 1500.5 billion kWh, an increase of 70 billion kWh from 2016 [3]. To ensure the electric power supply, North China has built large-scale thermal power plants over the past several decades [4]. Given the region’s fragile environment, the operation of thermal power plants further aggravates the prevailing air pollution. Among China’s 169 large- and medium-sized cities, the top 20 with the worst air conditions are all located in North China, and 24%–38% of different pollutants in this region are emitted from thermal power plants [5,6]. During winter, poor diffusion conditions for air pollutants, high power load, and the lack of other clean power sources significantly increase the effects of thermal power plants on air quality [7]. It is therefore necessary both to promote the development of photovoltaic (PV) power generation and to improve the utilization efficiency of the power generated by solar PV in North China.
North China has abundant solar resources. Its annual solar horizontal radiation is higher than 1050 kWh/m2 [8]. With the advancement of technology and the decrease in electricity generation cost, this region has recently constructed many photovoltaic power plants to mitigate the air pollution. Following the development plan released by the National Energy Bureau of China, the installed capacity of PV power generation in this region is expected to reach 49 million kilowatts in 2020, which is four times that of 2015 [9].
The power generation of PV stations is considerably unstable and is determined to a great extent by weather conditions [10]. The dispatch department of the grid company must know the next-day power generation of PV stations to plan the operation of the power system. Inaccurate prediction of daily PV power generation leads to a low utilization rate of equipment. China has published a target of maintaining the utilization rate of PV equipment above 95% in the 2020s [11]. Achieving this target requires the support of an advanced forecasting method that is suitable for North China. Because air pollution is worse in winter, PV power is more significant in this season than at other times of the year; this research therefore focuses on daily generation forecasting for North China in winter.
At present, several methods of PV electricity generation forecasting have been proposed. According to time span, previous studies can be divided into (1) very short-term (seconds, minutes, or hours ahead) [12], (2) short-term (one or several days ahead) [13], and (3) mid- and long-term (one or several weeks, months, or years ahead) forecasting [14]. The present research falls under short-term forecasting. In addition, most studies follow one of two ideas, namely, indirect and direct prediction. The former establishes a solar radiation forecast model and then converts the radiation data to PV power values, whereas the latter directly forecasts the PV electricity generation using historical and weather data. Solar radiation is closely related to the running state of the PV system, while the daily electricity generation is affected by various other factors; as such, indirect prediction is usually more suitable than the direct method for forecasting the power output of PV panels [15,16]. The present research concentrates on daily electricity generation forecasting and therefore adopts direct rather than indirect prediction.
When forecasting daily PV power generation by direct prediction, trend extrapolation methods, e.g., time series methods [17] and the hybrid trend model [18], forecast according to the development trend of historical data. Although the forecasting process is relatively simple, these methods do not consider weather conditions and therefore have limited accuracy. Artificial intelligence methods, e.g., artificial neural networks [19] and genetic algorithms [20], can fit the relationship between weather conditions and PV power generation but need large samples for model training; otherwise, the generalization capacity degrades due to overfitting. In North China, the climate is changeable in winter. As a result, the number of samples in each weather type is commonly too small to meet the requirements of the above intelligent methods. In recent years, tree-based methods have begun to be used for solar PV prediction. Random forest (RF) is a relatively new ensemble learning and forecasting method that can reduce the risk of overfitting by balancing decision trees [21,22]. Compared with other machine learning methods, RF presents a pronounced advantage in handling small samples. Previous studies have shown that RF is effective in both solar radiation forecasting and PV power forecasting. Huang et al. [23] predicted the daily solar radiation of four sites in Australia and showed that RF was the most accurate model. Another example of RF application in solar radiation prediction is detailed in the research of Liu et al. [24]. According to Ahmad et al. [25], two tree-based ensemble methods, extra trees and RF, performed marginally better than support vector regression (SVR) in hourly PV output prediction. The authors of Ref. [26] indicate that RF and gradient boosting regression gave the best results in forecasting solar power in the GEF2014 competition. Zamo et al. [27] compared several forecasts based on statistical techniques for hourly PV electricity production at some power plants in mainland France; the results showed that RF had superior performance. In a recent article by Kim et al. [28], a two-step solar power generation prediction model was proposed. In a comparison of different machine learning methods, RF again proved to be the most accurate model.
However, few of these previous studies analyzed the selection of influence factors in depth, especially against a complex climatic background. On the basis of the climatic characteristics of North China, the present research classifies the winter days in this region into different weather types and then constructs an RF model to forecast the corresponding daily PV power generation.
The contributions of this paper are as follows:
  • Topic: In this paper, a daily PV power generation forecasting model for North China in winter is proposed. The proposed forecasting model is based on the random forest algorithm and can obtain satisfactory forecasting results using small samples. The results of this study provide a reference to the sustainable development of PV generation in this area.
  • Influence factor selection: The unique winter climatic characteristics of North China were considered. In consideration of the serious air pollution in winter, “PM2.5” is especially selected as one of the influence factors.
  • Weather classification: To ensure the operation of the model, weather classification analysis is used to handle the varied weather. By combining weather types with similar characteristics, the problem of balancing the number of categories against the number of samples was solved.
  • Methodology: RF is applied to solar PV systems, whereas most previous research has focused on trend extrapolation methods, artificial intelligence methods, or support vector machines.
The remainder of this paper is organized as follows. Section 2 introduces the factors affecting PV power generation and weather classification. The principles of the method used for forecasting are described in Section 3. A case study of Zhonghe PV power station and result discussions are provided in Section 4. Conclusions and perspectives are summarized in the last section.

2. Influence Factors and Weather Classification

2.1. Influence Factor Selection

As mentioned in Section 1, weather conditions affect the electric power generated by PV stations to a large extent. Incorporating these key factors into the forecasting model is essential to improving the prediction precision. When selecting key factors, the technical requirements of PV stations and the unique climatic characteristics of North China in winter require consideration.
In this research, six factors are involved in the forecasting model. (1) Total solar radiation. Total solar radiation is the sum of direct and diffuse radiation and is the basis of PV power generation. On this basis, this research takes total solar radiation as a selected factor. (2) Mean atmospheric pressure and (3) wind speed. North China is located at the southeast end of Eurasia. Winter in this area usually lasts from November to January, and weather conditions are significantly affected by the monsoon [29]. The winter monsoon, which is caused by the gradient between the Siberian high and Aleutian low pressures, is the most important atmospheric circulation in the Northern Hemisphere [30]. This research uses the daily mean atmospheric pressure and wind speed to describe the influence of the winter monsoon. (4) Mean temperature. The winter monsoon brings strong cold air from Siberia to North China, and thus, the winter temperature in this region is generally lower and more changeable than that in other areas of the same latitude [1]. Hence, the daily mean temperature is included in the forecasting model. (5) Relative humidity. Most areas of North China belong to temperate and semi-humid zones. During winter, different levels of fog commonly appear due to vapor in the air. In this research, the daily mean of relative humidity is considered in the forecasting model. (6) PM2.5 concentration. As mentioned in the preceding sections, air pollution further deteriorates in North China in winter. Air pollutants can affect the generation of PV power stations on the ground. PM2.5 concentration has considerable influence on the loss of power generation. This relationship has a coincidence level of over 90% [31]. In North China, PM2.5 is monitored once an hour in winter. This research excludes the nighttime monitoring results and considers only the daytime mean values of PM2.5 concentration as a variable in the forecasting model.

2.2. Weather Classification

Dividing the observations into several classes according to weather characteristics and then building a prediction model for each class is essential to improving the forecasting precision. However, a larger number of classes does not necessarily bring an improvement: it leaves fewer training samples, and hence less information, in each class. This shortens the training time, but small samples inevitably weaken the generalization ability of the forecasting model. As a result, observation classification should balance the number of classes against the number of samples in each class.
Ways to describe the weather state are varied. The China Meteorological Administration classifies the weather state into more than 30 types [32]. According to historical observation data of major cities in North China, seven main types of weather occur in winter: sunny, cloudy, overcast, rainy, snowy, fog, and haze. Considering the sunshine conditions and the number of samples of these different weather types, we combined cloudy, overcast, fog, and haze into one class and treated rainy or snowy days as another. The main weather type, i.e., the one that lasts the longest in a day, is used as the classification criterion. In this manner, all observations are divided into three categories: clear, cloudy, and rainy or snowy days.
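The grouping rule described above can be sketched in a few lines of Python. This is only an illustration: the dictionary keys, the class labels, and the input format (hours per weather type within a day) are assumptions made here, since the text specifies the grouping but not a data layout.

```python
# Hypothetical labels: the seven winter weather types mapped to the three classes.
WEATHER_TO_CLASS = {
    "sunny": "clear",
    "cloudy": "cloudy",
    "overcast": "cloudy",
    "fog": "cloudy",
    "haze": "cloudy",
    "rainy": "rainy_or_snowy",
    "snowy": "rainy_or_snowy",
}


def classify_day(hours_by_weather):
    """Return the class of the weather type that lasts longest in the day.

    hours_by_weather maps a weather type to its duration in hours (assumed format).
    If two types tie, the one listed first in the mapping is used.
    """
    main_type = max(hours_by_weather, key=hours_by_weather.get)
    return WEATHER_TO_CLASS[main_type]


# Made-up example day: haze lasts longest, so the day falls in the cloudy class.
example_day = {"sunny": 3.0, "haze": 5.5, "overcast": 1.5}
print(classify_day(example_day))
```

The tie-breaking behavior is a choice of this sketch; the text does not say how days with two equally long weather types are handled.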

3. RF Forecasting Model

3.1. RF Algorithm

RF is a machine learning algorithm proposed by Breiman in 2001 [33]. As an ensemble method, RF improves learning performance with a voting system given a set number of decision trees. RF exhibits the characteristics of (1) random feature selection, (2) bootstrap sampling, (3) out-of-bag error estimation, and (4) full depth decision tree growing [34]. These features make random forest suitable for PV power generation prediction. Because PV power generation is easily affected by environmental factors, the data series usually contain a lot of noise, which may reduce the generalization ability of the model. After the data samples are input, the RF model first extracts some of the samples by bootstrap sampling and then randomly selects features of these samples. These two steps of random sampling make RF more tolerant of outliers and noise and reduce the possibility of overfitting. In addition, due to the special winter climate conditions in North China, there are few samples of rainy and snowy days; RF, however, can be applied to data sets of various sizes.
In this research, the RF regression algorithm is used when establishing the forecast model. The main steps to set up an RF algorithm are as follows:
  • Select samples by Bootstrap method [35] and regard them as a training set.
  • Grow an initial tree in the set.
  • Calculate the best node split of the initial tree according to its features.
  • Split the nodes until the samples belong to the same class.
  • Aggregate all trees into a forest, and then consider the mean value of the results given by each tree as the final prediction of the forest.
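The steps above can be illustrated with a deliberately small pure-Python sketch. To keep it short, each tree is reduced to a single split (a regression stump) instead of being grown to full depth, so this is a simplified bagging example and not the model used in this research. All function names and the toy data are made up for illustration.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature_ids):
    """Fit a one-split regression tree by minimizing the squared error."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no valid split exists: always predict the sample mean
        m = mean(targets)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_mean, right_mean = stump
    return left_mean if row[f] <= threshold else right_mean


def fit_forest(rows, targets, n_trees=50, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]        # bootstrap sample
        feats = rng.sample(range(len(rows[0])), n_features)    # random feature subset
        forest.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees (the regression form of voting)."""
    return mean([predict_stump(s, row) for s in forest])


# Toy data, invented for illustration: [radiant exposure, relative humidity] -> MWh.
X = [[1200, 50], [1100, 60], [900, 70], [600, 85], [300, 90], [100, 95]]
y = [230.0, 210.0, 180.0, 110.0, 50.0, 15.0]
forest = fit_forest(X, y)
print(predict_forest(forest, [1000, 65]))
```

Because every leaf value is a mean of training targets, the averaged forecast always stays within the range of the training targets, which is one reason tree ensembles cannot extrapolate beyond observed generation levels.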
In addition, model building inevitably involves the determination of model parameters. The RF regression model has several main parameters, namely, number of estimators, criterion index, and max features. The functions of each parameter are as follows:
  • Number of estimators: the number of trees in the forest.
  • Criterion index: measures the quality of a split. Alternatives include the mean absolute error (MAE) and mean square error (MSE) criteria.
  • Max features: the number of features considered when searching for the best node split. Three options are available: (1) the original number of features, corresponding to the function auto, (2) the square root of the original number, corresponding to sqrt, and (3) the base-2 logarithm of the original number, corresponding to log2.
This research selects the optimal values of the above parameters by comparing the performance of different parameter values. For all other parameters, this research adopts the default values in the Sklearn module, which is written in the Python programming language [36].
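A parameter comparison of this kind could look roughly as follows with scikit-learn. The sketch assumes a cross-validated grid search, which the text does not state, and it runs on synthetic stand-in data rather than the station data. The `criterion` argument is left at its default because the accepted strings for it differ between scikit-learn releases; `None` for `max_features` means that all features are used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
# Synthetic stand-in data: 60 days x 6 influence factors (not the paper's data).
X = rng.uniform(size=(60, 6))
y = 200.0 * X[:, 0] + 10.0 * rng.normal(size=60)

# Candidate values are illustrative; the paper only gives the range 100-1000 trees.
param_grid = {
    "n_estimators": [100, 200],
    "max_features": ["sqrt", "log2", None],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",  # assumed selection score
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```

With small per-class samples, a low number of folds keeps enough data in each training split; the choice of three folds here is arbitrary.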

3.2. Forecasting Process

Figure 1 shows the process of forecasting the daily PV power generation.
To build a forecast model, the collected raw data require initial cleaning and filtering. In other words, fragmentary data (mainly days with technical breakdowns) and evident outliers (mainly statistical errors) must be amended. The preprocessed observations are then classified in accordance with their weather conditions. In this research, an RF model is built for each classification to forecast PV power generation. After training, the forecasting results are obtained by inputting the weather conditions of the target days into the model. When evaluating the performance of the RF model, the forecasting results for the three classifications are considered comprehensively.

3.3. Performance Evaluation Indicators

To understand the performance of the forecasting model, error analysis indicators are necessary. This research uses MAE, mean absolute percentage error (MAPE), root mean square error (RMSE), and explained variance (EV).
MAE represents the mean of the absolute errors and is used to reflect the real situation of the forecast error. MAE is calculated as:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|,$$
where $\hat{y}_i$ is the prediction and $y_i$ is the real value. The range of values for MAE is 0 to +∞. Evidently, as the result of MAE increases, the size of error likewise increases.
MAPE is a straightforward measure of the prediction accuracy of a forecasting method and is thus usually considered the fairest, and hence the most important, precision evaluation indicator [37]. MAPE divides each error by the real value and averages the results as a percentage. It is defined by:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \cdot 100\%.$$
RMSE is another common indicator used in the evaluation of the accuracy of the forecasting model. RMSE is the square root of MSE and is more sensitive to large errors than MAPE. The formula is:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}}.$$
EV measures the part of the variation in a given data set that can be explained by a model. EV is calculated as:
$$\mathrm{EV} = 1 - \frac{\mathrm{Var}\left(y_i - \hat{y}_i\right)}{\mathrm{Var}\left(y_i\right)},$$
where Var is the variance of a sequence of values.
The largest value of EV is 1, which represents the best prediction.
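As a worked illustration of the four indicators, the following sketch implements the formulas above in plain Python and evaluates them on four made-up days. The function names and the numbers are illustrative only; the population variance is assumed for Var.

```python
from math import sqrt
from statistics import pvariance


def mae(y_true, y_pred):
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    # Returned in percent; undefined if any real value is zero.
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def explained_variance(y_true, y_pred):
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / pvariance(y_true)


# Made-up daily generation values (MWh) and forecasts.
y_true = [100.0, 200.0, 50.0, 150.0]
y_pred = [110.0, 190.0, 55.0, 150.0]

print(mae(y_true, y_pred))                 # 6.25
print(mape(y_true, y_pred))                # approximately 6.25 (percent)
print(rmse(y_true, y_pred))                # 7.5
print(explained_variance(y_true, y_pred))  # approximately 0.9825
```

In this example the errors are 10, −10, 5, and 0 MWh, so MAE is 25/4 = 6.25 and RMSE is the square root of 225/4 = 56.25. The 5 MWh error on the 50 MWh day contributes as much to MAPE as the 10 MWh error on the 100 MWh day, which shows why low-generation days weigh heavily in MAPE.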

4. Model Application and Evaluation

4.1. Model Application

In this research, the power generation from Zhonghe PV station is used to test the above forecasting method. The central position of the station is 114.32°E, 37.45°N, in Xingtai city, Hebei province, the center of North China, see Figure 2.
In addition to the daily power generation, meteorological data are also needed in the forecasting model. These data are selected from the Xingtai Meteorological Bureau in Hebei province.
Following the process shown in Figure 1, before the raw data are classified and introduced into the forecasting model, they are preprocessed by cleaning and filtering to mend fragmentary information and eliminate statistical errors. In particular, data on PM2.5 and total solar radiation are preprocessed. In the PM2.5 sequence, approximately 10% of the data were missing due to observation equipment failure and other reasons. To fill the missing values, the linear interpolation method is adopted in this research. Regarding the total solar radiation, although solar radiation is recorded automatically, erroneous data remain because of the measuring instruments [38]. To replace problematic data, alternate values are obtained through different measures: (1) analysis of data on similar days; (2) calculation of mean values of observations in adjacent areas; and (3) linear interpolation.
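Linear interpolation of missing readings can be sketched as follows. The sketch assumes equally spaced (e.g., hourly) readings with gaps marked as `None`; the function name and the sample values are made up, and gaps at the very start or end of a series are left unfilled because they have only one known neighbor.

```python
def interpolate_missing(values):
    """Fill None gaps by linear interpolation between the nearest known neighbors.

    Leading or trailing gaps are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Made-up hourly PM2.5 readings with two gaps.
hourly_pm25 = [110.0, None, None, 140.0, 150.0, None, 130.0]
print(interpolate_missing(hourly_pm25))
# [110.0, 120.0, 130.0, 140.0, 150.0, 140.0, 130.0]
```

How the end-of-series gaps and long outages were treated in this research is not stated; leaving them unfilled is a conservative choice of this sketch.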
Finally, data from November to December 2016 and from January to November in 2017 are set as the training samples. The testing samples are composed of data from 1 November to 31 December 2018. Value distribution ranges of different indicators in the training and testing samples are shown in Table 1.
As shown in Table 1, the training and testing samples have similar distribution range for most indicators. Thus, the RF model obtained by the training samples can be used for the forecasting analysis of the testing samples.
The next step is to classify the samples in both the training set and the test set. According to Section 2.2, both data sets are divided into three categories: clear, cloudy, and rainy or snowy days. The same categories of the training set and the test set correspond to each other. For instance, the data under the “clear days” classification in the test set are input into the RF model trained on the data under the same classification in the training set.
After classification, an RF model is established for each weather category. To ensure the good performance of the proposed model, ascertaining the optimal parameters is necessary after inputting the training samples into the corresponding model. As mentioned in Section 3.1, the proposed model can automatically search for the optimal parameters when provided with a rough range for the number of trees. On the basis of the sample size, this research limited the tree number to the range of 100–1000. The final optimal parameters under the different weather classifications are shown in Table 2. After the above steps are completed, the influencing factors in the test set are input into the corresponding RF model, and the prediction results are obtained.

4.2. Forecasting Results Analysis

As mentioned in Section 4.1, the RF models are used to forecast the daily power generation of the Zhonghe PV station from 1 November to 31 December 2018. This period comprises 61 days. However, 11 days are excluded due to the serious loss of weather records and equipment maintenance, leaving only 50 days. To facilitate the following analysis, each day was assigned a code. The codes and forecasting results are listed in Table 3.
To understand the forecasting results, a line chart is used to demonstrate the forecasting performance, and a histogram is used to show the forecasting percentage errors. The line chart and histogram are shown in Figure 3a,b, respectively.
Figure 3a shows that, affected by weather conditions, the real power generation of the PV station changes greatly from day to day. In general, the forecasting results are close to the real values. Hence, the proposed model can be applied to this problem.
Figure 3b shows that the three largest forecasting errors occur on the days coded 5, 14, and 46. Figure 3a shows that these are all extremely rainy or snowy days with the least power generation. As such weather conditions seldom appear, the corresponding training samples are limited, which produces the largest forecasting errors. In fact, rainy or snowy days are rarer than other weather types in North China, and observations in this classification vary greatly. As a result, the forecasting error of rainy or snowy days is evidently larger than that of the other weather classifications. Enlarging the training samples and performing subdivision can improve the forecasting precision of this classification.
Figure 3b also shows that, in addition to the rainy or snowy days, some other days have large forecasting errors. The forecasting errors of the 11th, 25th, 30th, and several other days are larger than those of their neighbors. According to Table 3, these days are all classification conversion days, e.g., the first day of a run of continuous cloudy days. Given that several weather indicators on these days are close to those of the neighboring classification, the trained RF model in this research cannot handle such situations perfectly and therefore produces large forecasting errors. To improve the forecasting precision for these days, manual intervention is necessary. Specifically, the forecasting results of these days should be adjusted in accordance with their neighboring classifications.
According to statistics, the errors of photovoltaic power generation prediction using the random forest method are within the range of 8.5% to 12.3% [39]. Except for the three days explained above, the prediction errors of the random forest model established in this paper are almost all below this range.

4.3. Performance Evaluation

To evaluate the performance of the proposed model, the support vector regression (SVR) with a linear kernel, elastic net (EN), and gradient boosting decision tree (GBDT) models are also used to forecast the power generation of the Zhonghe PV station from 1 November to 31 December 2018. The training samples and their classifications are the same as those of the proposed model. The forecasting results of these three methods are listed in Appendix A. Figure 4 shows the comparison of the different methods for each classification.
Figure 4 shows that for clear and cloudy days, which have large samples, the forecasting results of RF, SVR, EN, and GBDT are relatively close. However, for rainy or snowy days with small samples, the forecasting performance of SVR and EN is less stable than that of the other models. Several results, e.g., those of the 41st, 43rd, and 46th days, have large errors. To evaluate the performance of the above models, the MAE, MAPE, RMSE, and EV are calculated in each classification and listed in Table 4.
As shown in Table 4, except for an EV slightly smaller than that of SVR on clear days, the RF-based method proposed in this research performs best for all indicators in all classifications. The superiority of the proposed model is especially distinct for the MAPE on clear and cloudy days.

5. Conclusion

More accurate forecasting of daily PV power generation in winter is essential to improving the equipment utilization rate and mitigating the effects of the power system on the environment in North China. Given that RF algorithms can lower the risk of overfitting by balancing decision trees in small samples, the winter days are divided into three classifications and an RF-based daily PV power generation forecasting model is built for each classification.
The proposed model and three other methods are separately used to forecast the daily power generation from 1 November to 31 December 2018 at the Zhonghe PV station, which is located in the center of North China. Empirical results show the following conclusions: (1) The values of MAPE, which is usually considered as the most important precision evaluation indicator, of the proposed model for clear and cloudy days are as low as 2.83% and 3.89%, respectively. (2) During the rainy or snowy days, which rarely appear in this region, the forecasting errors of the proposed model are relatively larger than those during other weather conditions. In those days, the forecasting precision can be improved by enlarging training samples and performing subdivision. (3) For classification conversion days, manual intervention based on neighbor classifications can further improve the forecasting precision. (4) The proposed model is better than the other three methods for nearly all error evaluation indicators in each classification.

Author Contributions

The authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 71471061, and the Fundamental Research Funds for the Central Universities, grant number 2017MS171.

Conflicts of Interest

There are no conflicting interests needing to be clarified.

Appendix A

Table A1. Training samples.
Date | Classification (1) | Temperature (°C) | Atmospheric Pressure (kPa) | Relative Humidity (%) | Wind Speed (m/s) | PM2.5 (μg/m³) | Radiant Exposure (0.01 MJ/m²) | Daily Power Generation (MWh)
Nov.2.2016Clear6.1101.34851.21101251233.72
Nov.3.2016Clear7.4100.53901.81291120197.05
Nov.5.2016Clear10.299.92902.41271164208.24
Nov.7.2016Cloudy7.3101.56911.310942476.66
Nov.8.2016Clear6.1102.11771.4811376260.91
Nov.9.2016Cloudy6.1101.52813138577108.90
Nov.10.2016Clear8.1100.25622.1511185231.03
Nov.11.2016Clear8.299.91651.41121118210.87
Nov.12.2016Clear8100.6772115934154.30
Nov.14.2016Cloudy7100.75851.367583108.29
Nov.15.2016Cloudy6101.15731.5161993151.00
Nov.16.2016Cloudy7100.66691.3141861148.59
Nov.17.2016Cloudy6.7100.93791.413252690.14
Nov.19.2016Cloudy9100.44841.115742983.41
Nov.20.2016R or S6.5100.899027423018.55
Nov.21.2016R or S0.9101.47954.643545.98
Nov.23.2016Clear−1.7102.22711.3112910191.26
Nov.24.2016Clear−0.7101.58762166974201.23
Nov.25.2016Clear1.1101.14811.8242848161.25
Nov.26.2016Clear6.6100.78482.21601048210.21
Nov.27.2016Clear4.6101.28582.2108998199.54
Nov.28.2016Cloudy2.7101.85621.4143796147.18
Nov.29.2016Cloudy3.5101.9701.422031350.09
Nov.30.2016Clear4.8101.18721.42221026196.39
Dec.1.2016Clear4.3101.83591.8162906185.11
Dec.2.2016Clear3.3101.58701.1194885180.53
Dec.3.2016Cloudy4.4100.71811.4276621116.64
Dec.4.2016Cloudy3.5100.29801.4260712122.56
Dec.5.2016Cloudy4.4101.31722.99956389.90
Dec.6.2016Clear2.6100.95751.5133947199.79
Dec.7.2016Clear5.7100.9757274894183.91
Dec.8.2016Cloudy5.2100.35532.352791146.21
Dec.10.2016Cloudy−0.5101.53721.179666112.29
Dec.12.2016R or S3.8100.96861.915437258.66
Dec.15.2016Clear−0.2101.8775286860179.06
Dec.16.2016Clear0101.37771.21361091199.70
Dec.17.2016Cloudy0100.78811.1235677120.06
Dec.18.2016Cloudy0.1101.04830.8225635105.17
Dec.19.2016Cloudy0101.14871.4212645114.64
Dec.20.2016R or S−1.7101.45951.218537653.67
Dec.21.2016Cloudy2.8101.22961.121721230.85
Dec.22.2016Clear5.7100.97612.6451145229.62
Dec.23.2016Cloudy0.1101.667283608120.08
Dec.24.2016Cloudy−0.3101.58801.48238259.82
Dec.25.2016R or S0.9101.47901.56721717.42
Dec.27.2016Clear−1.7102.07892.3182902179.27
Dec.28.2016Cloudy−2.1101.87891.8259684114.12
Dec.29.2016Clear−2.4102.14772.289899179.25
Dec.30.2016Cloudy−2101.54851.1122646118.11
Dec.31.2016Cloudy−1.8101.3941.3108771131.67
Jan.1.2017R or S−2.1101.1992.49217525.66
Jan.2.2017Clear−0.6101.25911.8128891176.30
Jan.3.2017Cloudy−1.7100.94991.110937061.80
Jan.4.2017Cloudy−1.5101.14991.614432934.55
Jan.5.2017R or S0.1101.53962.9247868.57
Jan.6.2017R or S2.3101.33921.225028530.22
Jan.7.2017R or S1.9100.97951.9140757.01
Jan.8.2017Cloudy0.1100.94982.39845595.71
Jan.9.2017Cloudy0.2101.26831.5143824153.38
Jan.10.2017Cloudy−0.6101.52681.7127786154.87
Jan.11.2017Clear0.7101.2732.1241854185.42
Jan.12.2017Clear3.6100.73402.291968223.92
Jan.13.2017Clear0.4101.07511.7114867175.80
Jan.14.2017Cloudy−2.3101.75651.4138720122.26
Jan.15.2017Cloudy−2.5101.81641.517953190.27
Jan.16.2017Cloudy−2.3101.66742.321157269.70
Jan.17.2017Cloudy−2.5101.64771.3275575114.43
Jan.18.2017R or S−2.2101.89821.135612618.41
Jan.19.2017Clear−1.2101.62533.72151186229.56
Jan.20.2017Clear−4102382.81161013213.61
Jan.21.2017Clear−2.7101.57451.4190975194.99
Jan.22.2017Clear−3.9102.08411.4109900180.35
Jan.23.2017Cloudy−3.7101.99551.7264776140.14
Jan.24.2017Cloudy−2.2102.21452251679118.29
Jan.25.2017Cloudy−2.5101.78621.4324682116.38
Jan.26.2017Cloudy0.3100.92641.540956586.72
Jan.28.2017Cloudy0.7100.66471.134941364.13
Jan.29.2017Cloudy−0.1101.3641319672096.10
Jan.30.2017Clear−4.1102.35371.6641080226.64
Jan.31.2017Cloudy0.2101.53311.6139798146.80
Nov.2.2017Clear13.2100.31681.91191187227.75
Nov.5.2017Clear9.4100.92751.7611119202.59
Nov.6.2017Clear10.2100.44800.71511023173.67
Nov.7.2017Clear14.7100.59572.8651248245.59
Nov.8.2017Clear10.9101562.594918157.07
Nov.10.2017Clear12.3101.09262.7201209254.84
Nov.11.2017Clear5.6101.22501.9441185239.73
Nov.12.2017Cloudy5.8100.36661.210060573.47
Nov.14.2017Clear4.6101.07381.4381213245.78
Nov.15.2017Clear3.1101.08501.6481168224.78
Nov.16.2017Clear3.3100.75521.295844176.55
Nov.17.2017Cloudy4101.16482.371832151.65
Nov.18.2017Clear−0.5102371.9571179243.87
Nov.19.2017Cloudy−0.3101.43521.312843666.00
Nov.20.2017Clear3.3101.4492.41431113213.70
Nov.21.2017Clear5.6100.78572.61041085199.50
Nov.22.2017Clear8.1101.24184.1161194232.54
Nov.23.2017Clear3.1101.17391.3601119217.04
Nov.24.2017Clear2.2101.24392.6401078216.00
Nov.25.2017Clear5100.55423.1681063204.74
Nov.26.2017Clear3101.52442.364995187.32
Nov.27.2017Cloudy1.2100.7602122655119.84
Nov.28.2017Clear3.1100.94552.1133843165.24
Nov.29.2017Cloudy−0.4102.07311.53146773.71
1 R or S refers to rainy or snowy days.
Table A2. Test samples.
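| Date | Classification ¹ | Temperature (°C) | Atmospheric pressure (kPa) | Relative humidity (%) | Wind speed (m/s) | PM2.5 (μg/m³) | Radiant exposure (0.01 MJ/m²) | Daily power generation (MWh) |
|---|---|---|---|---|---|---|---|---|
| Nov.1.2018 | Clear | 10.4 | 101.36 | 64 | 1.6 | 120 | 834 | 162.03 |
| Nov.2.2018 | Clear | 11.1 | 101.15 | 69 | 1.9 | 131 | 822 | 160.79 |
| Nov.3.2018 | Clear | 12.4 | 100.79 | 65 | 2.9 | 76 | 995 | 189.85 |
| Nov.4.2018 | R or S | 11.9 | 100.92 | 75 | 3.1 | 164 | 471 | 61.23 |
| Nov.5.2018 | R or S | 7.9 | 101.69 | 78 | 2.3 | 35 | 354 | 40.19 |
| Nov.6.2018 | R or S | 6.5 | 101.83 | 72 | 1.1 | 43 | 425 | 57.88 |
| Nov.7.2018 | Cloudy | 7.8 | 101.44 | 58 | 1.6 | 51 | 781 | 149.91 |
| Nov.8.2018 | Clear | 9.4 | 100.44 | 66 | 2.9 | 63 | 993 | 190.32 |
| Nov.9.2018 | Clear | 8.5 | 100.54 | 67 | 1.9 | 88 | 1135 | 217.15 |
| Nov.11.2018 | Clear | 8.4 | 101.11 | 76 | 1.3 | 75 | 1045 | 194.27 |
| Nov.12.2018 | Clear | 7.4 | 100.9 | 82 | 1.6 | 149 | 1037 | 183.15 |
| Nov.13.2018 | Cloudy | 6.1 | 100.79 | 93 | 1.4 | 193 | 760 | 127.50 |
| Nov.14.2018 | Cloudy | 8.7 | 100.89 | 93 | 1.6 | 167 | 581 | 100.38 |
| Nov.15.2018 | R or S | 8.2 | 101.22 | 85 | 2.2 | 210 | 227 | 28.82 |
| Nov.16.2018 | Cloudy | 6.1 | 101.75 | 52 | 1.9 | 32 | 789 | 152.87 |
| Nov.17.2018 | Cloudy | 6.5 | 101.22 | 69 | 2.2 | 78 | 828 | 157.07 |
| Nov.18.2018 | Clear | 8.5 | 100.93 | 45 | 1.6 | 41 | 921 | 175.82 |
| Nov.19.2018 | Clear | 6 | 100.78 | 54 | 2 | 65 | 1154 | 220.41 |
| Nov.20.2018 | Cloudy | 6.1 | 100.87 | 63 | 1.9 | 141 | 859 | 157.33 |
| Nov.21.2018 | Clear | 6.6 | 101.4 | 51 | 2.6 | 88 | 1187 | 231.79 |
| Nov.22.2018 | Clear | 3.8 | 101.31 | 54 | 1.7 | 51 | 1195 | 233.79 |
| Nov.23.2018 | Cloudy | 4 | 100.71 | 57 | 2.5 | 150 | 828 | 156.58 |
| Nov.24.2018 | Clear | 3.7 | 101.09 | 68 | 1.9 | 153 | 854 | 168.88 |
| Nov.25.2018 | Clear | 5 | 100.78 | 67 | 1.7 | 179 | 970 | 191.59 |
| Nov.26.2018 | Cloudy | 4.4 | 100.61 | 80 | 2 | 353 | 769 | 126.30 |
| Nov.27.2018 | Cloudy | 9.4 | 101.14 | 39 | 2.1 | 118 | 722 | 115.87 |
| Nov.28.2018 | Cloudy | 5.1 | 101 | 45 | 2 | 127 | 779 | 134.24 |
| Nov.29.2018 | Cloudy | 3.5 | 101.49 | 62 | 1.6 | 106 | 867 | 155.80 |
| Nov.30.2018 | Cloudy | 2.9 | 101.21 | 80 | 1.1 | 133 | 581 | 106.35 |
| Dec.1.2018 | R or S | 3.2 | 101.17 | 85 | 1.5 | 167 | 454 | 61.33 |
| Dec.2.2018 | R or S | 5.2 | 100.42 | 94 | 1.2 | 194 | 236 | 28.82 |
| Dec.3.2018 | R or S | 5.2 | 100.95 | 73 | 1.7 | 151 | 483 | 60.99 |
| Dec.4.2018 | Clear | 2 | 101.64 | 42 | 2.1 | 55 | 1001 | 194.41 |
| Dec.6.2018 | Cloudy | −1.4 | 101.98 | 82 | 2.4 | 94 | 610 | 115.10 |
| Dec.7.2018 | Clear | −5.4 | 102.61 | 54 | 2 | 44 | 891 | 175.14 |
| Dec.8.2018 | Clear | −6.7 | 102.7 | 41 | 1.5 | 46 | 928 | 185.19 |
| Dec.9.2018 | Cloudy | −4.2 | 102.17 | 44 | 1.5 | 68 | 734 | 123.02 |
| Dec.10.2018 | Cloudy | −4.4 | 102.05 | 56 | 1.1 | 129 | 629 | 117.17 |
| Dec.13.2018 | Cloudy | −1.7 | 102.16 | 44 | 1.8 | 95 | 858 | 158.11 |
| Dec.18.2018 | Clear | 2.7 | 100.53 | 46 | 1.4 | 98 | 879 | 177.99 |
| Dec.20.2018 | R or S | 0.9 | 100.85 | 56 | 1.6 | 126 | 565 | 56.21 |
| Dec.21.2018 | Cloudy | 4.3 | 100.98 | 45 | 1.6 | 113 | 629 | 117.28 |
| Dec.22.2018 | R or S | 4.4 | 101.37 | 43 | 1.4 | 146 | 585 | 57.68 |
| Dec.24.2018 | Clear | −3.7 | 101.24 | 56 | 1.3 | 84 | 848 | 168.05 |
| Dec.25.2018 | Cloudy | −0.2 | 101.06 | 45 | 1.5 | 159 | 600 | 103.45 |
| Dec.27.2018 | R or S | −6.7 | 102.63 | 76 | 2.6 | 49 | 191 | 12.78 |
| Dec.28.2018 | Cloudy | −8.4 | 102.98 | 58 | 1.9 | 93 | 784 | 141.91 |
| Dec.29.2018 | Clear | −7.7 | 102.91 | 52 | 1.7 | 33 | 865 | 173.72 |
| Dec.30.2018 | Clear | −7.9 | 102.91 | 50 | 1.9 | 65 | 912 | 182.20 |
| Dec.31.2018 | Cloudy | −5.9 | 102.75 | 58 | 2 | 106 | 694 | 132.52 |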
¹ R or S refers to rainy or snowy days.
Table A3. Forecasting results of three other methods.
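| Code | Date | Real value | SVR ¹ | EN ¹ | GBDT ¹ |
|---|---|---|---|---|---|
| 1 | Nov.01 | 162.03 | 171.01 | 160.33 | 159.45 |
| 2 | Nov.02 | 160.79 | 168.14 | 155.81 | 154.38 |
| 3 | Nov.03 | 189.85 | 198.04 | 187.03 | 185.39 |
| 4 | Nov.04 | 61.23 | 67.74 | 60.73 | 58.08 |
| 5 | Nov.05 | 40.19 | 47.84 | 41.92 | 57.86 |
| 6 | Nov.06 | 57.88 | 58.83 | 53.37 | 56.02 |
| 7 | Nov.07 | 149.91 | 142.84 | 137.60 | 142.75 |
| 8 | Nov.08 | 190.32 | 197.99 | 188.93 | 184.76 |
| 9 | Nov.09 | 217.15 | 220.35 | 214.86 | 218.87 |
| 10 | Nov.11 | 194.27 | 204.90 | 196.73 | 200.91 |
| 11 | Nov.12 | 183.15 | 201.07 | 193.50 | 178.80 |
| 12 | Nov.13 | 127.50 | 135.87 | 134.44 | 131.76 |
| 13 | Nov.14 | 100.38 | 102.53 | 101.12 | 106.45 |
| 14 | Nov.15 | 28.82 | 31.30 | 22.52 | 18.52 |
| 15 | Nov.16 | 152.87 | 144.70 | 138.88 | 147.24 |
| 16 | Nov.17 | 157.07 | 151.33 | 146.71 | 151.88 |
| 17 | Nov.18 | 175.82 | 189.52 | 183.71 | 165.61 |
| 18 | Nov.19 | 220.41 | 224.09 | 222.86 | 225.36 |
| 19 | Nov.20 | 157.33 | 155.34 | 150.72 | 148.71 |
| 20 | Nov.21 | 231.79 | 230.87 | 230.98 | 236.91 |
| 21 | Nov.22 | 233.79 | 232.78 | 234.45 | 235.77 |
| 22 | Nov.23 | 156.58 | 148.98 | 143.68 | 149.35 |
| 23 | Nov.24 | 168.88 | 173.14 | 168.10 | 176.83 |
| 24 | Nov.25 | 191.59 | 191.44 | 187.63 | 199.46 |
| 25 | Nov.26 | 126.30 | 132.78 | 131.55 | 136.04 |
| 26 | Nov.27 | 115.87 | 129.29 | 122.90 | 101.47 |
| 27 | Nov.28 | 134.24 | 139.97 | 134.02 | 144.52 |
| 28 | Nov.29 | 155.80 | 157.74 | 153.28 | 150.61 |
| 29 | Nov.30 | 106.35 | 102.99 | 100.75 | 103.71 |
| 30 | Dec.01 | 61.33 | 65.26 | 62.93 | 56.51 |
| 31 | Dec.02 | 28.82 | 32.39 | 24.32 | 24.50 |
| 32 | Dec.03 | 60.99 | 69.44 | 67.71 | 58.08 |
| 33 | Dec.04 | 194.41 | 202.75 | 204.88 | 200.24 |
| 34 | Dec.06 | 115.10 | 109.56 | 107.01 | 119.16 |
| 35 | Dec.07 | 175.14 | 183.88 | 188.88 | 185.28 |
| 36 | Dec.08 | 185.19 | 191.53 | 200.60 | 188.96 |
| 37 | Dec.09 | 123.02 | 132.78 | 127.39 | 122.34 |
| 38 | Dec.10 | 117.17 | 111.43 | 107.87 | 109.55 |
| 39 | Dec.13 | 158.11 | 155.71 | 150.20 | 148.44 |
| 40 | Dec.18 | 177.99 | 181.35 | 180.21 | 177.09 |
| 41 | Dec.20 | 56.21 | 81.67 | 88.52 | 57.07 |
| 42 | Dec.21 | 117.28 | 111.78 | 106.13 | 118.41 |
| 43 | Dec.22 | 57.68 | 85.06 | 89.13 | 56.02 |
| 44 | Dec.24 | 168.05 | 175.61 | 177.95 | 179.12 |
| 45 | Dec.25 | 103.45 | 104.88 | 99.89 | 105.10 |
| 46 | Dec.27 | 12.78 | 23.40 | 33.43 | 18.12 |
| 47 | Dec.28 | 141.91 | 141.93 | 138.20 | 148.29 |
| 48 | Dec.29 | 173.72 | 180.26 | 187.17 | 185.04 |
| 49 | Dec.30 | 182.20 | 187.36 | 196.01 | 189.67 |
| 50 | Dec.31 | 132.52 | 124.47 | 120.80 | 120.92 |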
¹ Forecasting results are given in MWh.

References

  1. Zhao, J. The Physical Geography of China (in Chinese), 3rd ed.; Higher Education Press: Beijing, China, 1995; p. 203. [Google Scholar]
  2. National Bureau of Statistics. China Statistical Yearbook 2019 (in Chinese); China Statistics Press: Beijing, China, 2019; pp. 34–35, 69–71.
  3. China Electricity Council. Annual Development Report of China’s Power Industry 2018 (in Chinese); China Electricity Council: Beijing, China, 2018. [Google Scholar]
  4. Pei, Z.; Wang, C.; He, Q.; Wang, Y.; Fan, G. Analysis and suggestions on renewable energy integration problems in China (in Chinese). Electr. Power 2016, 11, 1–7. [Google Scholar] [CrossRef]
  5. Ministry of Ecology and Environment. Bulletin of China’s Ecological Environment (in Chinese). Available online: http://www.mee.gov.cn/hjzl/zghjzkgb/lnzghjzkgb/201905/P020190619587632630618.pdf (accessed on 10 January 2020).
  6. Wang, L.; Li, P.; Yu, S.; Mehmood, K.; Li, Z. Predicted impact of thermal power generation emission control measures in the Beijing-Tianjin-Hebei region on air pollution over Beijing, China. Sci. Rep. 2018, 8, 934. [Google Scholar] [CrossRef] [PubMed]
  7. Chen, Q.; Sun, F.; Xu, Y. Does winter heating cause smog? Evidence from a city panel in North China (in Chinese). Nankai Econ. Stud. 2017, 4, 25–40. [Google Scholar] [CrossRef]
  8. China Meteorological Administration. Annual Bulletin of China’s Wind and Solar Energy Resources in 2018 (in Chinese); China Meteorological Administration: Beijing, China, 2019.
  9. National Energy Bureau of China. The 13th Five Year Plan for Solar Energy Development (in Chinese); National Energy Bureau of China: Beijing, China. Available online: http://zfxxgk.nea.gov.cn/auto87/201612/t20161216_2358.htm (accessed on 10 January 2020).
  10. Jin, D.; Olama, M.M.; Kuruganti, T.; Melin, A.M.; Djouadi, S.M.; Zhang, Y.; Xue, Y. Novel stochastic methods to predict short-term solar radiation and photovoltaic power. Renew. Energy 2020, 145, 333–346. [Google Scholar] [CrossRef]
  11. National Development and Reform Commission of the People’s Republic of China. National Energy Bureau of China. Clean Energy Consumption Plan 2018–2020 (in Chinese). Available online: http://zjb.nea.gov.cn/article/zygg/d1/201812/3022.htm (accessed on 10 January 2020).
  12. Kushwaha, V.; Pindoriya, N.M. A SARIMA-RVFL hybrid model assisted by wavelet decomposition for very short-term solar PV power generation forecast. Renew. Energy 2019, 140, 124–139. [Google Scholar] [CrossRef]
  13. Yang, M.; Meng, L. Short-term photovoltaic power dynamic weighted combination forecasting based on least squares method. Ieej Trans. Electr. Electron. Eng. 2019, 14, 1739–1746. [Google Scholar] [CrossRef]
  14. Alanazi, M.; Alanazi, A.; Khodaei, A. Long-term solar generation forecasting. In Proceedings of the IEEE/PES Transmission and Distribution Conference and Exposition (T&D), Dallas, TX, USA, 3–5 May 2016. [Google Scholar] [CrossRef]
  15. Ostrometzky, J.; Bernstein, A.; Zussman, G. Irradiance field reconstruction from partial observability of solar radiation. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1698–1702. [Google Scholar] [CrossRef]
  16. Larson, D.P.; Nonnenmacher, L.; Coimbra, C.F.M. Day-ahead forecasting of solar power output from photovoltaic plants in the American Southwest. Renew. Energy 2016, 91, 11–20. [Google Scholar] [CrossRef]
  17. Yang, Z.; Zhu, F.; Zhang, C.; Ge, L.; Yuan, X. Photovoltaic power generation short-term power forecasting based on adaptive fuzzy time sequence method (in Chinese). J. Nanjing Inst. Technol. (Nat. Sci. Ed.) 2014, 1, 6–13. [Google Scholar] [CrossRef]
  18. Meng, M.; Niu, D.; Shang, W. A small-sample hybrid model for forecasting energy-related CO2 emissions. Energy 2014, 64, 673–677. [Google Scholar] [CrossRef]
  19. Chaouachi, A.; Kamel, R.M.; Nagasaka, K. Neural network ensemble-based solar power generation short-term forecasting. J. Adv. Comput. Intell. Intell. Inform. 2010, 14, 69–75. [Google Scholar] [CrossRef] [Green Version]
  20. Yang, Z.; Cao, Y.; Xiu, J. Power generation forecasting model for photovoltaic array based on generic algorithm and BP neural network. In Proceedings of the 2014 IEEE 3rd International Conference on Cloud Computing and Intelligence Systems, Shenzhen, China, 27–29 November 2014. [Google Scholar] [CrossRef]
  21. Behrens, C.; Pierdzioch, C.; Risse, M. Testing the optimality of inflation forecasts under flexible loss with random forests. Econ. Model. 2018, 72, 270–277. [Google Scholar] [CrossRef]
  22. Lei, C.; Deng, J.; Cao, K.; Ma, L.; Xiao, Y.; Ren, L. A random forest approach for predicting coal spontaneous combustion. Fuel 2018, 223, 63–73. [Google Scholar] [CrossRef]
  23. Huang, J.; Troccoli, A.; Coppin, P. An analytical comparison of four approaches to modelling the daily variability of solar irradiance using meteorological records. Renew. Energy 2014, 72, 195–202. [Google Scholar] [CrossRef]
  24. Liu, J.; Cao, M.; Gao, Z.; Xu, K. A solar radiation prediction model based on random forest (in Chinese). Control. Eng. China 2017, 24, 2472–2477. [Google Scholar] [CrossRef]
  25. Ahmad, M.W.; Mourshed, M.; Rezgui, Y. Tree-based ensemble methods for predicting PV power generation and their comparison with support vector regression. Energy 2018, 164, 465–474. [Google Scholar] [CrossRef]
  26. Mohammed, A.A.; Yaqub, W.; Aung, Z. Probabilistic Forecasting of Solar Power: An Ensemble Learning Approach. Smart Innovation, Systems and Technologies; Neves-Silva, R., Jain, L., Howlett, R., Eds.; Springer: Cham, Switzerland, 2015; Volume 39, pp. 449–458. ISBN 978-3-319-19856-9. [Google Scholar]
  27. Zamo, M.; Mestre, O.; Arbogast, P.; Pannekoucke, O. A benchmark of statistical regression methods for short-term forecasting of photovoltaic electricity production, part I: Deterministic forecast of hourly production. Sol. Energy 2014, 105, 792–803. [Google Scholar] [CrossRef]
  28. Kim, S.G.; Jung, J.Y.; Sim, M.K. A two-step approach to solar power generation prediction based on weather data using machine learning. Sustainability 2019, 11, 1501. [Google Scholar] [CrossRef] [Green Version]
  29. Chernokulsky, A.; Mokhov, I.I.; Nikitina, N. Winter cloudiness variability over Northern Eurasia related to the Siberian High during 1966–2010. Environ. Res. Lett. 2013, 8, 045012. [Google Scholar] [CrossRef] [Green Version]
  30. Chang, C.; Lu, M. Intraseasonal predictability of Siberian High and East Asian Winter Monsoon and Its Interdecadal Variability (in Chinese). J. Clim. 2012, 25, 1773–1778. [Google Scholar] [CrossRef]
  31. Ren, J.; Chen, T.; Xu, Z.; Wu, C.; Zhao, C. Mechanism analysis and experimental study on the influence of haze on photovoltaic power generation (in Chinese). Res. Explor. Lab. 2019, 38, 42–46. [Google Scholar] [CrossRef] [Green Version]
  32. China Meteorological Administration. Public Meteorological Service—Weather Graphic Symbols (in Chinese); China Meteorological Administration: Beijing, China, 2017.
  33. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
  34. Jiang, R.; Tang, W.; Wu, X.; Fu, W. A random forest approach to the detection of epistatic interactions in case-control studies. BMC Bioinform. 2009, 10, S65. [Google Scholar] [CrossRef] [Green Version]
  35. Feng, W.; Dauphin, G.; Huang, W.; Quan, Y.; Liao, W. New margin-based subsampling iterative technique in modified random forests for classification. Knowl. Based Syst. 2019, 182, 104845. [Google Scholar] [CrossRef]
  36. Scikit-learn, Random Forest Regressor. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html (accessed on 2 December 2019).
  37. Paulescu, M.; Paulescu, E. Short-term forecasting of solar irradiance. Renew. Energy 2019, 143, 985–994. [Google Scholar] [CrossRef]
  38. Zarzo, M.; Martí, P. Modeling the variability of solar radiation data among weather stations by means of principal components analysis. Appl. Energy 2011, 88, 2775–2784. [Google Scholar] [CrossRef]
  39. Cui, Y.; Sun, Y.; Chang, Z. A review of short-term solar photovoltaic power generation prediction methods (in Chinese). Resour. Sci. 2013, 35, 1474–1481. [Google Scholar]
Figure 1. Forecasting the daily PV power generation.
Figure 2. Geographical boundary of North China and the position of Zhonghe PV station in Xingtai city, Hebei province.
Figure 3. Forecasting performance and errors.
Figure 4. Forecasting result comparison of different methods.
Table 1. Value distribution ranges of different indicators in the training and testing samples.
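| Data set | Indicator | Range |
|---|---|---|
| Training samples | Temperature (°C) | −4.1 to 14.7 |
| Training samples | Atmospheric pressure (kPa) | 99.91 to 102.35 |
| Training samples | Relative humidity (%) | 18 to 99 |
| Training samples | Wind speed (m/s) | 0.7 to 4.6 |
| Training samples | PM2.5 (μg/m³) | 16 to 409.15 |
| Training samples | Total solar radiation (0.01 MJ/m²) ¹ | 54 to 1376 |
| Testing samples | Temperature (°C) | −8.4 to 12.4 |
| Testing samples | Atmospheric pressure (kPa) | 100.42 to 102.98 |
| Testing samples | Relative humidity (%) | 39 to 94 |
| Testing samples | Wind speed (m/s) | 1.1 to 3.1 |
| Testing samples | PM2.5 (μg/m³) | 31.67 to 353.33 |
| Testing samples | Total solar radiation (0.01 MJ/m²) ¹ | 191 to 1195 |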
¹ Total solar radiation is a cumulative value; all other indicators are daily mean values.
Table 2. Optimal parameters of RF regression models for three weather classifications.
| Weather classification | N estimators | Criterion | Max features |
|---|---|---|---|
| Clear days | 900 | MAE | auto |
| Cloudy days | 800 | MAE | auto |
| Rainy or snowy days | 500 | MSE | auto |
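The parameter names in Table 2 follow the older spellings of scikit-learn's RandomForestRegressor. The sketch below shows one way the three settings could be translated for a recent scikit-learn release. It is an illustration, not the authors' code: the names `TABLE_2`, `CRITERION_NAMES`, `rf_kwargs`, and `build_model` are hypothetical, and the mapping of `auto` to `1.0` (use all features) is an assumption based on how `auto` behaved for the regressor.

```python
# Table 2 settings as printed in the paper (older scikit-learn spellings).
TABLE_2 = {
    "clear": {"n_estimators": 900, "criterion": "MAE", "max_features": "auto"},
    "cloudy": {"n_estimators": 800, "criterion": "MAE", "max_features": "auto"},
    "rainy_or_snowy": {"n_estimators": 500, "criterion": "MSE", "max_features": "auto"},
}

# Assumed translation to the criterion names used by recent releases.
CRITERION_NAMES = {"MAE": "absolute_error", "MSE": "squared_error"}


def rf_kwargs(weather):
    """Return keyword arguments for RandomForestRegressor for one weather class."""
    row = TABLE_2[weather]
    return {
        "n_estimators": row["n_estimators"],
        "criterion": CRITERION_NAMES[row["criterion"]],
        # Assumption: "auto" meant "consider every feature" for the regressor,
        # which is written as 1.0 in current versions.
        "max_features": 1.0,
    }


def build_model(weather, random_state=0):
    """Build one forest per weather classification (requires scikit-learn)."""
    from sklearn.ensemble import RandomForestRegressor  # imported lazily

    return RandomForestRegressor(random_state=random_state, **rf_kwargs(weather))
```

A model for clear days would then be fitted with something like `build_model("clear").fit(X_clear, y_clear)`, where `X_clear` and `y_clear` stand for the clear-day rows of the training samples and are placeholders here.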
Table 3. Code, date, classification, real value, and forecasting result of power generation of the Zhonghe PV station.
| Code | Date | Classification ¹ | Real value ² | Forecasting result ² |
|---|---|---|---|---|
| 1 | Nov.1 | Clear | 162.03 | 170.98 |
| 2 | Nov.2 | Clear | 160.79 | 167.27 |
| 3 | Nov.3 | Clear | 189.85 | 191.68 |
| 4 | Nov.4 | R or S | 61.23 | 57.86 |
| 5 | Nov.5 | R or S | 40.19 | 57.91 |
| 6 | Nov.6 | R or S | 57.88 | 56.21 |
| 7 | Nov.7 | Cloudy | 149.91 | 144.53 |
| 8 | Nov.8 | Clear | 190.32 | 190.85 |
| 9 | Nov.9 | Clear | 217.15 | 215.87 |
| 10 | Nov.11 | Clear | 194.27 | 202.29 |
| 11 | Nov.12 | Clear | 183.15 | 197.10 |
| 12 | Nov.13 | Cloudy | 127.50 | 132.65 |
| 13 | Nov.14 | Cloudy | 100.38 | 105.59 |
| 14 | Nov.15 | R or S | 28.82 | 18.53 |
| 15 | Nov.16 | Cloudy | 152.87 | 149.32 |
| 16 | Nov.17 | Cloudy | 157.07 | 151.11 |
| 17 | Nov.18 | Clear | 175.82 | 169.59 |
| 18 | Nov.19 | Clear | 220.41 | 218.26 |
| 19 | Nov.20 | Cloudy | 157.33 | 149.50 |
| 20 | Nov.21 | Clear | 231.79 | 233.37 |
| 21 | Nov.22 | Clear | 233.79 | 233.30 |
| 22 | Nov.23 | Cloudy | 156.58 | 149.43 |
| 23 | Nov.24 | Clear | 168.88 | 175.86 |
| 24 | Nov.25 | Clear | 191.59 | 197.06 |
| 25 | Nov.26 | Cloudy | 126.30 | 136.88 |
| 26 | Nov.27 | Cloudy | 115.87 | 109.43 |
| 27 | Nov.28 | Cloudy | 134.24 | 144.84 |
| 28 | Nov.29 | Cloudy | 155.80 | 150.18 |
| 29 | Nov.30 | Cloudy | 106.35 | 104.67 |
| 30 | Dec.1 | R or S | 61.33 | 57.06 |
| 31 | Dec.2 | R or S | 28.82 | 29.50 |
| 32 | Dec.3 | R or S | 60.99 | 57.86 |
| 33 | Dec.4 | Clear | 194.41 | 202.94 |
| 34 | Dec.6 | Cloudy | 115.10 | 113.08 |
| 35 | Dec.7 | Clear | 175.14 | 182.08 |
| 36 | Dec.8 | Clear | 185.19 | 185.39 |
| 37 | Dec.9 | Cloudy | 123.02 | 115.28 |
| 38 | Dec.10 | Cloudy | 117.17 | 115.71 |
| 39 | Dec.13 | Cloudy | 158.11 | 148.22 |
| 40 | Dec.18 | Clear | 177.99 | 175.68 |
| 41 | Dec.20 | R or S | 56.21 | 56.99 |
| 42 | Dec.21 | Cloudy | 117.28 | 115.22 |
| 43 | Dec.22 | R or S | 57.68 | 56.21 |
| 44 | Dec.24 | Clear | 168.05 | 175.99 |
| 45 | Dec.25 | Cloudy | 103.45 | 101.67 |
| 46 | Dec.27 | R or S | 12.78 | 17.42 |
| 47 | Dec.28 | Cloudy | 141.91 | 139.03 |
| 48 | Dec.29 | Clear | 173.72 | 181.70 |
| 49 | Dec.30 | Clear | 182.20 | 185.84 |
| 50 | Dec.31 | Cloudy | 132.52 | 134.14 |
¹ R or S refers to rainy or snowy days; ² Power generation is given in MWh.
Table 4. MAE, MAPE, RMSE, and EV of each forecasting model.
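| Weather | Algorithm | MAE (MWh) | MAPE (%) | RMSE (MWh) | EV |
|---|---|---|---|---|---|
| Clear | RF | 5.07 | 2.83 | 6.23 | 0.95 |
| Clear | SVR | 6.69 | 3.69 | 7.91 | 0.95 |
| Clear | EN | 6.08 | 3.36 | 7.91 | 0.90 |
| Clear | GBDT | 6.02 | 3.30 | 6.73 | 0.91 |
| Cloudy | RF | 5.23 | 3.89 | 6.02 | 0.91 |
| Cloudy | SVR | 5.52 | 4.26 | 6.39 | 0.89 |
| Cloudy | EN | 7.21 | 5.39 | 8.22 | 0.88 |
| Cloudy | GBDT | 6.46 | 4.87 | 7.35 | 0.87 |
| Rainy or snowy | RF | 4.80 | 14.29 | 6.98 | 0.83 |
| Rainy or snowy | SVR | 9.70 | 24.84 | 13.11 | 0.72 |
| Rainy or snowy | EN | 11.03 | 33.76 | 16.10 | 0.30 |
| Rainy or snowy | GBDT | 5.29 | 16.19 | 7.17 | 0.82 |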
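As a rough check on how the indicators in Table 4 relate to the values in Table 3, the sketch below computes MAE, MAPE, RMSE, and EV for the 20 clear days, using the real and forecast values copied from Table 3. The function names are illustrative. The EV formula is an assumption: it uses the common explained variance definition, 1 − Var(real − forecast) / Var(real), which may differ from the exact formula used in the paper.

```python
import math


def mae(real, pred):
    """Mean absolute error, in the unit of the inputs (MWh here)."""
    return sum(abs(r - p) for r, p in zip(real, pred)) / len(real)


def mape(real, pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(r - p) / abs(r) for r, p in zip(real, pred)) / len(real)


def rmse(real, pred):
    """Root mean square error, in the unit of the inputs."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(real, pred)) / len(real))


def _variance(values):
    """Population variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def explained_variance(real, pred):
    """Assumed EV definition: 1 - Var(real - pred) / Var(real)."""
    residuals = [r - p for r, p in zip(real, pred)]
    return 1.0 - _variance(residuals) / _variance(real)


# Clear-day rows of Table 3 (codes 1-3, 8-11, 17, 18, 20, 21, 23, 24,
# 33, 35, 36, 40, 44, 48, 49), in MWh.
real_clear = [162.03, 160.79, 189.85, 190.32, 217.15, 194.27, 183.15,
              175.82, 220.41, 231.79, 233.79, 168.88, 191.59, 194.41,
              175.14, 185.19, 177.99, 168.05, 173.72, 182.20]
pred_clear = [170.98, 167.27, 191.68, 190.85, 215.87, 202.29, 197.10,
              169.59, 218.26, 233.37, 233.30, 175.86, 197.06, 202.94,
              182.08, 185.39, 175.68, 175.99, 181.70, 185.84]

print("MAE  (MWh):", round(mae(real_clear, pred_clear), 2))
print("MAPE (%):  ", round(mape(real_clear, pred_clear), 2))
print("RMSE (MWh):", round(rmse(real_clear, pred_clear), 2))
print("EV:        ", round(explained_variance(real_clear, pred_clear), 2))
```

With these inputs, MAE, MAPE, and RMSE round to 5.07, 2.83, and 6.23, which agrees with the RF row for clear days in Table 4. The same functions can be applied to the cloudy and rainy or snowy rows, or to the SVR, EN, and GBDT columns of Table A3.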

Share and Cite

MDPI and ACS Style

Meng, M.; Song, C. Daily Photovoltaic Power Generation Forecasting Model Based on Random Forest Algorithm for North China in Winter. Sustainability 2020, 12, 2247. https://doi.org/10.3390/su12062247
