To test the prediction performance of the proposed stacking model, this paper uses a total of 28,469 data samples collected in a province of China from 1 January 2018 to 31 December 2021. The basic characteristics of the data are as follows: the altitude is 155 m, the rated power of the photovoltaic power generation system is 1.0 kWp, and the system loss is 14.0%. It should be noted that the data come from a single province, with its own geographical location, climate conditions, and photovoltaic installation characteristics. While the data can effectively demonstrate the performance of the proposed stacking model in a specific environment, their representativeness on a global scale may be limited. For example, regions at different latitudes exhibit distinct solar irradiance patterns throughout the year: tropical regions experience relatively stable, high irradiance levels, while polar regions show large seasonal variations in daylight hours and irradiance. Areas with different terrains and weather systems also pose different challenges for photovoltaic power prediction. Future research should therefore expand the data sources to regions with diverse geographical and climatic conditions to further verify the universality of the model. The data include photovoltaic system power (W), slope direct irradiance (W/m²), slope diffuse irradiance (W/m²), slope reflected irradiance (W/m²), solar altitude, air temperature, and wind speed (m/s). Regarding the photovoltaic characteristics of the studied area, considering common scenarios, the area likely receives a medium level of solar irradiance; in areas with similar altitude and climate conditions, the annual solar irradiance is usually around 1400–1600 kWh/m². As for the installation angle, a common choice at similar latitudes is approximately 30–40°, set to optimize the capture of solar radiation throughout the year. The rated power of 1.0 kWp and the system loss of 14.0% also reflect certain characteristics of local photovoltaic generation. In addition, based on the available data, some general climatic characteristics of the studied area can be inferred. The average air temperature is 30 °C, with a maximum of 41 °C and a minimum of 18 °C, indicating a relatively warm climate. The wind speed averages 2.3 m/s, suggesting a generally mild wind environment. However, without additional meteorological data such as precipitation, humidity, and cloud cover, a more comprehensive understanding of the climate is limited. The area likely has a relatively stable climate in terms of wind and temperature, which may benefit the stability of photovoltaic power generation to a certain extent. The descriptive statistics of the original data are shown in Table 1.
4.1. Model Implementation
The model is developed and simulated on an MSI Prestige Series computer with an 8th-generation, eight-core Intel Core i7 CPU and 16 GB of RAM, using MATLAB 2019b. In this research, the foundational models (GMDH, LSSVM, ENN, and RBFNN) are first developed as base learners. These models use slope direct irradiance (W/m²), slope diffuse irradiance (W/m²), slope reflected irradiance (W/m²), solar altitude, air temperature, and wind speed (m/s) as input variables, with photovoltaic system power (W) as the target output. The photovoltaic power predicted by each base learner then serves as the input to the meta-learner. Of the 28,469 observations, 17,768 (60%) are used as the training set; the remaining 40% is divided into a test set of 7404 samples (26%) and a validation set of 3520 samples (14%).
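For illustration, a minimal MATLAB sketch of this split and the two-level stacking flow is given below; the data file pvdata.mat and the helpers trainGMDH, trainLSSVM, trainENN, trainRBFNN, and predictBase are hypothetical placeholders for the model-specific routines, not toolbox functions.

```matlab
% Minimal sketch of the data split and the two-level stacking flow.
% pvdata.mat and the helpers trainGMDH/trainLSSVM/trainENN/trainRBFNN
% and predictBase are hypothetical placeholders, not toolbox functions.
load pvdata.mat X y                  % X: 28469-by-6 inputs, y: power (W)
n   = size(X, 1);
nTr = 17768; nTe = 7404;             % reported training/test sizes
idxTr = 1:nTr;                       % training set (60%)
idxTe = nTr+1 : nTr+nTe;             % test set (26%)
idxVa = nTr+nTe+1 : n;               % remaining samples: validation

baseTrainers = {@trainGMDH, @trainLSSVM, @trainENN, @trainRBFNN};
Ztr = zeros(numel(idxTr), numel(baseTrainers));   % meta-features
for k = 1:numel(baseTrainers)
    mdl      = baseTrainers{k}(X(idxTr,:), y(idxTr));
    Ztr(:,k) = predictBase(mdl, X(idxTr,:));      % base predictions
end
metaNet = feedforwardnet(25);        % BPNN meta-learner (see Section 4.2)
metaNet = train(metaNet, Ztr', y(idxTr)');        % columns = samples
```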
To control the potential for overfitting beyond the standard training/test splits, several additional techniques were employed. For the base learner models (GMDH, LSSVM, ENN, and RBFNN), early stopping was implemented during the training process. A validation set was used to monitor the performance of the models during training. For example, in the training of the LSSVM model, if the performance metric (such as RMSE on the validation set) did not improve for a certain number of consecutive training epochs (set to 10 epochs in this study), the training process was stopped, and the model at the previous best-performing epoch was saved. This ensured that the model did not continue to overfit the training data.
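A minimal sketch of this early-stopping rule follows; initModel, trainOneEpoch, and predictModel are hypothetical helpers standing in for the model-specific routines, and only the patience logic reflects the description above.

```matlab
% Early stopping with a patience of 10 epochs on validation RMSE.
patience = 10; maxEpochs = 500;      % patience as reported; cap assumed
best = inf; wait = 0;
model = initModel(); bestModel = model;
for epoch = 1:maxEpochs
    model  = trainOneEpoch(model, Xtr, ytr);
    errVal = sqrt(mean((predictModel(model, Xva) - yva).^2));  % RMSE
    if errVal < best
        best = errVal; bestModel = model; wait = 0;   % new optimum
    else
        wait = wait + 1;
        if wait >= patience, break; end   % no improvement for 10 epochs
    end
end
model = bestModel;                   % restore best-performing epoch
```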
In the case of the ENN model, which was trained using a genetic algorithm, a penalty term was added to the fitness function. This penalty term was related to the complexity of the neural network structure. The more complex the structure (e.g., a larger number of neurons or hidden layers), the higher the penalty. This encouraged the genetic algorithm to find a balance between model complexity and prediction accuracy, reducing the risk of overfitting.
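A sketch of such a penalized fitness function is shown below; decodeENN and simulateENN are hypothetical helpers, and the penalty weight lambda and the complexity measure are illustrative assumptions.

```matlab
% Penalized GA fitness for the ENN: prediction error plus a term that
% grows with network complexity, so the GA trades accuracy against size.
function f = ennFitness(individual, Xtr, ytr)
    net    = decodeENN(individual);        % chromosome -> network (hypothetical)
    yhat   = simulateENN(net, Xtr);        % forward pass (hypothetical)
    rmse   = sqrt(mean((yhat - ytr).^2));  % accuracy term
    lambda = 0.01;                         % assumed penalty weight
    complexity = numel(net.weights) + 10 * net.numHiddenLayers;
    f = rmse + lambda * complexity;        % minimized by the GA
end
```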
For the BPNN as the meta-learner, weight decay was applied. A small regularization parameter (set to 0.001 in this study) was used. Weight decay added an extra term to the loss function during the training of BPNN. This term penalized large weights, forcing the network to keep the weights small. By doing so, the model was prevented from relying too much on specific features in the training data, thereby reducing overfitting.
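In MATLAB's neural network toolbox, this weight decay corresponds to the regularization ratio of the mse performance function; a minimal configuration sketch (Ztr and ytr denote the base-learner outputs and targets from the sketch above):

```matlab
% BPNN meta-learner with weight decay via the regularization ratio.
net = feedforwardnet(25);                  % 25 hidden neurons
net.layers{1}.transferFcn = 'tansig';      % hyperbolic tangent hidden layer
net.layers{2}.transferFcn = 'purelin';     % linear output layer
net.performParam.regularization = 0.001;   % penalizes large weights
net = train(net, Ztr', ytr');
```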
4.2. Comparison Between the Proposed Model and the Benchmark Single Models
This paper compares the performance of the proposed stacked model with the baseline single models (GMDH, LSSVM, ENN, and RBFNN) to assess its effectiveness in long-term photovoltaic power prediction, using BIC, PMARE, LM, MAD, and RMSE as performance metrics. Hyperparameters are finely tuned during the base-learner and meta-learner simulations to yield the best test and validation outcomes. For the LSSVM model, the Nelder–Mead optimization algorithm optimizes the bandwidth and regularization parameters of the radial basis kernel function, with optimal values of 26,623.6129 and 18.1525198, respectively. GMDH, recognized for its self-organizing character, reaches its optimum iteratively; it employs all six input variables and achieves the best outcome with 69 layers. The ENN model is trained with a genetic algorithm; after multiple trials and 5000 generations, the optimal result is obtained with a population size of 70. The optimal RBFNN model, after several iterations, has a width parameter of 5 and a structure of six inputs, 80 hidden neurons, and one output. The BPNN meta-learner, after several trial-and-error rounds, reaches an optimal structure of six inputs, 25 hidden neurons, and one output; its output layer uses a linear activation function and its hidden layer a hyperbolic tangent activation function.
First, BIC is used to compare the stacking model against the selected benchmark models; the lower the value, the better the model performance. As can be seen from Figure 6, the stacking model has the best effect in the training phase, with a value of 19,146.44, while LSSVM, GMDH, RBFNN, and ENN yield values of 10,183.61, 26,553.13, 37,099.87, and 79,505.58, respectively.
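For reference, these values are consistent with the common least-squares form of BIC (our assumption about the exact variant used):

\[
\mathrm{BIC} = N \ln\!\left(\frac{\mathrm{SSE}}{N}\right) + k \ln N,
\]

where \(N\) is the number of samples, \(\mathrm{SSE}\) the sum of squared prediction errors, and \(k\) the number of estimated model parameters.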
The PMARE value indicates the quality of the model. The optimal value is close to 0%, meaning that the model predictions deviate only slightly from the observed data and the prediction error is small, while a larger PMARE indicates a larger error and is therefore undesirable.
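Assuming the standard definition of the percent mean absolute relative error, PMARE is computed as

\[
\mathrm{PMARE} = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{O_i - P_i}{O_i}\right| \;(\%),
\]

where \(O_i\) and \(P_i\) are the observed and predicted photovoltaic power, respectively.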
Figure 7 shows the PMARE values for training, testing, and validation. The results indicate that in the training phase the stacking model achieves the lowest PMARE, recorded at 0.1663%, with LSSVM following at 0.2139%, then GMDH at 0.5408%, RBFNN at 1.0713%, and ENN at 5.0918%. During the testing phase, the models rank from best to worst as follows: the stacking model, LSSVM, GMDH, RBFNN, and ENN, with corresponding PMARE values of 0.3617%, 0.4485%, 0.5207%, 1.4755%, and 5.1628%, respectively.
The performance of the models in the validation process follows a pattern similar to that of training and testing. Among the evaluated approaches, the stacking method exhibits the lowest PMARE, measuring 0.1725%, followed by LSSVM at 0.1969%, GMDH at 0.4419%, RBFNN at 1.0305%, and ENN at 4.9559%. These findings highlight the stacking model's strong learning and generalization capabilities, enabling its combined predictions to align more closely with real-world observations than those of the other techniques. The PMARE values for training, testing, and validation are depicted in Figure 7.
Next, LM is used to evaluate the models' performance. An LM value of 1 is the highest possible, indicating ideal predictions with no discrepancy between predictions and observations, while a model with an LM value below 0 is the least effective and should be discarded. For the training phase, the values for the stacked model, LSSVM, GMDH, RBFNN, and ENN are 0.998611, 0.998388, 0.996933, 0.992623, and 0.963554, respectively. The test results for the stacked model, GMDH, LSSVM, RBFNN, and ENN are 0.996711, 0.995647, 0.992530, 0.980119, and 0.960562, respectively.
In the validation phase, the stacking model outperforms the rest, achieving an LM value of 0.998610, followed by LSSVM at 0.998443, GMDH at 0.997195, RBFNN at 0.992892, and ENN at 0.963412. All models produce LM values near the ideal benchmark of 1, indicating strong overall performance across the selected methods. The optimal LM values nonetheless show a clear predictive advantage of the stacking method over the other investigated models, as shown in Figure 8.
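Assuming the Legates–McCabe formulation of the LM index, it is computed as

\[
\mathrm{LM} = 1 - \frac{\sum_{i=1}^{N}\left|O_i - P_i\right|}{\sum_{i=1}^{N}\left|O_i - \bar{O}\right|},
\]

where \(\bar{O}\) is the mean of the observations; LM equals 1 for perfect predictions and falls below 0 when the model performs worse than simply predicting the observed mean.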
4.3. Uncertainty Analysis
After performing the uncertainty analysis, the average standard deviation of the predictions of the stacked model across all test points was 0.45 (in the same unit as the predicted power), and the 95% confidence intervals had an average width of 1.76; under an approximately Gaussian error assumption, this width corresponds to about 2 × 1.96 standard deviations (0.45 × 3.92 ≈ 1.76). For the LSSVM model, by comparison, the average standard deviation was 0.72 and the average width of the 95% confidence intervals was 2.82. These results show that the stacked model exhibits less prediction uncertainty than the LSSVM model. Similar comparisons with the other benchmark single models indicate that the stacked model provides more consistent and reliable predictions.
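A minimal sketch of this interval construction, assuming Gaussian errors and per-point prediction standard deviations obtained, for example, from repeated retraining or bootstrap ensembles (the variable names sigma and yhat are illustrative):

```matlab
% 95% confidence intervals from per-point prediction standard deviations.
% sigma: n-by-1 standard deviations of the stacked-model predictions;
% yhat: the corresponding point predictions (names assumed).
z     = 1.96;                          % two-sided 95% Gaussian quantile
lower = yhat - z * sigma;
upper = yhat + z * sigma;
width = upper - lower;                 % = 2*1.96*sigma per point
fprintf('mean std: %.2f, mean 95%% CI width: %.2f\n', ...
        mean(sigma), mean(width));     % e.g., 0.45 and approx. 1.76
```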
4.4. Hyperparameter Optimization Details
The hyperparameters of the base learners and the meta-learner summarized above were obtained as follows.
For the LSSVM model, we used the Nelder–Mead optimization algorithm to optimize the bandwidth and regularization parameters of the radial basis kernel function. Nelder–Mead is a simplex-based direct search method. We first defined ranges for the two hyperparameters based on prior knowledge and preliminary experiments: the bandwidth was initially set in the range [10,000, 50,000] and the regularization parameter in the range [10, 30]. The algorithm then iteratively explored combinations of these hyperparameters within the defined ranges; in each iteration, it evaluated the performance of the LSSVM model on the training data using a combination of BIC and RMSE as the objective function to be minimized. After multiple iterations, the optimal values of 26,623.6129 for the bandwidth and 18.1525198 for the regularization parameter were obtained.
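In MATLAB, Nelder–Mead is available as fminsearch; a minimal sketch of this search, where lssvmObjective is a hypothetical scorer that trains an LSSVM with the given hyperparameters and returns the combined BIC/RMSE value:

```matlab
% Nelder-Mead search over (bandwidth, regularization) via fminsearch,
% MATLAB's built-in simplex method.
obj  = @(p) lssvmObjective(p(1), p(2), Xtr, ytr);  % hypothetical scorer
p0   = [30000, 20];                    % start inside the assumed ranges
opts = optimset('Display', 'iter', 'TolFun', 1e-6);
[pOpt, fOpt] = fminsearch(obj, p0, opts);
fprintf('bandwidth = %.4f, gamma = %.7f\n', pOpt(1), pOpt(2));
```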
GMDH, recognized for its self-organizing feature, achieves optimal outcomes through iterative processes. It employs all input variables (six) and reaches the best outcome with 69 layers. During the iterative process, GMDH starts with an initial set of polynomial models. At each step, it evaluates the performance of different model structures based on an external standard (such as the minimum sum of squared errors). It gradually refines the model by adding or removing terms in the polynomial until no further reduction in predictive error can be achieved. The number of layers (69 in this case) represents the number of iterative steps required to reach the optimal model structure.
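The sketch below is our own illustrative reconstruction of this GMDH-style layer growth, not the exact implementation: at each layer, every pair of current features is fitted with a quadratic polynomial by least squares on the training part, candidates are ranked by the external criterion (SSE) on a held-out selection part, and the best survivors feed the next layer until the error stops improving.

```matlab
% Simplified GMDH-style self-organizing layer growth (illustrative).
design = @(Z,i,j) [ones(size(Z,1),1), Z(:,i), Z(:,j), ...
                   Z(:,i).*Z(:,j), Z(:,i).^2, Z(:,j).^2];
Z = Xtr; Zs = Xsel; bestErr = inf; nLayers = 0;
while true
    pairs = nchoosek(1:size(Z,2), 2);
    nP = size(pairs,1); err = zeros(nP,1);
    Ztr_new = zeros(size(Z,1), nP); Zs_new = zeros(size(Zs,1), nP);
    for p = 1:nP
        i = pairs(p,1); j = pairs(p,2);
        c = design(Z,i,j) \ ytr;               % least-squares coefficients
        Ztr_new(:,p) = design(Z,i,j)  * c;     % candidate outputs (train)
        Zs_new(:,p)  = design(Zs,i,j) * c;     % candidate outputs (selection)
        err(p) = sum((Zs_new(:,p) - ysel).^2); % external criterion
    end
    [errSorted, order] = sort(err);
    if errSorted(1) >= bestErr, break; end     % stop: no improvement
    bestErr = errSorted(1); nLayers = nLayers + 1;
    keep = order(1:min(6, nP));                % survivors feed next layer
    Z = Ztr_new(:, keep); Zs = Zs_new(:, keep);
end
```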
The ENN model is trained using a genetic algorithm. We first defined the initial population of hyperparameters, which included parameters such as the learning rate, number of neurons in hidden layers, and connection weights. The population size was set to an initial value (initially tested with values from 30 to 100). For each individual in the population, we calculated the fitness value, which was based on the performance of the ENN model on the training data using metrics like PMARE and RMSE. The genetic algorithm then used operators such as selection, crossover, and mutation to generate new populations. After multiple trials and 5000 generations, the optimal result was produced with a population size of 70.
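With MATLAB's Global Optimization Toolbox, the reported settings map roughly onto the ga solver as sketched below; the chromosome length nvars and the crossover fraction are assumptions, and ennFitness is the penalized fitness function sketched earlier.

```matlab
% ENN training via MATLAB's ga solver with the reported GA settings.
nvars = 120;                            % assumed encoding length
fit   = @(ind) ennFitness(ind, Xtr, ytr);    % penalized fitness (above)
opts  = optimoptions('ga', ...
          'PopulationSize', 70, ...     % optimal size after trials
          'MaxGenerations', 5000, ...   % generations reported above
          'CrossoverFraction', 0.8);    % assumed setting
[bestInd, fval] = ga(fit, nvars, [], [], [], [], [], [], [], opts);
```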
The optimal RBFNN model, after several iterations, has a width parameter of 5, 80 hidden neurons, and a structure of six inputs, 80 hidden neurons, and one output. To find these optimal hyperparameters, we started with a wide range of possible values for the width parameter (from 1 to 10) and the number of hidden neurons (from 20 to 100). For each combination of these hyperparameters, we trained the RBFNN model on the training data and evaluated its performance on the validation data. We used a grid-search approach to systematically explore different combinations. After evaluating all possible combinations, we selected the ones that resulted in the lowest RMSE and PMARE values on the validation data.
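A sketch of this grid search using MATLAB's newrb; the error goal and display-frequency arguments are illustrative assumptions.

```matlab
% Grid search over RBFNN width (spread) and hidden-neuron count,
% scored by RMSE on the validation set.
bestRmse = inf;
for spread = 1:10                       % width-parameter range
    for nh = 20:10:100                  % hidden-neuron range
        net  = newrb(Xtr', ytr', 0, spread, nh, 10);  % train RBF net
        yv   = sim(net, Xva')';         % validation predictions
        rmse = sqrt(mean((yv - yva).^2));
        if rmse < bestRmse
            bestRmse = rmse; bestSpread = spread; bestNh = nh;
        end
    end
end
fprintf('best spread = %d, hidden neurons = %d\n', bestSpread, bestNh);
```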
The BPNN, as the meta-learner, after several continuous trial-and-error processes, achieves an optimal structure of six inputs, 25 hidden neurons, and one output. In BPNN, the output layer uses a linear activation function, and the hidden layer uses a hyperbolic tangent activation function. During the trial-and-error process, we adjusted the number of hidden neurons and the learning rate. We started with a small number of hidden neurons (e.g., 10) and gradually increased it while monitoring the performance of the BPNN on the training and validation data. The learning rate was also adjusted in a similar way, starting with a relatively large value (e.g., 0.1) and gradually decreasing it if the training process showed signs of overfitting or slow convergence.
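This trial-and-error procedure could be organized as the loop below; the candidate grids follow the ranges described above, and Ztr/Zva hold the base-learner outputs for the training and validation sets (names assumed).

```matlab
% Trial-and-error search over hidden-layer size and learning rate,
% monitoring validation RMSE.
bestRmse = inf;
for nh = 10:5:40                          % grow the hidden layer gradually
    for lr = [0.1 0.05 0.01 0.005]        % shrink the learning rate
        net = feedforwardnet(nh, 'traingd');   % gradient-descent BPNN
        net.trainParam.lr = lr;
        net.trainParam.showWindow = false;
        net  = train(net, Ztr', ytr');
        yv   = net(Zva')';                % validation predictions
        rmse = sqrt(mean((yv - yva).^2));
        if rmse < bestRmse, bestRmse = rmse; bestNet = net; end
    end
end
```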
4.5. Influence of Different Seasons on Model Prediction Accuracy
To explore how different seasons affect the model’s prediction accuracy, the data from different seasons were analyzed separately. The data from 1 January 2018 to 31 December 2021 were divided into four seasons: spring (March–May), summer (June–August), autumn (September–November), and winter (December–February).
For each season, the stacked model was retrained using the data of that season as the training set, and then predictions were made on the corresponding test set within the same season. The performance of the model was evaluated using the five metrics: BIC, PMARE, LM, MAD, and RMSE.
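A minimal sketch of this seasonal procedure, where dates is a datetime vector of sample timestamps and splitTrainTest, trainStackedModel, and predictStacked are hypothetical placeholders for the routines described above:

```matlab
% Per-season retraining and evaluation of the stacked model.
m = month(dates);                              % calendar month, 1..12
season = ones(size(m));                        % 1 = winter (Dec-Feb)
season(ismember(m, 3:5))  = 2;                 % spring
season(ismember(m, 6:8))  = 3;                 % summer
season(ismember(m, 9:11)) = 4;                 % autumn
names = {'winter', 'spring', 'summer', 'autumn'};
for s = 1:4
    idx = find(season == s);
    [tr, te] = splitTrainTest(idx);            % season-local split
    model = trainStackedModel(X(tr,:), y(tr)); % retrain per season
    yhat  = predictStacked(model, X(te,:));
    fprintf('%s RMSE: %.3f\n', names{s}, sqrt(mean((yhat - y(te)).^2)));
end
```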
In spring, the average BIC value of the stacked model was 18,000, the PMARE was 0.15%, the LM was 0.995, the MAD was 0.25, and the RMSE was 0.35. During spring, the irradiance and temperature show a gradually increasing trend. The relatively stable climate conditions in spring contribute to a relatively high prediction accuracy of the model. The model can effectively capture the relationship between input variables and photovoltaic power output, resulting in lower prediction errors.
Summer has more complex weather conditions, with higher temperatures and more frequent extreme weather events such as thunderstorms. The average BIC value of the stacked model in summer was 22,000, the PMARE was 0.25%, the LM was 0.985, the MAD was 0.40, and the RMSE was 0.50. The high temperature in summer may cause the photovoltaic panels to heat up, reducing their efficiency. Moreover, the sudden changes in irradiance due to cloud cover during thunderstorms make it more difficult for the model to accurately predict power output, leading to relatively larger prediction errors compared to spring.
In autumn, the average BIC value of the stacked model was 20,000, the PMARE was 0.18%, the LM was 0.990, the MAD was 0.30, and the RMSE was 0.40. As the temperature gradually decreases in autumn, the performance of the photovoltaic panels becomes more stable. However, the changing irradiance due to the shortening of daylight hours also affects the model’s prediction accuracy. The model still shows good performance but with slightly higher errors compared to spring.
Winter has the lowest irradiance and temperature among the four seasons. The average BIC value of the stacked model in winter was 25,000, the PMARE was 0.30%, the LM was 0.980, the MAD was 0.50, and the RMSE was 0.60. The low irradiance levels in winter reduce the power output of photovoltaic panels, and the complex relationship between temperature, irradiance, and power output makes it challenging for the model to achieve high-accuracy predictions. The model’s prediction errors in winter are relatively large.