1. Introduction
As global energy demand continues to rise, China is actively exploring the development and application of new energy sources. Vigorously developing renewable energy not only alleviates the energy crisis caused by the shortage of oil and gas resources but also constitutes an important strategy for the construction of China's modern energy system [1]. Wind and solar power generation, as mature and technologically advanced renewable energy technologies in China's clean energy development, exhibit vast development prospects. However, the inherent strong randomness of wind and solar energy poses significant challenges to modern power systems with a high proportion of new energy sources [2,3]. Therefore, accurate prediction of new energy output power is crucial for the stable operation of large-scale new energy integration into the grid, and is a vital task in enhancing the stability of new energy output [4]. This technology requires modeling with parameters relevant to new energy generation, based on numerical weather forecasts, historical output power data from new energy stations, and weather observation data. The prediction model forecasts the trend of output changes at specific future times, aiding maintenance scheduling, grid dispatching, safety and stability analysis, and the improvement of new energy consumption rates. With the future development of electricity-related technologies, the advancement of distributed power grids, and the maturation of electricity trading markets, predictions of new energy generation power will support emerging grid businesses such as integrated energy services, clean energy trading, and demand response [5,6].
There is a vast body of research on new energy power prediction models, both domestically and internationally. Commonly used methods fall into physical methods and statistical methods [7]. Statistical methods are further divided into traditional statistical methods and newer methods based on artificial intelligence algorithms. Physical methods are based on the principles of photovoltaic power generation and directly model the physical characteristics of the generation process. They require detailed geographic information, module parameters, and environmental meteorological data of the photovoltaic power station, and thus depend heavily on meteorological data and hardware information [8]. Although physical methods do not rely on large amounts of historical data, the modeling process is complex, model robustness is weak, and prediction accuracy is relatively low [9]. Statistical methods do not model physical processes such as changes in solar radiation intensity; instead, they rely on large amounts of historical data to identify statistical laws between input and output variables and establish mapping relationships for prediction. Compared with physical methods, statistical methods are simpler to model and their inputs are easier to obtain, so current research on photovoltaic power prediction focuses mainly on statistical methods. Traditional statistical methods include time series methods [10], fuzzy logic methods [11], regression analysis methods [12], and Markov chain methods [13]. Modeling historical photovoltaic power on a time series basis provides a complete theoretical system and strong interpretability, but the prediction accuracy is not high. In contrast, artificial intelligence models can fully explore the internal characteristics and hidden patterns of the data, extract high-dimensional complex nonlinear features, and make predictions; compared with traditional statistical models, they offer greatly improved accuracy, stability, and versatility.
With the development of the big data industry on the Internet, machine learning methods have gradually replaced physical and statistical models, achieving better prediction results for wind and photovoltaic power output. Neural networks and support vector machines are the two most representative machine learning methods; both describe the randomness of new energy generation by establishing a mapping between input and output data. Neural network prediction models are further divided into traditional neural networks and deep learning networks. In Reference [14], to improve the accuracy of traditional neural networks, the Sparrow Search Algorithm (SSA) is used to optimize network thresholds for effective short-term power prediction of photovoltaic stations under sunny, cloudy, and abnormal weather conditions. The results show that the SSA-optimized neural network is more accurate than the negative gradient method and the particle swarm algorithm and can find the optimal threshold in a short time. In Reference [15], to address abnormal historical data and unstable numerical weather forecast data, a neural network wind power prediction model based on association rules is adopted, using the Apriori algorithm to associate wind power with meteorological data; experiments show maximum and minimum relative errors of 5.76% and 0.01%, respectively, demonstrating the effectiveness of this method. In Reference [16], a deep deterministic and recurrent deterministic policy model under an attention mechanism is proposed, unifying historical output data and numerical weather forecast data and adjusting the weights of different components to highlight important information, thereby obtaining the optimal power prediction. In Reference [17], a support vector machine optimized by Particle Swarm Optimization (PSO-LSSVM) is proposed for day-ahead photovoltaic power prediction; both the penalty factor and the kernel width of the Least Squares Support Vector Machine are obtained through PSO's global search. Experiments show that PSO-LSSVM is more accurate than the Particle Swarm Optimization-Back Propagation (PSO-BP) neural network on data samples under different meteorological conditions. Reference [18] combines the Pearson correlation coefficient with a genetic algorithm to optimize and train an ELM hybrid model; in predictions for typical days of the four seasons, the estimated deviation rate decreased by 19% compared to a single model. In Reference [19], an ELM prediction model is proposed that decomposes ELM multiple times and reconstructs it for optimization. This method fully leverages ELM's fast adaptive learning and effectively fits historical wind power, but because ELM is a single-layer neural network, the model cannot go deeper and cannot extract deep-level features of wind power signals. In Reference [20], the extreme learning machine is optimized using dynamic safety and elite opposition-based SSA; this method is affected by meteorological factors, resulting in low prediction accuracy and poor generalization for photovoltaic power in rainy weather. In Reference [21], PSO and ELM are combined to predict photovoltaic power output, with PSO dynamically optimizing parameters at different stages.
Despite significant progress in existing research, renewable energy output power prediction faces the following issues:
The prediction error is relatively large, and the accuracy cannot meet requirements. In China's research models, the prediction accuracy of new energy output power remains relatively low, lagging behind other prediction domains such as transportation. Therefore, meeting the needs of grid dispatch, reducing the impact of new energy volatility on the grid, and improving power prediction accuracy are pressing research challenges.
There are numerous factors affecting new energy generation, resulting in a huge volume of related data samples. An excessive sample size directly interferes with prediction efficiency and reduces prediction accuracy. If outliers in the data samples can be screened out before the prediction model is established, prediction accuracy and speed will improve significantly. Selecting the best data screening and classification methods for different types of new energy generation is therefore a key issue in new energy prediction research.
To address this challenge, this paper proposes a physical-knowledge integrated model for renewable energy power forecasting. The core innovations of the proposed framework can be summarized as follows:
(1) A hybrid physics-informed prediction architecture is developed by integrating the FCM clustering algorithm for handling missing data and the VMD technique to decompose renewable power generation into high-frequency and low-frequency components. This dual-processing mechanism enables the precise capture of the dynamic fluctuation characteristics inherent in renewable energy outputs.
(2) A collaborative deep learning paradigm is implemented where high-frequency components are fed into a CNN to extract transient features from rapid fluctuations, and low-frequency components are processed through an LSTM network to leverage its long-term memory capacity for capturing temporal dependencies in sequential patterns. This division-of-labor strategy significantly enhances prediction accuracy by synergizing the strengths of both architectures.
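The FCM-based treatment of missing data described in innovation (1) can be illustrated with a minimal numpy sketch. This is not the exact implementation used in this paper: the function names, the fuzzifier value m = 2, and the strategy of fitting cluster centers on complete samples and then filling each gap with the membership-weighted center value are all illustrative assumptions.

```python
import numpy as np

def fcm(X, K=2, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy C-means: returns memberships U (n x K) and centers C (K x d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), K))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        C = Um.T @ X / Um.sum(axis=0)[:, None]                 # membership-weighted centers
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-12
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        if np.abs(U_new - U).max() < tol:
            return U_new, C
        U = U_new
    return U, C

def fcm_impute(X, K=2):
    """Fill NaN entries using memberships computed from the observed features."""
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    _, C = fcm(X[~missing.any(axis=1)], K)                     # fit on complete rows only
    for i in np.where(missing.any(axis=1))[0]:
        obs = ~missing[i]
        d = np.linalg.norm(X[i, obs] - C[:, obs], axis=1) + 1e-12
        u = 1.0 / ((d[:, None] / d[None, :]) ** 2).sum(axis=1)  # memberships for m = 2
        X[i, ~obs] = u @ C[:, ~obs]                            # weighted center coordinate
    return X
```

Compared with mean or median imputation, the filled value here follows the cluster structure of the data, so a sample that clearly belongs to a high-output cluster receives a high-output fill value.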
3. Decomposition Technique for Renewable Energy Output Sequences Based on the VMD Algorithm
Renewable energy power data exhibit periodicity, fluctuation, and randomness. To reduce the non-stationarity of renewable energy power sequences and uncover internal data patterns, the power data need to be decomposed. Commonly used sequence decomposition methods include wavelet decomposition, empirical mode decomposition (EMD), and VMD. The basis function of the wavelet transform (WT) is difficult to select and susceptible to white noise. The basis function of EMD can be obtained from the data themselves, which is convenient, but EMD is prone to modal aliasing when processing data with large fluctuations, such as photovoltaic power; when modal aliasing occurs, the resulting intrinsic mode functions (IMFs) are meaningless. EMD also suffers from endpoint effects that degrade the decomposition results. In contrast, VMD requires no preset basis functions, overcomes modal aliasing, and offers strong noise resistance and good adaptability; it also allows the number of IMF components to be chosen manually according to prediction needs. In summary, this paper uses VMD to decompose renewable energy power data and extract patterns from them. As a signal decomposition method analogous to EMD but grounded in variational principles, VMD decomposes a signal into multiple IMFs, each constrained to a band around its central frequency. Mathematically, this involves solving a constrained variational problem whose objective is to minimize the sum of the bandwidths of all IMFs while ensuring that their summation equals the original signal. The solution employs a Lagrange multiplier to handle the constraint and the Alternating Direction Method of Multipliers for iterative optimization. Below is a detailed explanation of the relevant theory.
VMD is a non-recursive signal processing method well-suited to non-stationary data such as photovoltaic power. Through iteration, this method decomposes the power signal into several IMFs of limited bandwidth, each centered around its respective central frequency. The goal of VMD is to minimize the sum of the bandwidths of all IMFs, subject to the constraint that the sum of all IMFs equals the original signal. For each mode, the analytic signal is computed via the Hilbert transform, its spectrum is shifted to baseband by mixing with an exponential tuned to the estimated central frequency, and the bandwidth is then estimated through the Gaussian smoothness of the demodulated signal. The specific expression is as follows:
\[
\min_{\{u_k\},\{\omega_k\}}\left\{\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2\right\}
\quad \text{s.t.}\quad \sum_{k=1}^{K}u_k(t)=f(t)
\]
In the formula, \(\{u_k\}\) represents the set of IMFs obtained through decomposition; \(\{\omega_k\}\) represents the set of central frequencies of each IMF; \(\delta(t)\) represents the unit impulse function; \(f(t)\) is the original power sequence; and \(*\) denotes convolution.
By introducing a quadratic penalty factor \(\alpha\) and a Lagrange multiplier \(\lambda(t)\), the minimization problem is transformed into an unconstrained optimization problem. The augmented Lagrangian function is as follows:
\[
L\left(\{u_k\},\{\omega_k\},\lambda\right)=\alpha\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2
+\left\|f(t)-\sum_{k=1}^{K}u_k(t)\right\|_2^2
+\left\langle\lambda(t),\,f(t)-\sum_{k=1}^{K}u_k(t)\right\rangle
\]
Using the Alternating Direction Method of Multipliers (ADMM), we iteratively update the variables \(\hat{u}_k\), \(\omega_k\), and \(\hat{\lambda}\) in the frequency domain:
\[
\hat{u}_k^{n+1}(\omega)=\frac{\hat{f}(\omega)-\sum_{i\neq k}\hat{u}_i(\omega)+\hat{\lambda}(\omega)/2}{1+2\alpha\left(\omega-\omega_k\right)^2},\qquad
\omega_k^{n+1}=\frac{\int_0^{\infty}\omega\left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}{\int_0^{\infty}\left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega},\qquad
\hat{\lambda}^{n+1}(\omega)=\hat{\lambda}^{n}(\omega)+\tau\left(\hat{f}(\omega)-\sum_{k=1}^{K}\hat{u}_k^{n+1}(\omega)\right)
\]
until a preset discriminant accuracy \(\varepsilon\) is achieved, at which point the iteration stops:
\[
\sum_{k=1}^{K}\frac{\left\|\hat{u}_k^{n+1}-\hat{u}_k^{n}\right\|_2^2}{\left\|\hat{u}_k^{n}\right\|_2^2}<\varepsilon
\]
Ultimately, we obtain \(K\) IMFs after decomposing the renewable energy power. The expression is as follows:
\[
f(t)\approx\sum_{k=1}^{K}u_k(t)
\]
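The ADMM update steps of VMD can be sketched numerically. The following simplified implementation operates on the one-sided (analytic) spectrum and omits the boundary mirroring used in the full algorithm; the parameter values (`alpha`, `tau`, the initial center frequencies) are illustrative assumptions, not the settings used in this paper.

```python
import numpy as np

def vmd(signal, K=2, alpha=2000.0, tau=0.1, tol=1e-7, max_iter=500):
    """Simplified VMD: decompose `signal` into K band-limited modes (IMFs).

    Works on the one-sided spectrum; boundary mirroring is omitted for brevity.
    Returns the modes (K x N) and their center frequencies in cycles/sample.
    """
    N = len(signal)
    f_hat = np.fft.rfft(signal)             # one-sided spectrum of the input
    freqs = np.fft.rfftfreq(N)              # normalized frequencies in [0, 0.5]
    u_hat = np.zeros((K, len(f_hat)), dtype=complex)
    omega = np.linspace(0.05, 0.45, K)      # initial center frequencies
    lam = np.zeros(len(f_hat), dtype=complex)
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Wiener-filter update of mode k around its current center frequency
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            power = np.abs(u_hat[k]) ** 2
            omega[k] = float((freqs * power).sum() / (power.sum() + 1e-12))
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))   # dual ascent on the constraint
        diff = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if diff < tol:
            break
    modes = np.array([np.fft.irfft(u_hat[k], n=N) for k in range(K)])
    order = np.argsort(omega)               # report modes from low to high frequency
    return modes[order], omega[order]
```

On a two-tone test signal, the recovered center frequencies settle on the two tones and the sum of the modes approximately reconstructs the input, which is exactly the constraint enforced by the dual ascent step.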
5. Case Study
In order to further demonstrate the effectiveness and feasibility of the proposed method, predictions are conducted separately for wind power and photovoltaic power. The wind power generation system is equipped with a Chinese-made XYZ-120k turbine, which continuously outputs 120 kW at a rated wind speed of 12 m/s. Its 12.5 m diameter impeller sweeps an area of 122.7 square meters, and its low cut-in wind speed of 3 m/s combined with a high cut-out wind speed of 25 m/s effectively extends the power generation time. The core generating component is a permanent magnet direct-drive synchronous generator, whose gearbox-free design significantly reduces mechanical losses. Paired with a yaw system with a response sensitivity of ±3 degrees, it achieves high-precision wind direction tracking of ±0.5 degrees through a dual braking mechanism of a hydraulic brake and an electromagnetic lock. The photovoltaic power generation system is based on ABC-300 W high-efficiency modules using PERC monocrystalline silicon technology. Under standard test conditions (STC: irradiance 1000 W/m², temperature 25 °C, air mass AM1.5), the conversion efficiency reaches 21.5%. The module has a temperature coefficient of −0.38%/°C and can generate power under weak light down to 100 W/m², demonstrating excellent environmental adaptability. The module measures 1956 × 992 × 40 mm and is fitted with an anodized aluminum frame. The electrical energy is converted by a Chinese-made DEF-5 kW inverter, which supports a wide voltage input of 110–500 V with a maximum conversion efficiency of 97.5%. It is equipped with six independent MPPT trackers and an islanding detection mechanism, and offers dual-mode RS485 and WiFi communication. The system includes a 60 A intelligent controller that achieves ±1% accuracy in charge and discharge management.
It adopts a four-stage charge and discharge algorithm and integrates a 3.5-inch touch display. The bracket system is made of 6063-T5 aluminum alloy and supports stepless tilt adjustment from 5 to 60 degrees. It withstands a wind load of 2.5 kN/m², and the grounding resistance is strictly controlled below 4 Ω. The overall protection level has passed IP65 certification, forming a complete and efficient solution from photovoltaic modules to inverter control and bracket installation.
Even if the wind direction changes frequently, the system can still keep the blades perpendicular to the wind direction, reducing energy loss. In the dataset of this article, more attention is paid to wind speed than wind direction. The wind power data are sourced from Spain [22], while the photovoltaic power data are sourced from Guoneng Corporation [23]; the relevant data can be downloaded from the aforementioned links. Four commonly used methods are selected for comparison in the case study. Specifically, Methods A, B, C, and D represent the Recurrent Neural Network (RNN) [24], Long Short-Term Memory (LSTM) network [25], Support Vector Machine (SVM) [26], and Convolutional Neural Network (CNN) [27], respectively. The relevant prediction results are shown in Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8 below. Subgraph (a) in each of Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8 shows the actual data and the wind power predicted by the different methods, while subgraph (b) shows the difference between the predicted and actual data for each method; the closer a symbol is to 0, the smaller the prediction error. The programs in this article were all written in MATLAB 2019b on a computer with an Intel(R) Core(TM) i5-7300HQ CPU @ 2.50 GHz and 12 GB of memory. It should be noted that the computing environment was not strictly controlled; in the field of power forecasting, the focus is on prediction accuracy rather than prediction efficiency. This article uses MATLAB's deep learning libraries. In the CNN model, a sequenceInputLayer defines the time series input and adapts it to the temporal characteristics of new energy power; convolution kernels are built with convolution2dLayer, and a batchNormalizationLayer and reluLayer accelerate convergence. An averagePooling2dLayer performs downsampling to preserve the main features. Multiple convolutional structures are stacked, and a globalAveragePooling2dLayer replaces the fully connected layers to reduce the parameter count and improve generalization. A regressionLayer outputs the prediction results of the high-frequency components. In the LSTM model, a sequenceInputLayer likewise processes the low-frequency component inputs; an lstmLayer captures long-term dependencies, with OutputMode set to 'last' so that only the prediction at the final time step is output. A dropoutLayer prevents overfitting, and a fullyConnectedLayer followed by a regressionLayer outputs the low-frequency component predictions.
Observing the above figures, the hybrid model proposed in this paper demonstrates the highest prediction accuracy, outperforming traditional models such as RNN, LSTM, SVM, and CNN. Instead of relying solely on a single machine learning or deep learning algorithm, our model combines physical knowledge with data-driven methods. This integration enables a more comprehensive capture of the physical characteristics and dynamic behaviors of renewable energy generation (especially wind power), thereby enhancing prediction accuracy. The FCM clustering method is employed to handle missing data, which is generally more effective than simple interpolation or deletion of missing values in dealing with incomplete datasets, preserving more useful information. The VMD algorithm is adopted to decompose the renewable energy power into high-frequency and low-frequency components. This decomposition helps separate different features within the signal, allowing subsequent prediction models to focus on learning from their respective features, thus improving prediction accuracy. For high-frequency components, the CNN model is selected for prediction due to its advantages in processing high-frequency features in images and time series data; it is capable of capturing local variations and detailed information in signals. For low-frequency components, the LSTM model is chosen for its proficiency in handling long-term dependencies and global features of sequence data, which is particularly important for predicting low-frequency components. By feeding high-frequency and low-frequency components into different models and fusing the outputs at the final stage, this strategy fully leverages the strengths of both models, enhancing the overall prediction accuracy and robustness.
RNNs face challenges when processing long sequences due to parameter sharing and sequence recursion, which can lead to multiple multiplications of gradients during backpropagation, causing the gradient values to gradually decay to zero (gradient vanishing) or grow excessively large (gradient explosion). Gradient vanishing makes it difficult for the network to learn long-distance dependencies, while gradient explosion may result in unstable training processes and even failure to converge. The computational process of RNNs is based on time-step unfolding, with each time step requiring sequential calculation, leading to relatively low computational efficiency. Especially when dealing with long sequences, the computational complexity increases significantly, affecting training and inference speeds. The sequential computation of RNNs is inherently serial, meaning that the computation of each time step depends on the output result of the previous time step. This dependency limits the parallel computing capabilities of RNNs, making them more time-consuming when processing large-scale data.
Although LSTM alleviates the gradient vanishing problem of RNNs to some extent, it still faces challenges. LSTM introduces gating mechanisms to control the flow and forgetting of information, which increases model complexity and training difficulty. This may result in more time-consuming training, difficulty in tuning, and an increased risk of overfitting. The numerous parameters in LSTM models require careful adjustment to achieve optimal performance; their selection and optimization have a significant impact on prediction results, increasing the complexity of parameter tuning.
Training SVM models on large-scale datasets can consume considerable time and computational resources. The time complexity of the SVM algorithm grows with the number of training samples, so training time increases significantly when the dataset is very large. SVM involves important parameters such as the regularization parameter C and the kernel function parameters, whose selection has a significant impact on model performance. The reasonable selection of parameters and kernel functions is a critical issue for SVM, but it usually requires cross-validation over different parameter combinations, increasing the complexity of parameter tuning. The SVM algorithm is also sensitive to missing data, and even a small number of missing features can degrade model performance. In practical applications, many datasets contain missing values, requiring preprocessing to ensure model accuracy.
The performance of CNNs largely depends on the quantity and quality of training data. If training data are insufficient or noisy, the prediction capabilities of CNNs may be limited. CNNs excel at extracting local features of images but are relatively weaker in extracting global features of time series data. This may result in CNNs performing less well than models such as LSTM, which specialize in processing time series data in tasks such as wind power prediction. The generalization ability of CNNs is affected to some extent by their network structure and parameter settings. If the network structure is too complex or the parameters are improperly set, the model may perform poorly on unseen data.
Unlike wind power forecasting, photovoltaic power forecasting presents greater challenges. Photovoltaic power generation primarily relies on solar radiation, which is influenced by multiple factors such as cloud cover and weather changes. This results in significant intermittency and uncertainty in photovoltaic power generation. In contrast, although wind speed is also affected by various factors, it exhibits relatively better continuity and smoother variations. Photovoltaic power generation is not only influenced by solar radiation but also by a multitude of meteorological factors, including temperature, humidity, and atmospheric pressure. The complex interactions between these factors make photovoltaic power forecasting even more difficult. While wind power forecasting is also impacted by meteorological factors, the types and complexity of these factors are relatively lower. Due to the multiple and intricate factors affecting photovoltaic power generation and the difficulty in accurately describing their interactions, existing forecasting models have certain limitations in terms of adaptability. The significant variations in solar radiation and meteorological conditions across different regions limit the applicability of forecasting models in those areas. In comparison, wind power forecasting models, although affected by geographical differences, demonstrate stronger adaptability. To further illustrate the accuracy of the method proposed in this paper, four typical photovoltaic scenarios were selected for validation. The relevant results are shown in Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13 below.
Furthermore, to verify the robustness of the proposed method against noise interference, this paper introduces disturbed data into the original wind power and photovoltaic power data and calculates the average prediction error rate. The results are presented in Table 1 below.
From the table above, it is evident that as the proportion of disturbed data increases, the accuracy of each method decreases accordingly. However, the maximum error of the method proposed in this paper remains within 10%. The proposed method combines physical characteristics with machine learning models. By incorporating physical knowledge, the model can better understand the physical processes of renewable energy generation, thereby improving the accuracy and robustness of predictions. This method not only leverages the advantages of data-driven approaches but also integrates the physical mechanisms of renewable energy generation, enabling the model to more accurately capture core features when facing disturbed data. The use of the FCM method to handle missing data effectively fills in the gaps in the data, reducing the impact of incomplete data on prediction results. The VMD algorithm decomposes renewable energy power into high-frequency and low-frequency components, helping the model better capture feature information at different frequencies. Due to the combination of physical knowledge and advanced machine learning algorithms, the proposed method can better filter out noise and extract useful information from raw data containing disturbed data. This makes the model more stable and reliable in practical applications, capable of adapting to different data environments and conditions.
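This kind of robustness test can be emulated as follows. The sketch below is an illustrative reconstruction, not the exact perturbation protocol behind Table 1: a chosen proportion of points in a synthetic power series is perturbed with bounded relative noise, and the MAPE between the clean and disturbed series serves as the error-rate proxy; the noise scale and series shape are assumptions.

```python
import numpy as np

def perturb(series, proportion, scale=0.2, seed=0):
    """Randomly perturb `proportion` of the points by up to +/- `scale` relative noise."""
    rng = np.random.default_rng(seed)
    out = series.copy()
    idx = rng.choice(len(series), size=int(proportion * len(series)), replace=False)
    out[idx] *= 1 + rng.uniform(-scale, scale, size=len(idx))
    return out

# MAPE in percent; the series stays well above zero, so no guard is needed here
err = lambda actual, disturbed: float(np.mean(np.abs((actual - disturbed) / actual)) * 100)

power = 50 + 40 * np.sin(np.linspace(0, 8 * np.pi, 500)) ** 2   # synthetic PV-like series
errors = [err(power, perturb(power, p)) for p in (0.1, 0.2, 0.3)]
```

As expected, the error rate grows with the proportion of disturbed points, mirroring the trend reported in Table 1.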
To further visualize the error distributions, this study conducts a comparative analysis of prediction errors between the proposed method and the baseline approach without data preprocessing, as presented in Table 2 below.
As demonstrated in the table above, after implementing data preprocessing procedures, both long-term and short-term forecasting errors exhibit significant reductions. When data follow a near-normal distribution, mean imputation may yield relatively accurate results. However, in the presence of skewed distributions or outliers, the mean becomes susceptible to distortion from extreme values, potentially leading to substantial discrepancies between imputed and true values. Median imputation demonstrates greater robustness against skewed data and outliers, as it remains unaffected by extreme values. Conversely, when data exhibit high symmetry without outliers, the discrepancy between median and mean may become negligible, rendering median imputation no more advantageous in terms of accuracy. Furthermore, when missingness occurs completely at random, mean or median imputation can produce reasonable results. Yet, under non-random missingness mechanisms (e.g., missingness dependent on unobserved variables or data values themselves), both imputation methods may introduce systematic biases regardless of distribution characteristics. The exclusion of outliers beyond the [μ − 3σ, μ + 3σ] range helps mitigate the distortive effects of extreme values on central tendency measures. Standardization procedures are implemented to eliminate dimensional discrepancies across features, ensuring the comparability of variables with different scales. The error curve is convex during the training and testing process, indicating that the model is learning steadily during the training process and has not fallen into local optima or saddle points. The error can be reduced by 1–2 orders of magnitude, and there is no overfitting phenomenon.
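The outlier exclusion and standardization steps described above can be sketched in a few lines. This is a minimal numpy version; standardizing with the post-exclusion mean and standard deviation (rather than the original ones) is an implementation assumption.

```python
import numpy as np

def three_sigma_standardize(x):
    """Drop points outside [mu - 3*sigma, mu + 3*sigma], then z-score the remainder."""
    mu, sigma = x.mean(), x.std()
    kept = x[np.abs(x - mu) <= 3 * sigma]                  # 3-sigma outlier exclusion
    return (kept - kept.mean()) / (kept.std() + 1e-12)     # zero mean, unit variance
```

On a series containing one gross outlier, the outlier falls outside the 3σ band and is removed, and the remaining points are rescaled to zero mean and unit variance, making features with different scales comparable.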
The error used in Table 1 reflects the size of MAPE. Choosing appropriate evaluation parameters is crucial in the field of new energy forecasting, and different parameters have different emphases: MAE focuses on the average magnitude of the absolute error, RMSE emphasizes the degree of deviation, and MAPE expresses the relative error in percentage form. To facilitate readers' understanding, this article further compares the MAE and RMSE indicators under different prediction methods, as shown in Table 3.
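For reference, the three indicators can be computed as follows. These are the standard formulations; the eps guard against near-zero actual values in MAPE is an implementation assumption.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error: average magnitude of the errors."""
    return float(np.mean(np.abs(actual - predicted)))

def rmse(actual, predicted):
    """Root mean square error: penalizes large deviations more heavily."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mape(actual, predicted, eps=1e-8):
    """Mean absolute percentage error, in percent; eps guards near-zero actuals."""
    return float(np.mean(np.abs((actual - predicted) / (actual + eps))) * 100)
```

For example, with actual values [100, 200, 300] and predictions [110, 190, 330], MAE ≈ 16.67, RMSE ≈ 19.15, and MAPE ≈ 8.33%, illustrating how RMSE weights the single large error (30) more heavily than MAE does.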
This paper further selected five typical new energy output datasets provided in previous references, as shown below, to verify the feasibility of the proposed method.
Dataset 1: actual wind power data within 30 days from sites located in Gansu Province [28];
Dataset 2: actual data from the Klim Wind Farm [29];
Dataset 3: actual load data in a certain region of China [30];
Dataset 4: data from real-world solar farms [31];
Dataset 5: solar power measurements in California [32].
From Table 3 and Figure 14, it can be observed that the method proposed in this article performs best on both the MAE and RMSE indicators. On the MAE indicator, the proposed method reduces errors by up to 49.3%; on the RMSE indicator, it reduces errors by up to 42.7%. The smaller the MAE value, the smaller the model's prediction error and the stronger its predictive ability; the smaller the RMSE value, the higher the prediction accuracy. In the five typical scenarios, the proposed method achieved the best performance in four; the exception is the load power prediction scenario, where accuracy shows a decreasing trend because the chaotic spatiotemporal distribution of load power makes decomposition into high- and low-frequency signals less meaningful. In the other four scenarios, the proposed method improves accuracy by up to 6.4%, with an average improvement of 3.5%. RMSE is easy to understand and compute, is sensitive to outliers, and effectively reflects the precision of measurements. This article combines the advantages of physics knowledge and data-driven models: by combining physical characteristics with the learning ability of data-driven models, the changing patterns of new energy power can be captured more accurately. The high-frequency components are input into a CNN model, which excels at processing data with local correlations and can effectively capture short-term fluctuation features in the high-frequency components. The low-frequency components are input into an LSTM neural network, which is suited to time series data and can capture long-term dependencies to accurately predict long-term trends in the low-frequency components. This hybrid model, which combines CNN and LSTM, can fully utilize both advantages and improve prediction accuracy.