Photovoltaic Power Prediction Technology Based on Multi-Source Feature Fusion

Zhou, Xia; Zhang, Xize; Dai, Jianfeng; Zhang, Tengfei

doi:10.3390/sym17030414


by Xia Zhou ¹, Xize Zhang ²,*, Jianfeng Dai ²,† and Tengfei Zhang ²,†

¹ Carbon Neutralization Advanced Technology Research Institute, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

² School of Automation and Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Symmetry 2025, 17(3), 414; https://doi.org/10.3390/sym17030414

Submission received: 27 January 2025 / Revised: 28 February 2025 / Accepted: 6 March 2025 / Published: 10 March 2025

(This article belongs to the Special Issue Symmetry and Asymmetry in Data Analysis)


Abstract

With photovoltaic installed capacity increasing year by year, accurate photovoltaic power prediction is of great significance for grid-connected operation and scheduling. To improve prediction accuracy, this paper proposes a combined photovoltaic power prediction model based on the Pearson Correlation Coefficient (PCC), Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), K-means clustering, Variational Mode Decomposition (VMD), a Convolutional Neural Network (CNN), and a Bidirectional Long Short-Term Memory network (BiLSTM). The model exploits the symmetric structure of BiLSTM: one part processes the data sequence in forward order and the other in reverse order, so that the characteristics of the sequence are captured by processing this 'symmetric' information simultaneously. First, the historical photovoltaic data are preprocessed, PCC is used to analyze the correlation of meteorological factors, and the highly correlated factors are extracted to form the multivariate time series feature matrix of meteorological factors. Then, CEEMDAN decomposes the historical photovoltaic power data into multiple intrinsic mode functions and a residual component in a single pass. These components are clustered by K-means according to their sample entropy, and the resulting high-frequency component is further decomposed and refined by VMD to form a multi-scale characteristic mode matrix. Finally, the obtained features are input into the CNN–BiLSTM model to produce the final photovoltaic power prediction. Experiments show that, compared with traditional single-decomposition methods (such as CEEMDAN–BiLSTM and VMD–BiLSTM), the proposed combined method reduces MAE by more than 0.016 and RMSE by more than 0.017, demonstrating excellent accuracy and stability.

Keywords:

photovoltaic; bimodal decomposition; BiLSTM; power prediction; CNN

1. Introduction

In recent years, in response to the 'dual carbon' targets, the installed capacity of photovoltaics has been increasing year by year. However, photovoltaic output is strongly affected by meteorological factors and is highly unstable, which poses great challenges to large-scale photovoltaic grid connection. Therefore, to ensure the safe and stable operation of the power system, it is particularly important to improve the stability and accuracy of photovoltaic power prediction [1]. By prediction horizon, photovoltaic power prediction is classified into ultra-short-term, short-term, and medium- and long-term prediction; by model type, there are physical, statistical, artificial intelligence, and hybrid models [2,3,4,5,6,7]. With the wide application and rapid development of deep learning algorithms, using historical meteorological and photovoltaic data has become the main approach to photovoltaic power prediction. Researchers in China and abroad have conducted extensive research on this topic and have proposed various data-driven prediction methods, including the Long Short-Term Memory network (LSTM), the Artificial Neural Network (ANN), and the least squares support vector machine (LSSVM) [8,9,10].

Historical photovoltaic power data are highly random, weakly regular, and exhibit complex fluctuation patterns, which makes prediction difficult and limits its accuracy. To enhance the regularity of historical data and eliminate redundant variables, data processing based on meteorological variable screening has been proposed. In Reference [11], CEEMDAN was used to decompose the original photovoltaic power data to reduce volatility, and two combined models, a Temporal Convolutional Network–Bidirectional Gated Recurrent Unit (TCN–BiGRU) and a TCN–BiLSTM, were constructed and trained separately; ElasticNet, with L1 and L2 regularization terms, was then introduced for prediction. However, historical meteorological factors were not considered, so prediction reliability was insufficient under some significant weather conditions. In Reference [12], a linear regression analysis model was established using LSTM and random forest models while considering meteorological factors such as temperature, humidity, cumulative radiation, and precipitation, which improved the universality and accuracy of prediction. Reference [13] combined VMD, the sparrow search algorithm (SSA), and LSTM: VMD decomposes the historical photovoltaic power time series, and an SSA-optimized LSTM predicts each subsequence before the predictions are superimposed, which improves prediction accuracy. However, this approach makes no use of historical meteorological data, which limits its practical application. Reference [14] established a CEEMDAN–CNN–LSTM prediction model that considered meteorological factors such as horizontal radiation, temperature, and wind speed. Its prediction performance was good, but the modal decomposition stage and the underlying algorithm could be further optimized.
For signal sequence processing in photovoltaic prediction, CEEMDAN decomposes the power time series to reduce its nonlinearity and nonstationarity, improving prediction accuracy and stability; combined with models such as LSTM, it can effectively handle the intermittency and volatility of photovoltaic power [15,16,17,18,19]. References [20,21,22,23,24,25] used VMD to decompose time series and combined it with BiLSTM and DELM algorithms for photovoltaic prediction. However, under complex weather conditions or with improper parameter selection, the prediction error is greater than under normal weather. Reference [26] proposed a deep learning method based on a one-dimensional convolutional neural network for fault detection and classification in grid-connected photovoltaic systems; its findings on data monitoring, feature extraction, model robustness, and real-time response offer useful references for photovoltaic output prediction. In Reference [27], a short-term prediction model of distributed photovoltaic power was optimized by considering the sensitivity of meteorological data, which significantly improved prediction accuracy. Despite some limitations, that study provides new ideas and methods for the power prediction of distributed photovoltaic systems and has important practical significance for power grid dispatching and energy management.

Considering the advantages and disadvantages of the above studies, this paper proposes a combined photovoltaic power prediction model that integrates the Pearson correlation coefficient (PCC), CEEMDAN, K-means clustering, VMD, CNN, and BiLSTM. PCC extracts highly correlated meteorological features, CEEMDAN and VMD together perform a double decomposition, and the CNN–BiLSTM model extracts features from the input variables and predicts photovoltaic power. Finally, a series of experiments and comparisons is carried out to verify the effectiveness of the model in photovoltaic power prediction.

2. The Overall Framework of Prediction Process

The prediction process of the proposed CEEMDAN–VMD–CNN–BiLSTM method is shown in Figure 1; it consists of three main parts.

(1): The original data are preprocessed, including abnormal data detection, missing value filling, and normalization. The Pearson correlation coefficient method is used to analyze the correlation between each meteorological factor and photovoltaic power output, and the important meteorological factors are selected as key input features of the prediction model.
(2): CEEMDAN decomposes the historical power data into multiple intrinsic mode functions and a residual component. The sample entropy of each component is calculated, and the K-means algorithm clusters the components of different frequencies based on the results. Finally, VMD performs a secondary decomposition of the high-frequency components, which have greater influence, and the multi-scale characteristic mode matrix is constructed from the results.
(3): The multivariate time series feature matrix of key meteorological factors and the multi-scale characteristic mode matrix are input into the CNN to extract deep features, and the BiLSTM outputs the predicted photovoltaic power. The results are compared with those of the BiLSTM, VMD–BiLSTM, and CEEMDAN–BiLSTM models.

Figure 1. Forecasting process.

3. Data Sources and Data Preprocessing

The dataset in this paper consists of measured historical photovoltaic data from a small photovoltaic power station located on wasteland in Northeast China. Photovoltaic output and meteorological data recorded at the station from 5:00 to 17:00 each day between 1 October 2023 and 31 March 2024 were selected as samples, with a sampling interval of 15 min. The overall historical power data are shown in Figure 2.

3.1. Abnormal Data Detection

Because the dataset may contain outliers in historical photovoltaic power generation, this paper uses the local outlier factor (LOF) method for outlier detection. LOF is a density-based unsupervised anomaly detection algorithm that decides whether a data point is abnormal by comparing its local density with that of its neighbors: if a point lies in a low-density region while its neighbors lie in high-density regions, it is judged abnormal; otherwise, it is normal. The density of a point relative to its surroundings therefore represents its degree of outlierness. The larger the LOF value, the greater the density difference between the point and its neighbors, and the more likely the point is an outlier. A point in a dense region has an LOF value close to 1 and is not an outlier, whereas an LOF value clearly greater than 1 indicates that the point is farther from its neighbors than expected, so it is treated as an outlier.
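The LOF computation described above can be sketched in pure Python. This is a minimal illustration, not the implementation used in this paper: the neighborhood size `k`, the decision threshold of 1.5, and the function name `lof_scores` are assumptions, ties among equidistant neighbors are broken by index order, and the data are assumed to contain no exact duplicates.

```python
import math

def lof_scores(points, k=3):
    """Local outlier factor of each point (points are tuples of floats)."""
    n = len(points)
    # k nearest neighbours of every point (excluding itself) as (distance, index)
    neigh = []
    for i, p in enumerate(points):
        d = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neigh.append(d[:k])
    k_dist = [nb[-1][0] for nb in neigh]           # distance to the k-th neighbour

    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(1.0 / max(sum(reach) / k, 1e-12))

    # LOF: average ratio of the neighbours' density to the point's own density
    return [sum(lrd[j] for _, j in neigh[i]) / (k * lrd[i]) for i in range(n)]

values = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 5.0]
scores = lof_scores([(v,) for v in values], k=3)
outliers = [v for v, s in zip(values, scores) if s > 1.5]  # 1.5 is an assumed threshold
```

Points inside the evenly spaced cluster score close to 1, while the isolated value 5.0 scores far above 1, which matches the interpretation given in the text.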

3.2. Missing Value Imputation

Missing values are mainly filled with the median or the mean. The mean is sensitive to outliers, because very large or very small values strongly affect it; therefore, for variables whose median and mean differ little, the mean is generally used. The median is more stable and is little affected by outliers; therefore, for characteristic variables whose median lies far from the mean, the median is generally used.
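The mean-or-median rule above can be made concrete with a small helper. The way "little difference" is measured here (the gap between mean and median relative to the standard deviation, with the hypothetical threshold `rel_gap`) is an assumption for illustration; missing entries are represented by `None`.

```python
import statistics

def impute(column, rel_gap=0.1):
    """Fill None entries with the mean or the median of the observed values.

    The mean is used when it is close to the median (roughly symmetric data);
    otherwise the outlier-robust median is used. rel_gap is a hypothetical
    threshold on |mean - median| / std.
    """
    observed = [v for v in column if v is not None]
    mean = statistics.mean(observed)
    median = statistics.median(observed)
    spread = statistics.pstdev(observed)
    use_mean = spread == 0 or abs(mean - median) / spread < rel_gap
    fill = mean if use_mean else median
    filled = [fill if v is None else v for v in column]
    return filled, ("mean" if use_mean else "median")

symmetric, how1 = impute([1.0, 2.0, None, 3.0])      # mean == median, so the mean (2.0) is used
skewed, how2 = impute([1.0, 1.0, 1.0, None, 100.0])  # the outlier pulls the mean away, so the median (1.0) is used
```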

3.3. Normalized Processing

Because the data dimensions (units and magnitudes) differ considerably, which would affect the analysis of how meteorological factors influence photovoltaic output, the prediction accuracy, and the training efficiency of the model, the data are normalized. As shown in Figure 3, min–max normalization uses the maximum and minimum values of the data to map them to the [0, 1] interval. The calculation formula is as follows:

X_{2} = \frac{X_{1} - X_{min}}{X_{max} - X_{min}}

(1)

where $X_{1}$ and $X_{2}$ are the values before and after normalization, and $X_{max}$ and $X_{min}$ are the maximum and minimum values in the data.
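Equation (1) and its inverse, which is needed to map normalized predictions back to power units, can be written as below. Returning the `(min, max)` pair so that the same scaling can be inverted later is an illustrative design choice, and the function names are hypothetical.

```python
def minmax_fit_transform(values):
    """Map values to [0, 1] with Eq. (1); return the scaled list and the (min, max) used."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:                          # constant column: avoid division by zero
        return [0.0 for _ in values], (x_min, x_max)
    return [(x - x_min) / span for x in values], (x_min, x_max)

def minmax_inverse(scaled, bounds):
    """Invert Eq. (1): X1 = X2 * (Xmax - Xmin) + Xmin."""
    x_min, x_max = bounds
    return [s * (x_max - x_min) + x_min for s in scaled]

power = [2.0, 4.0, 6.0]
scaled, bounds = minmax_fit_transform(power)   # [0.0, 0.5, 1.0]
restored = minmax_inverse(scaled, bounds)      # [2.0, 4.0, 6.0]
```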

Figure 3. Normalization of key meteorological factors and output power.

Combining the correlation analysis of meteorological conditions in Section 4.1 with Figure 3, it is clear that under stable climatic conditions the trends of photovoltaic output power and irradiance are essentially the same, differing only in units, while humidity is closely related to precipitation and to a large extent directly affects the level of irradiance, which in turn affects photovoltaic output.

4. Numerical Weather Prediction (NWP) Quality Analysis

4.1. Analysis of Relationship

Under normal circumstances, photovoltaic power will be affected by various meteorological factors, such as irradiance, temperature, pressure, humidity, wind speed, and precipitation. Therefore, it is necessary to first analyze the correlation of various influencing factors of photovoltaic output to screen out strong correlation factors. In this paper, the Pearson correlation coefficient method is used to quantitatively analyze the influence of various meteorological factors on photovoltaic output.

The Pearson product-moment correlation coefficient $ρ$ is a linear correlation coefficient that reflects the degree of linear correlation between two variables and takes values between −1 and 1. If $ρ$ is greater than 0, the two variables are positively correlated; if $ρ$ is less than 0, they are negatively correlated. The closer the absolute value of $ρ$ is to 1, the stronger the correlation between the two variables. In this paper, the Pearson correlation coefficient is used to measure the correlation between each characteristic variable and the actual photovoltaic power. It is calculated as follows:

ρ = \frac{\sum_{i = 1}^{n} (X_{i} - \bar{X}) (Y_{i} - \bar{Y})}{\sqrt{\sum_{i = 1}^{n} {(X_{i} - \bar{X})}^{2}} \sqrt{\sum_{i = 1}^{n} {(Y_{i} - \bar{Y})}^{2}}}

(2)

where $X_{i}$ and $Y_{i}$ are the $i$th values of the two variables, $\bar{X}$ and $\bar{Y}$ are their sample means, and $n$ is the number of samples.

Table 1 and Figure 4 show that air pressure and irradiance are positively correlated with the actual power, with correlation coefficients of 0.24 and 0.97, respectively, indicating that irradiance has a large impact on the actual power and air pressure a smaller one. Temperature, humidity, precipitation, and surface wind speed are negatively correlated with the actual power. The absolute values of the correlation coefficients of humidity and precipitation both exceed 0.4, so both strongly influence the actual photovoltaic power. The correlation coefficients of air temperature and surface wind speed are −0.17 and −0.2, respectively; these factors should also be considered in photovoltaic power prediction.
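A pure-Python sketch of Eq. (2), together with a screening step that keeps the factors whose |ρ| reaches a threshold. The 0.4 cut-off mirrors the discussion above (irradiance, humidity, and precipitation pass it, while pressure, temperature, and wind speed do not), but using it as a fixed rule, the function names, and the toy series are assumptions for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient, Eq. (2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:      # undefined for a constant series; treated as 0 here
        return 0.0
    return cov / (sx * sy)

def select_factors(factors, power, threshold=0.4):
    """Keep meteorological factors with |rho| >= threshold (hypothetical rule)."""
    rho = {name: pearson(series, power) for name, series in factors.items()}
    return [name for name, r in rho.items() if abs(r) >= threshold], rho

power = [1.0, 2.0, 3.0, 4.0, 5.0]
factors = {
    "irradiance": [2.0, 4.0, 6.0, 8.0, 10.0],   # rises with power
    "humidity": [5.0, 4.0, 3.0, 2.0, 1.0],      # falls with power
    "noise": [1.0, -1.0, 1.0, -1.0, 1.0],       # uncorrelated with power
}
selected, rho = select_factors(factors, power)
```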

4.2. NWP Prediction Effect Analysis

Historical photovoltaic power data are highly random, weakly regular, and show complex fluctuation patterns, which makes prediction difficult and its performance unstable. Therefore, relying only on historical photovoltaic power data leads to low prediction accuracy.

Current numerical weather forecasts are relatively reliable, and the previous section showed that irradiance plays the leading role in photovoltaic output while the influence of precipitation, humidity, and other factors cannot be ignored. This paper therefore exploits the coupling relationship between meteorological factors and the photovoltaic output time series: the regularity of the original data is enhanced, redundant factors with low correlation are eliminated, and short-term photovoltaic output is predicted from the meteorological factors strongly correlated with it. In this way, the feature information can be fully extracted, effectively improving prediction accuracy. Finally, feature selection in model construction is optimized according to factors such as irradiance, temperature, and precipitation.

5. Theoretical Basis and Prediction Model Establishment

5.1. CEEMDAN–VMD Bimodal Decomposition

The non-stationarity of photovoltaic power data is a key factor affecting the prediction accuracy. The modal decomposition algorithm can decompose the original multivariate photovoltaic power data into a series of modal functions and residual components with different time scales. It can not only effectively separate the influence of different frequency components but also transform the complex time series into a series of relatively stable sub-sequences, which is convenient for subsequent photovoltaic prediction.

However, because they are based on different mathematical mechanisms, different modal decomposition algorithms interpret and process the original sequence differently and exhibit different characteristics when dealing with complex sequences. The resulting multi-scale subsequences have different regularities and are, to some extent, complementary.

Therefore, this paper combines two well-established modal decomposition algorithms, CEEMDAN and VMD, to process the original signal.

5.1.1. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise

CEEMDAN overcomes the shortcomings of EMD (Empirical Mode Decomposition) by adding adaptive white noise and computing signal residuals stage by stage to obtain multiple intrinsic mode functions (IMFs), so that the reconstructed signal is almost identical to the original one. CEEMDAN not only alleviates the mode mixing of EMD but also reduces the reconstruction error by averaging over repeated noise-assisted decompositions.

Define $E_{j}(\cdot)$ as the operator that returns the $j$th mode component obtained by EMD, and let $I_{k}$ denote the $k$th IMF obtained by decomposing the original sequence $P(t)$ with CEEMDAN. The algorithm steps are as follows:

Step 1: In the first stage ($k = 1$), Gaussian white noise is added to the original sequence to form $P(t) + ε_{0} ω_{i}(t)$, and EMD is applied to each of these $M$ realizations, where $ω_{i}(t)$ ($i = 1, 2, …, M$) is Gaussian white noise with a standard normal distribution and $ε_{0}$ is the noise amplitude constant. The first IMF of the $i$th realization is denoted $I_{i,1}$, and the first CEEMDAN component is the mean of all $I_{i,1}$:

I_{1} = \frac{1}{M} \sum_{i = 1}^{M} I_{i, 1}

(3)

Step 2: Calculate the first residual sequence $r_{1}(t)$:

r_{1} (t) = P (t) - I_{1}

(4)

Step 3: Each of the $M$ sequences $r_{1}(t) + ε_{1} E_{1}(ω_{i}(t))$ is decomposed by EMD until its first IMF is obtained, where $ε_{1}$ is the adaptive coefficient of the Gaussian white noise added after the first stage and $E_{1}(\cdot)$ returns the first mode obtained by EMD. The second CEEMDAN component $I_{2}$ is then calculated as:

I_{2} = \frac{1}{M} \sum_{i = 1}^{M} E_{1} (r_{1} (t) + ε_{1} E_{1} (ω_{i} (t)))

(5)

Step 4: For each remaining stage $k = 2, 3, …$, repeat Step 3 and calculate the $(k + 1)$th modal component as follows:

r_{k} (t) = r_{k - 1} (t) - I_{k}

(6)

I_{k + 1} = \frac{1}{M} \sum_{i = 1}^{M} E_{1} (r_{k} (t) + ε_{k} E_{k} (ω_{i} (t)))

(7)

where $r_{k}(t)$ is the $k$th residual sequence, $ε_{k}$ is the adaptive coefficient of the Gaussian white noise added after the $k$th stage, and $E_{k}(\cdot)$ returns the $k$th mode obtained by EMD.

Step 5: Repeat Step 4 until no further IMF can be extracted from the residual signal, i.e., until the residual has no more than two extreme points.

The final residual signal is as follows:

r (t) = P (t) - \sum_{k = 1}^{K} I_{k}

(8)

where $K$ is the total number of modal components. The original sequence $P(t)$ is therefore finally decomposed as follows:

P (t) = \sum_{k = 1}^{K} I_{k} + r (t)

(9)

The CEEMDAN method can solve the traditional EMD mode mixing problem by adding white noise with standard normal distribution and is more adaptive in the decomposition of the original signal.
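Steps 1 to 5 can be traced in code. The sketch below is a simplified pure-Python illustration and not the implementation used in this paper: its EMD operator uses piecewise-linear envelopes and a fixed number of sifting iterations instead of cubic-spline envelopes with a stopping criterion, the noise amplitude at each stage is taken as $ε_{0}$ times the standard deviation of the current residual (one possible convention, assumed here), and the ensemble is kept small for speed (Section 6 reports NR = 500 and Nstd = 0.2). All function names are hypothetical.

```python
import bisect
import math
import random
import statistics

def extrema(x):
    """Indices of interior local maxima and minima of a sequence."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima

def envelope(idx, x):
    """Piecewise-linear envelope through x at indices idx (held flat at the ends)."""
    env = []
    for t in range(len(x)):
        if t <= idx[0]:
            env.append(x[idx[0]])
        elif t >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            j = bisect.bisect_right(idx, t)          # idx[j-1] <= t < idx[j]
            a, b = idx[j - 1], idx[j]
            w = (t - a) / (b - a)
            env.append((1 - w) * x[a] + w * x[b])
    return env

def emd_first_mode(x, n_sift=8):
    """E_1(.): first EMD mode from a fixed number of sifting iterations."""
    h = list(x)
    for _ in range(n_sift):
        mx, mn = extrema(h)
        if len(mx) < 2 or len(mn) < 2:
            break
        upper, lower = envelope(mx, h), envelope(mn, h)
        h = [v - (u + l) / 2 for v, u, l in zip(h, upper, lower)]
    return h

def emd_mode(x, k):
    """E_k(.): k-th EMD mode, obtained by peeling off the first k-1 modes."""
    r = list(x)
    for _ in range(k - 1):
        r = [a - b for a, b in zip(r, emd_first_mode(r))]
    return emd_first_mode(r)

def ceemdan(p, eps0=0.2, n_real=10, max_modes=6, seed=0):
    """Return (imfs, residual) such that p = sum(imfs) + residual, Eq. (9)."""
    rng = random.Random(seed)
    n = len(p)
    noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_real)]
    imfs, r = [], list(p)
    for k in range(max_modes):
        if k > 0 and sum(len(e) for e in extrema(r)) <= 2:
            break                                    # Step 5: no more IMFs to extract
        eps = eps0 * statistics.pstdev(r)            # assumed amplitude convention
        acc = [0.0] * n
        for w in noise:
            # stage 1 adds raw noise (Eq. 3); stage k+1 adds E_k(noise) (Eqs. 5 and 7)
            w_k = w if k == 0 else emd_mode(w, k)
            mode = emd_first_mode([a + eps * b for a, b in zip(r, w_k)])
            acc = [s + m for s, m in zip(acc, mode)]
        imf = [s / n_real for s in acc]              # ensemble mean over realizations
        imfs.append(imf)
        r = [a - b for a, b in zip(r, imf)]          # Eqs. (4) and (6)
    return imfs, r

signal = [math.sin(2 * math.pi * t / 32) + 0.5 * math.sin(2 * math.pi * t / 6) for t in range(128)]
imfs, residual = ceemdan(signal, n_real=10, max_modes=4)
recon = [sum(col) + res for col, res in zip(zip(*imfs), residual)]
```

Because each IMF is subtracted from the running residual, Eq. (9) holds up to floating-point rounding whatever EMD variant is plugged in; the quality of the individual modes, by contrast, depends on the envelope and sifting choices.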

5.1.2. Variational Mode Decomposition

The core idea of VMD is to decompose the signal into a series of intrinsic mode functions with limited bandwidth by means of variational optimization. It can automatically adjust the bandwidth and center frequency of each mode according to the characteristics of the signal, and VMD can effectively separate different frequency components in the signal and avoid modal aliasing.

The main steps are as follows:

(1): First, the Hilbert transform is used to compute the analytic signal associated with each modal function $u_{k}(t)$ and obtain its one-sided spectrum.
(2): Then, the spectrum of each $u_{k}(t)$ is shifted to baseband by mixing with an exponential tuned to its estimated center frequency.
(3): The bandwidth of each $u_{k}(t)$ is estimated through the Gaussian smoothness (the squared $L^{2}$-norm of the gradient) of the demodulated signal, which yields the corresponding constrained variational problem.
(4): Using a quadratic penalty term and a Lagrange multiplier, the constrained problem is transformed into an unconstrained one:

$L(\{u_{k}\}, \{ω_{k}\}, λ) = α \sum_{k} {\left\| \partial_{t} \left[ \left( δ(t) + \frac{j}{π t} \right) * u_{k}(t) \right] e^{-j ω_{k} t} \right\|}_{2}^{2} + {\left\| f(t) - \sum_{k} u_{k}(t) \right\|}_{2}^{2} + \left\langle λ(t), f(t) - \sum_{k} u_{k}(t) \right\rangle$

(10)

where
$f(t)$: the signal to be decomposed
$u_{k}$, $ω_{k}$: the set of all modal functions and their center frequencies
$λ$: the Lagrange multiplier
$α$: the quadratic penalty factor
$δ(t)$: the Dirac delta function
$j$: the imaginary unit
$*$: convolution
(5): The final modal functions and center frequencies are obtained by iteratively updating $u_{k}$, $ω_{k}$, and $λ$ with the alternating direction method of multipliers.
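The iterative updates in step (5) can be sketched in pure Python. This is a simplified illustration under several assumptions: it works on the one-sided DFT spectrum with normalized frequency (cycles per sample), uses a direct O(N²) DFT instead of an FFT, omits the mirror extension of the signal used in common implementations, initializes the center frequencies uniformly, and applies the mode update $\hat{u}_{k} \leftarrow (\hat{f} - \sum_{i \neq k} \hat{u}_{i} + \hat{λ}/2) / (1 + 2α(ω - ω_{k})^{2})$ that follows from Eq. (10). The defaults K = 3 and α = 2500 follow the settings reported in Section 6; τ = 0 (no dual ascent) and the function names are our own choices.

```python
import cmath
import math

def one_sided_inverse(spec, N):
    """Real time-domain signal from a one-sided spectrum (Hermitian symmetry assumed)."""
    out = []
    for t in range(N):
        s = spec[0].real
        for j in range(1, len(spec)):
            term = (spec[j] * cmath.exp(2j * math.pi * j * t / N)).real
            s += term if 2 * j == N else 2 * term     # the Nyquist bin appears once
        out.append(s / N)
    return out

def vmd(f, K=3, alpha=2500.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD; returns (modes, centre frequencies in cycles/sample)."""
    N = len(f)
    half = N // 2 + 1
    freqs = [j / N for j in range(half)]
    # one-sided spectrum of the signal (negative frequencies dropped)
    f_hat = [sum(f[t] * cmath.exp(-2j * math.pi * j * t / N) for t in range(N))
             for j in range(half)]
    u_hat = [[0j] * half for _ in range(K)]
    omega = [0.5 * k / K for k in range(K)]            # uniform initialisation
    lam = [0j] * half
    for _ in range(max_iter):
        change = 0.0
        for k in range(K):
            old = u_hat[k]
            others = [sum(u_hat[i][j] for i in range(K) if i != k) for j in range(half)]
            # Wiener-filter-like update centred on the current omega_k
            new = [(f_hat[j] - others[j] + lam[j] / 2)
                   / (1 + 2 * alpha * (freqs[j] - omega[k]) ** 2) for j in range(half)]
            power = [abs(v) ** 2 for v in new]
            total = sum(power)
            if total > 0:                              # centre frequency = spectral centroid
                omega[k] = sum(w * p for w, p in zip(freqs, power)) / total
            norm_old = sum(abs(v) ** 2 for v in old)
            change += sum(abs(a - b) ** 2 for a, b in zip(new, old)) / max(norm_old, 1e-30)
            u_hat[k] = new
        # dual ascent on the reconstruction constraint (inactive when tau = 0)
        lam = [lv + tau * (f_hat[j] - sum(u_hat[k][j] for k in range(K)))
               for j, lv in enumerate(lam)]
        if change < tol:
            break
    return [one_sided_inverse(u, N) for u in u_hat], omega

signal = [math.cos(2 * math.pi * 0.05 * t) + 0.5 * math.cos(2 * math.pi * 0.25 * t)
          for t in range(200)]
modes, omega = vmd(signal, K=2, alpha=2000.0)
```

On this two-tone test signal the two center frequencies settle near 0.05 and 0.25 cycles per sample, i.e., each mode locks onto one tone; larger α gives narrower modes.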

5.2. Convolutional Neural Network

CNN is one of the earliest deep learning algorithms proposed. It has special convolution and pooling processing methods and is often used in image, text and signal input processing. Because of its effective feature extraction ability, it has become one of the most widely used deep learning algorithms. Figure 5 shows the basic structure of a CNN, including the input layer, convolution layer, pooling layer, fully connected layer and output layer.

The convolutional layer can be regarded as a stack of multiple convolution kernels that extract features from the input data. Pooling layers are divided into max pooling and average pooling layers, which output the maximum or the average value of each input region, respectively. Pooling downsamples the feature maps produced by the convolutional layer, which reduces the number of parameters and the amount of computation and helps prevent overfitting. Each fully connected layer consists of many neurons that connect the pooled features; these neurons gradually learn the nonlinear relationships in the data and generate the final power prediction.
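The convolution and pooling operations described above can be illustrated on a multi-channel input such as the feature matrix used in this paper (one channel per feature). The kernel weights below are fixed, hypothetical values chosen so the output can be checked by hand; in the actual model the kernels are learned. As in deep learning libraries, the "convolution" is computed as a valid cross-correlation.

```python
def conv1d(x, kernels, biases):
    """Valid 1-D convolution (cross-correlation, as in deep learning libraries).

    x: list of C input channels, each a list of T values.
    kernels: list of F filters; each filter is a list of C weight lists of length L.
    Returns F feature maps of length T - L + 1.
    """
    T = len(x[0])
    maps = []
    for w, b in zip(kernels, biases):
        L = len(w[0])
        maps.append([b + sum(w[c][k] * x[c][t + k] for c in range(len(x)) for k in range(L))
                     for t in range(T - L + 1)])
    return maps

def relu(v):
    return [max(0.0, a) for a in v]

def max_pool1d(v, size=2):
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, size)]

def avg_pool1d(v, size=2):
    """Non-overlapping average pooling."""
    return [sum(v[i:i + size]) / size for i in range(0, len(v) - size + 1, size)]

x = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]          # one channel, six time steps
kernels = [[[-1.0, 1.0]], [[0.5, 0.5]]]       # hypothetical difference and moving-average filters
maps = conv1d(x, kernels, biases=[0.0, 0.0])  # two feature maps of length 5
pooled = [max_pool1d(relu(m)) for m in maps]  # each halved to length 2
```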

5.3. BiLSTM Neural Network

The BiLSTM neural network model is shown in Figure 6. It contains two independent LSTM hidden layers, a forward layer and a backward layer, which process the sequence in chronological and reverse order, respectively. The BiLSTM can therefore be regarded as a symmetric structure: its two mirrored LSTM networks process the forward and reverse information of the sequence simultaneously to capture the characteristics of the sequence data. This 'symmetric' processing combines forward and backward information at the same time, which improves prediction accuracy.

In this way, the forward and backward information of photovoltaic data can be effectively captured and abstracted. Here, $\vec{h}_{t}$ and $\overleftarrow{h}_{t}$ are the hidden vectors of the forward and backward LSTM layers at time $t$, respectively; they are independent of each other and depend only on their own LSTM layers. The output $h_{t}$ is obtained by a weighted combination of these two hidden states, which are computed as shown in Equations (11) and (12).

{\vec{h}}_{t} = LSTM (x_{t}, {\vec{h}}_{t - 1})

(11)

{\overset{\leftarrow}{h}}_{t} = LSTM (x_{t}, {\overset{\leftarrow}{h}}_{t + 1})

(12)
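Equations (11) and (12) can be illustrated with a minimal pure-Python LSTM. The cell follows the standard input/forget/output-gate formulation; the random weights stand in for trained parameters, and combining the two directions by concatenation (rather than another weighting scheme) is an assumption made for illustration. Running one cell on a sequence and on its reverse also demonstrates the symmetry discussed above: the backward states of one run equal the forward states of the other, read in reverse order.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class LSTMCell:
    """Standard LSTM cell; random weights stand in for trained parameters."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        width = n_in + n_hidden

        def matrix():
            return [[rng.uniform(-0.5, 0.5) for _ in range(width)] for _ in range(n_hidden)]

        # input (i), forget (f), output (o) gates and candidate cell state (g)
        self.W = {gate: matrix() for gate in "ifog"}
        self.b = {gate: [0.0] * n_hidden for gate in "ifog"}

    def step(self, x, h, c):
        z = list(x) + list(h)                          # [x_t, h_{t-1}]
        pre = {gate: [sum(w * v for w, v in zip(row, z)) + bias
                      for row, bias in zip(self.W[gate], self.b[gate])]
               for gate in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

def run_lstm(cell, xs):
    """Hidden state after each step of one LSTM pass over xs."""
    h, c = [0.0] * cell.n_hidden, [0.0] * cell.n_hidden
    states = []
    for x in xs:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states

def bilstm(forward_cell, backward_cell, xs):
    """h_t = [forward h_t ; backward h_t] (concatenation is an assumption)."""
    h_fwd = run_lstm(forward_cell, xs)                 # Eq. (11): past to future
    h_bwd = run_lstm(backward_cell, xs[::-1])[::-1]    # Eq. (12): future to past, re-aligned
    return [hf + hb for hf, hb in zip(h_fwd, h_bwd)]

rng = random.Random(0)
fwd, bwd = LSTMCell(1, 4, rng), LSTMCell(1, 4, rng)
xs = [[0.1], [0.5], [0.9], [0.4], [0.2]]               # one feature per time step
outputs = bilstm(fwd, bwd, xs)                          # 5 vectors of length 8
```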

5.4. Establishment of CEEMDAN–VMD–CNN–BiLSTM Model

As shown in Figure 7, to handle complex multivariate time series more effectively and characterize the input variables and output data more accurately, this paper combines CEEMDAN with VMD in a dual-modal decomposition. This not only avoids the mode aliasing caused by single-modal decomposition but also decomposes the original sequence into multiple relatively independent subsequences. K-means then clusters the components according to their sample entropy to obtain the high-frequency components, which VMD decomposes further. This approach not only screens out the key factors governing photovoltaic output patterns but also reduces the dimension of the input parameters of the subsequent model, eliminates the sequence correlation and redundancy that arise when only VMD is used, and helps extract key feature information from historical data.

At the same time, since some previous studies did not make full use of multivariate meteorological sequence data, this paper analyzes the NWP data with the Pearson correlation coefficient method and, based on the resulting heat map, quantifies the influence of meteorological conditions such as irradiance, temperature, pressure, humidity, precipitation, and ground wind speed on photovoltaic power. Meteorological characteristics with high correlation coefficients, such as irradiance, humidity, and precipitation, are selected to form the multivariate time series characteristic matrix of photovoltaic output meteorological factors. Together with the multi-scale feature mode matrix produced by the double decomposition, this matrix forms the feature input of the prediction module.

As one of the earlier deep learning algorithms, CNN is now very mature; through multiple convolutional and pooling layers, it can efficiently identify key features in the input information. In this paper, the CNN adaptively performs convolution operations with kernels of different sizes to extract data features. Combined with BiLSTM's ability to capture time series dependencies fully and effectively, this yields the final photovoltaic power prediction.
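One way to turn the two feature matrices into CNN–BiLSTM training samples is a sliding window over the stacked channels (selected meteorological factors plus decomposed power modes), followed by the chronological 70%/30% split described in Section 6. The window length, the one-step-ahead target, the toy series, and the function names are illustrative assumptions.

```python
def make_windows(channels, target, window):
    """Slice C aligned channels into samples of shape [C][window].

    channels: list of C equally long series (e.g. irradiance, humidity,
              precipitation and the decomposed power modes).
    target:   the (normalised) power series; each sample predicts the value
              immediately after its window (one-step-ahead, an assumption).
    """
    T = len(target)
    X, y = [], []
    for start in range(T - window):
        X.append([ch[start:start + window] for ch in channels])
        y.append(target[start + window])
    return X, y

def chronological_split(X, y, train_frac=0.7):
    """First train_frac of the samples for training, the rest for testing (no shuffling)."""
    cut = int(len(X) * train_frac)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

irradiance = [0.0, 0.1, 0.3, 0.6, 0.8, 0.9, 0.8, 0.6, 0.3, 0.1]
mode_high = [0.01, -0.02, 0.03, -0.01, 0.02, -0.03, 0.01, 0.0, -0.01, 0.02]
power = [0.0, 0.1, 0.25, 0.55, 0.8, 0.85, 0.8, 0.6, 0.3, 0.1]
X, y = make_windows([irradiance, mode_high], power, window=3)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
```

Keeping the split chronological (rather than shuffled) means the test set always lies after the training period, which mirrors how the model would be used for forecasting.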

5.5. Performance Evaluation Index

In this experiment, mean absolute error (MAE) and root mean square error (RMSE) were used to evaluate the prediction accuracy and performance of the model.

M A E = \frac{1}{m} \sum_{i = 1}^{m} |y_{i} - {\hat{y}}_{i}|

(13)

R M S E = \sqrt{\frac{1}{m} \sum_{i = 1}^{m} {(y_{i} - {\hat{y}}_{i})}^{2}}

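(14)

where $m$ is the number of samples, $y_{i}$ is the actual value, and ${\hat{y}}_{i}$ is the predicted value of the $i$th sample.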
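Equations (13) and (14) translate directly into code. In the small hand-checkable example below the errors are 0, 0, and 2, so MAE = 2/3 and RMSE = √(4/3) ≈ 1.155; RMSE is larger because squaring weights the single large error more heavily. The function names are illustrative.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (13)."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (14)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
mae_value, rmse_value = mae(actual, predicted), rmse(actual, predicted)
```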

6. Experimental Analysis

This section describes the experimental process and results and verifies the effectiveness of the proposed photovoltaic power prediction algorithm. The experiments were run on an AMD Ryzen 7 5800H CPU (Advanced Micro Devices, Inc., Santa Clara, CA, USA) with 16 GB of memory. The dataset consists of 180 days of NWP data and photovoltaic output data from a site in Northeast China at a resolution of 15 min. The first 70% of the data form the training set and the last 30% the test set. The simulation time is 724 s.

6.1. Bimodal Decomposition

First, the historical photovoltaic power sequence is decomposed into multiple intrinsic mode functions (IMFs) and a residual component by CEEMDAN, as shown in Figure 8.

Then, the sample entropy of each IMF is calculated to evaluate its complexity, which helps identify the regularity and randomness of the sequence. Based on the sample entropy results, the K-means algorithm clusters the IMFs, and the IMFs in the same cluster are combined. As shown in Figure 9, they are combined into Co-IMF1, Co-IMF2, and Co-IMF3, where Co-IMF1 is the high-frequency component.

Finally, as shown in Figure 10, VMD is applied to the combined high-frequency component Co-IMF1 to further refine its frequency components, so that the subsequent CNN can extract features more effectively and the prediction accuracy of the model improves.

6.2. Prediction Analysis

After the dataset is preprocessed, the Pearson correlation coefficient method is used to analyze the correlation of irradiance, temperature, air pressure, humidity, precipitation, and ground wind speed with photovoltaic power. As shown in the NWP correlation analysis in Section 4.1, irradiance has the greatest impact on photovoltaic output, followed by humidity and precipitation. Therefore, irradiance, humidity, and precipitation are used as strongly correlated meteorological factors to generate the multivariate time series feature matrix that is input into the subsequent prediction algorithm. At the same time, the coupling relationships within the historical photovoltaic power data are fully exploited: the dual-mode decomposition captures the patterns and mapping relationships in the data, and the resulting multi-scale feature mode matrix of the historical data is input into the prediction algorithm together with the meteorological matrix.

To show the prediction effect of this method intuitively, prediction experiments are carried out for two types of days, stationary and non-stationary. BiLSTM, VMD–BiLSTM, and CEEMDAN–BiLSTM models form the control group, as shown in Figure 11, Figure 12, Figure 13 and Figure 14.

The parameters of CEEMDAN are as follows: the sampling frequency $f_{s}$ is 4 Hz, i.e., the sampling period is 0.25 s; the sampling starting position is 1; and the standard deviation of the added white noise, Nstd, is 0.2. To enhance the robustness of the algorithm, the number of white noise realizations NR is 500, and the maximum number of iterations MaxIter is 5000 to ensure that the signal is fully decomposed.

When calculating the sample entropy, the parameters are set as follows: the embedding dimension dim is 2, the down-sampling delay tau takes its default value of 1, and the similarity tolerance is 0.2 times the standard deviation of the series. K-means clustering is then performed on the sample entropy results: each row of the sample entropy matrix holds the sample entropy value of one signal, and the signals are clustered into three categories, namely high, medium, and low frequency. Finally, the high-frequency components after clustering are decomposed by VMD, with the number of modes set to 3 and the penalty factor set to 2500.
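The sample-entropy and clustering step can be sketched with the parameters listed above (embedding dimension 2, delay 1, tolerance 0.2 times the standard deviation, three clusters). Assumptions for illustration: the K-means centers are initialized deterministically from the sorted entropy values, components in a cluster are merged by summation, and clusters are ordered by decreasing mean entropy so that the first Co-IMF is the most irregular (treated here as high-frequency) one. Function names and the toy components are hypothetical.

```python
import math
import random
import statistics

def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn(m, r) with Chebyshev distance, r = r_factor * std(x), delay 1."""
    n = len(x)
    r = r_factor * statistics.stdev(x)

    def matches(length):
        count = 0
        for i in range(n - m):                 # same template count for m and m + 1
            for j in range(i + 1, n - m):
                if all(abs(x[i + k] - x[j + k]) <= r for k in range(length)):
                    count += 1
        return count

    b, a = matches(m), matches(m + 1)
    return math.inf if a == 0 or b == 0 else -math.log(a / b)

def kmeans_1d(values, k=3, n_iter=100):
    """Plain K-means on scalars (k >= 2); returns (labels, centres)."""
    s = sorted(values)
    centres = [s[round(i * (len(s) - 1) / (k - 1))] for i in range(k)]
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centres[c])
        if new == centres:
            break
        centres = new
    return labels, centres

def group_imfs(imfs, k=3):
    """Cluster IMFs by sample entropy and sum each cluster into a Co-IMF."""
    entropies = [sample_entropy(imf) for imf in imfs]
    labels, centres = kmeans_1d(entropies, k)
    co_imfs = []
    for c in sorted(range(k), key=lambda idx: -centres[idx]):   # most irregular first
        members = [imf for imf, lab in zip(imfs, labels) if lab == c]
        if members:
            co_imfs.append([sum(vals) for vals in zip(*members)])
    return co_imfs, entropies

rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(150)]            # irregular component
slow = [math.sin(2 * math.pi * t / 50) for t in range(150)]  # regular, low frequency
fast = [math.sin(2 * math.pi * t / 10) for t in range(150)]  # regular, higher frequency
co_imfs, ent = group_imfs([noise, slow, fast], k=3)
```

With only three toy components each forms its own cluster; with the dozen or so IMFs of a real CEEMDAN run, several IMFs of similar entropy would be merged into one Co-IMF.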

The prediction results in Figure 11 show that, under good weather conditions with stable photovoltaic output, the BiLSTM predictions agree poorly with the actual values in the first half of the day, which is not in line with expectations. The VMD–BiLSTM model also performs worse than the proposed method: its predictions deviate considerably from the actual values both during the peak period and in the second half of the day. CEEMDAN–BiLSTM agrees well before and after the peak but deviates from the actual values during the peak period. Overall, the proposed CEEMDAN–VMD–CNN–BiLSTM model outperforms all of these models. As shown in Figure 12, when predicting the photovoltaic power of a stationary day, the overall prediction error of the proposed model stays within 0.4 kW, and its predictions agree most closely with the actual values. Corresponding experiments were also carried out for days with fluctuating photovoltaic power. As shown in Figure 13, the BiLSTM and VMD–BiLSTM models perform poorly on non-stationary days: before and after the power inflection point, their predictions differ markedly from, and exceed, the actual power. The CEEMDAN–BiLSTM model performs well before the inflection point, where its predictions agree closely with the actual values, but shows a marked difference at the peak after the inflection point. The proposed model clearly outperforms all of these models. As shown in Figure 14, its prediction curve agrees most closely with the actual curve before and after the power inflection point, and the prediction error always stays within 0.2 kW. Its prediction accuracy is significantly higher than that of the other models, which reflects the stability and high accuracy of the proposed model.

As shown in Table 2, the evaluation based on the MAE and RMSE indicators shows that the CEEMDAN–VMD–CNN–BiLSTM model performs stably on both sample types. For stationary days, compared with the other models, it reduces the mean absolute error by more than 0.016 and the root mean square error by more than 0.017. For non-stationary days, it reduces the mean absolute error by more than 0.04 and the root mean square error by more than 0.03. The proposed model can therefore cope with weather conditions involving unstable wind speed and rainfall. Both the prediction curves and the performance indicators show that the method achieves higher prediction accuracy and performs better across different sample types.
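The improvement margins quoted above are the smallest gaps between the proposed model and any baseline in Table 2. The short Python sketch below recomputes them from the Table 2 values (model names written with ASCII hyphens) and, for reference, gives the usual MAE and RMSE definitions; the helper names are hypothetical. As a general property, for any single error series RMSE is never smaller than MAE, because the quadratic mean of the absolute errors bounds their arithmetic mean from above.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# (MAE, RMSE) pairs transcribed from Table 2.
TABLE2 = {
    "stable": {
        "BiLSTM": (0.45694, 0.3258),
        "VMD-BiLSTM": (0.30373, 0.17747),
        "CEEMDAN-BiLSTM": (0.10816, 0.08574),
        "CEEMDAN-VMD-CNN-BiLSTM": (0.09170, 0.06819),
    },
    "non-stationary": {
        "BiLSTM": (0.19961, 0.16335),
        "VMD-BiLSTM": (0.15001, 0.10877),
        "CEEMDAN-BiLSTM": (0.09233, 0.07437),
        "CEEMDAN-VMD-CNN-BiLSTM": (0.05205, 0.04054),
    },
}
PROPOSED = "CEEMDAN-VMD-CNN-BiLSTM"


def min_reduction(day_type):
    """Smallest (MAE, RMSE) margin of the proposed model over every baseline."""
    rows = TABLE2[day_type]
    p_mae, p_rmse = rows[PROPOSED]
    baselines = [v for name, v in rows.items() if name != PROPOSED]
    return (min(m - p_mae for m, _ in baselines),
            min(r - p_rmse for _, r in baselines))


for day in TABLE2:
    d_mae, d_rmse = min_reduction(day)
    print(f"{day}: MAE reduced by {d_mae:.5f}, RMSE reduced by {d_rmse:.5f}")
```

In both sample types the smallest margin is against CEEMDAN–BiLSTM: 0.01646 (MAE) and 0.01755 (RMSE) for stable days, and 0.04028 and 0.03383 for non-stationary days, consistent with the "more than" figures stated in the text.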

6.3. Limitations of This Study

This study is suited to predicting the power of small photovoltaic power stations for which numerical weather prediction (NWP) data are available. It can handle data with small, relatively stable power fluctuations in sunny or lightly cloudy weather, as well as data with large, unstable power fluctuations on cloudy or rainy days. Comparatively, the prediction effect on non-stationary days is not as good as that on stationary days. In terms of run time, the BiLSTM simulation takes 131 s, VMD–BiLSTM takes 217 s, CEEMDAN–BiLSTM takes 260 s, and the proposed method takes 724 s. Although the proposed method has a higher computational cost and model complexity than the other models, it improves the prediction effect and stability. In future work, provided memory and time costs remain acceptable, optimization algorithms could be introduced to select the algorithm parameters, which is expected to improve prediction accuracy further.

7. Conclusions

This paper proposes a combined prediction method based on the Pearson correlation coefficient method, K-means modal clustering, CEEMDAN and VMD double decomposition, CNN and BiLSTM. The following conclusions are obtained through experimental verification and prediction effect comparison:

Through correlation analysis of the NWP data using the Pearson correlation coefficient method, the correlation coefficient between each meteorological variable and the photovoltaic output is obtained, and the influence of each variable on the output is analyzed. The highly correlated weather variables are selected as inputs for the subsequent prediction, which improves prediction efficiency and accuracy;
The photovoltaic power data are decomposed by CEEMDAN; the high-frequency components are then identified by K-means clustering of their sample entropy and further decomposed by VMD. This secondary decomposition extracts the underlying features from complex data, which not only reduces data complexity but also effectively eliminates redundant components in the sequence;
CNN–BiLSTM is used as the underlying prediction algorithm, with the multivariate time series feature matrix of meteorological factors and the multi-scale feature mode matrix obtained by double decomposition as its input. The CNN extracts feature information from the input, and the BiLSTM processes the resulting time series for prediction. Experiments verify the accuracy and stability of the proposed method. In the prediction of stable days, compared with the other models, the mean absolute error is reduced by more than 0.016 and the root mean square error by more than 0.017. In the prediction of non-stationary days, the mean absolute error is reduced by more than 0.04 and the root mean square error by more than 0.03.
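The Pearson screening summarised in the first conclusion can be sketched as follows. `pearson` computes the correlation coefficient between a meteorological series and the PV output, and `select_features` keeps the variables whose absolute coefficient reaches a threshold. Both helper names are hypothetical, the coefficients are those listed in Table 1, and the 0.4 cut-off is an assumed value chosen for illustration, not one stated in this section.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Correlation of PV output with each meteorological factor (Table 1).
TABLE1 = {
    "Irradiance": 0.97,
    "Temperature": -0.17,
    "Barometric pressure": 0.24,
    "Humidity": -0.45,
    "Precipitation": -0.43,
    "Surface Wind Speed": -0.2,
}


def select_features(coeffs, threshold=0.4):
    """Keep factors with |r| >= threshold (threshold is a hypothetical choice)."""
    return [name for name, r in coeffs.items() if abs(r) >= threshold]
```

With this assumed threshold, the Table 1 coefficients would retain irradiance, humidity and precipitation; a different cut-off would change the selection, so the value should be treated as a tunable parameter.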

Author Contributions

Conceptualization, methodology and writing—original draft preparation, X.Z. (Xize Zhang); formal analysis, writing—review and editing, X.Z. (Xia Zhou); investigation, resources and supervision, J.D. and T.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Ye, X.; Ye, J.; Liang, G. A Day-ahead Photovoltaic Power Generation Prediction Method Based on Data Mining and Micro-meteorological Information. In Proceedings of the 2024 IEEE 6th International Conference on Power, Intelligent Computing and Systems (ICPICS), Shenyang, China, 26–28 July 2024; pp. 353–358.
Riedel, P.; Belkilani, K.; Reichert, M.; Heilscher, G.; von Schwerin, R. Enhancing PV feed-in power forecasting through federated learning with differential privacy using LSTM and GRU. Energy AI 2024, 18, 100452.
Mansour, A.A.; Tilioua, A.; Touzani, M. Bi-LSTM, GRU and 1D-CNN models for short-term photovoltaic panel efficiency forecasting case amorphous silicon grid-connected PV system. Results Eng. 2024, 21, 101886.
Wang, F.; Xuan, Z.; Zhen, Z.; Li, K.; Wang, T.; Shi, M. A day-ahead PV power forecasting method based on LSTM-RNN model and time correlation modification under partial daily pattern prediction framework. Energy Convers. Manag. 2020, 212, 112766.
Michalakopoulos, V.; Sarantinopoulos, E.; Sarmas, E.; Marinakis, V. Empowering federated learning techniques for privacy-preserving pv forecasting. Energy Rep. 2024, 12, 2244–2256.
Limouni, T.; Yaagoubi, R.; Bouziane, K.; Guissi, K.; Baali, E.H. Accurate one step and multistep forecasting of very short-term PV power using LSTM-TCN model. Renew. Energy 2023, 205, 1010–1024.
Liu, Q.; Darteh, O.F.; Bilal, M.; Huang, X.; Attique, M.; Liu, X.; Acakpovi, A. A cloud-based Bi-directional LSTM approach to grid-connected solar PV energy forecasting for multi-energy systems. Sustain. Comput. Inform. Syst. 2023, 40, 100892.
Chen, Q.; Chu, A.; Du, J.; Wang, M. Short Term Forecast of Photovoltaic Power Generation Based on WOA-LSTM. In Proceedings of the 2024 IEEE 6th International Conference on Power, Intelligent Computing and Systems (ICPICS), Shenyang, China, 26–28 July 2024; pp. 1214–1220.
Thaker, J.; Höller, R. Hybrid model for intra-day probabilistic PV power forecast. Renew. Energy 2024, 232, 121057.
Tavares, I.; Manfredini, R.; Almeida, J.; Soares, J.; Ramos, S.; Foroozandeh, Z.; Vale, Z. Comparison of PV power generation forecasting in a residential building using ANN and DNN. IFAC-PapersOnLine 2022, 55, 291–296.
Huang, Y.; Liu, J.; Zhang, Z.; Li, D.; Li, X.; Wang, G. Dynamic Combination Forecasting for Short-Term Photovoltaic Power. IEEE Trans. Artif. Intell. 2024, 5, 5277–5289.
Olcay, K.; Tunca, S.G.; Özgür, M.A. Forecasting and performance analysis of energy production in solar power plants using long short-term memory (LSTM) and random forest models. IEEE Access 2024, 12, 103299–103312.
Li, M.; Zhang, F.; Wang, Y.; Ren, J.; Zhou, Q. Multidimensional Temporal Photovoltaic Power Prediction Based on VMD-SSA-LSTM. In Proceedings of the 2024 6th International Conference on Energy Systems and Electrical Power (ICESEP), Wuhan, China, 21–23 June 2024; pp. 192–197.
Saha, S.K.; Mahajan, S.M. Multivariate Optimal Hybrid Deep Learning Model for Forecasting of Day-Ahead Solar Irradiance with Meteorological Constraints. In Proceedings of the 2024 56th North American Power Symposium (NAPS), El Paso, TX, USA, 13–15 October 2024; pp. 1–6.
Wang, Z.; Yuan, Y.; Gong, Y.; Jiang, Y. The Short-Term Photovoltaic Power Prediction Model Based on FCM-BLS. In Proceedings of the 2024 7th International Conference on Power and Energy Applications (ICPEA), Taiyuan, China, 18–20 October 2024; pp. 762–766.
Wu, S.; Guo, H.; Zhang, X.; Wang, F. Short-Term Photovoltaic Power Prediction Based on CEEMDAN and Hybrid Neural Networks. IEEE J. Photovolt. 2024, 14, 960–969.
Liang, H.; Li, G.; Xu, L.; Liu, Q. Short-Term Load Forecasting for A Power Supplying District Based on CEEMDAN-WPE-LSTM-Stacking Methods. In Proceedings of the 2024 7th International Conference on Energy, Electrical and Power Engineering (CEEPE), Yangzhou, China, 26–28 April 2024; pp. 436–441.
Huang, Y.; Wang, A.; Jiao, J.; Xie, J.; Chen, H. Short-term PV power forecasting based on CEEMDAN and ensemble DeepTCN. IEEE Trans. Instrum. Meas. 2023, 72, 2526012.
Pan, Y.; Wang, J.; Li, P.; Wang, L.; Li, J.; Yin, Y. Photovoltaic power forecasting based on similar day theory and CEEMDAN-CSO-BP. In Proceedings of the 2021 IEEE 5th Conference on Energy Internet and Energy System Integration (EI2), Taiyuan, China, 22–24 October 2021; pp. 1765–1770.
Sun, Y.; Wu, Y.; Liu, J.; Zhang, S.; Li, G.; Zou, G. Ultra-Short-Term Photovoltaic Power Prediction Based on Improved Kmeans Algorithm and VMD-SVR-LSTM Model. In Proceedings of the 2022 6th International Conference on Power and Energy Engineering (ICPEE), Shanghai, China, 25–27 November 2022; pp. 47–51.
Wang, J.; Lu, S.; Zhou, B.; Xu, B. Ultra-short-term forecast of photovoltaic power based on vmd error correction and cnn-gru-am. In Proceedings of the 2022 3rd International Conference on Advanced Electrical and Energy Systems (AEES), Lanzhou, China, 23–25 September 2022; pp. 91–96.
Li, G.; Ding, C.; Zhang, R.; Chen, Y.; Zhao, N.; Zhu, R. Short-Term Prediction of PV Power Based on Hybrid CNN–BiLSTM-Attention Model and VMD. In Proceedings of the 2023 6th International Conference on Energy, Electrical and Power Engineering (CEEPE), Guangzhou, China, 12–14 May 2023; pp. 998–1003.
Li, Z.; Ju, Y. Short-Term Photovoltaic Power Prediction Based on VMD-mRMR and TCN-BIGRU-ATTENTION. In Proceedings of the 2024 9th International Symposium on Computer and Information Processing Technology (ISCIPT), Xi’an, China, 24–26 May 2024; pp. 386–389.
Xin, Y.; Li, M.; Hong, Y.; Qiu, Y.; Wu, H.; Wang, P. Digital Twin Model of Photovoltaic Power Generation Prediction based on VMD and Bi-LSTM. In Proceedings of the 2024 8th International Conference on Power Energy Systems and Applications (ICoPESA), Hong Kong, 24–26 June 2024; pp. 507–511.
Zhang, S.; Niu, D.; Zhou, Z.; Duan, Y.; Chen, J.; Yang, G. Prediction Method of Direct Normal Irradiance for Solar Thermal Power Plants Based on VMD-WOA-DELM. IEEE Trans. Appl. Supercond. 2024, 34, 9002904.
Aljafari, B.; Satpathy, P.R.; Thanikanti, S.B.; Nwulu, N. Supervised classification and fault detection in grid-connected PV systems using 1D-CNN: Simulation and real-time validation. Energy Rep. 2024, 12, 2156–2178.
Ma, Y.; Huang, Y.; Yuan, Y. The short-term forecasting of distributed photovoltaic power considering the sensitivity of meteorological data. J. Clean. Prod. 2025, 486, 144599.

Figure 2. Historical power data of photovoltaic power generation.

Figure 4. Pearson hotspot chart.

Figure 5. The structure diagram of CNN.

Figure 6. The structure diagram of BiLSTM.

Figure 7. Flow chart of prediction model.

Figure 8. The results of CEEMDAN modal decomposition.

Figure 9. The K-means clustering results for multiple IMF components.

Figure 10. The VMD decomposition of Co-IMF1.

Figure 11. Steady day forecast results.

Figure 12. Steady day CEEMDAN–VMD–CNN–BiLSTM error curve chart.

Figure 13. Non-stationary day forecast results.

Figure 14. Non-stationary day CEEMDAN–VMD–CNN–BiLSTM error curve chart.

Table 1. Correlation coefficient and correlation of PV output with meteorological factors.

Feature	Correlation Coefficient
Irradiance	0.97
Temperature	−0.17
Barometric pressure	0.24
Humidity	−0.45
Precipitation	−0.43
Surface Wind Speed	−0.2

Table 2. Comparison of predictive performance of different models.

Sample Type	Model	MAE	RMSE
Stable Day	BiLSTM	0.45694	0.3258
	VMD–BiLSTM	0.30373	0.17747
	CEEMDAN–BiLSTM	0.10816	0.08574
	CEEMDAN–VMD–CNN–BiLSTM	0.09170	0.06819
Non-stationary Day	BiLSTM	0.19961	0.16335
	VMD–BiLSTM	0.15001	0.10877
	CEEMDAN–BiLSTM	0.09233	0.07437
	CEEMDAN–VMD–CNN–BiLSTM	0.05205	0.04054

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhou, X.; Zhang, X.; Dai, J.; Zhang, T. Photovoltaic Power Prediction Technology Based on Multi-Source Feature Fusion. Symmetry 2025, 17, 414. https://doi.org/10.3390/sym17030414

AMA Style

Zhou X, Zhang X, Dai J, Zhang T. Photovoltaic Power Prediction Technology Based on Multi-Source Feature Fusion. Symmetry. 2025; 17(3):414. https://doi.org/10.3390/sym17030414

Chicago/Turabian Style

Zhou, Xia, Xize Zhang, Jianfeng Dai, and Tengfei Zhang. 2025. "Photovoltaic Power Prediction Technology Based on Multi-Source Feature Fusion" Symmetry 17, no. 3: 414. https://doi.org/10.3390/sym17030414

APA Style

Zhou, X., Zhang, X., Dai, J., & Zhang, T. (2025). Photovoltaic Power Prediction Technology Based on Multi-Source Feature Fusion. Symmetry, 17(3), 414. https://doi.org/10.3390/sym17030414

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
