1. Introduction
Population growth and economic development have led to a continuous increase in electricity demand and global energy consumption [
1,
2]. Meanwhile, in the face of the increasing depletion of limited fossil fuels and the demand for carbon emission reduction, the development of renewable energy generation technologies is of paramount importance [
3,
4]. Photovoltaic power generation has grown rapidly due to its advantages of inexhaustible supply, long performance life, and good medium- and long-term economic feasibility [
5]. Researchers indicate that solar power could contribute 41–96 PWh of energy per year by 2050 [
6]. However, due to the influence of solar radiation, temperature, and humidity, the output power of photovoltaic generation is intermittent, volatile, and random, which causes great difficulties for the operation, scheduling, planning, and safety of power systems [
7,
8,
9]. Therefore, photovoltaic power prediction that considers related meteorological factors becomes a crucial guarantee for a secure and reliable power supply, as it significantly mitigates the impact of the intermittency, volatility, and randomness caused by meteorological factors [
10,
11,
12]. Simultaneously, with the rapid development of smart grids, the use of accurate photovoltaic power prediction methods is further promoted [
13]. With the development of computer hardware and software, prediction models use high-performance computing to achieve greater effectiveness, so that prediction plays a vital role in ensuring the operation of power stations and the safe operation of the smart grid [
14].
In this context, researchers have been dedicated to developing effective prediction technologies to cope with various application scenarios [
15]. Photovoltaic power forecasting belongs to the category of time series forecasting due to its continuous and real-time data [
16]. At present, according to the time scale of time series prediction, research in photovoltaic power prediction mainly includes ultra-short-term, short-term, and medium-to-long-term forecasts. The former primarily provides data support for grid dispatch to ensure the safety of power transmission, while the latter two mainly offer data support for the planned operation and production of power stations [
17,
18,
19].
In the early stages, Alam S et al. [
20] proposed using the REST model to predict direct solar irradiance. Lorenz et al. [
21] present an approach to predict regional PV power output based on irradiance forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF). While these methods achieved some predictive success, such physical models are difficult to build, computationally expensive, and poorly adapted to complex and variable meteorological conditions. Therefore, time series models and regression analysis methods were applied in this field to obtain more accurate photovoltaic power predictions. Li Y et al. [
22] proposed the ARMAX model to predict photovoltaic power based on historical data, which significantly improved the prediction accuracy of power output. Persson C et al. [
23] proposed the use of gradient-boosted regression trees (GBRT) for prediction; compared to single-site linear autoregressive models and variants of the GBRT model, the multi-site model shows competitive root mean squared error across all prediction horizons. Although time series models and regression analysis methods have thus achieved some success, in most cases the instability of meteorological data and the high sampling frequency leave these methods insufficient for accurate prediction.
In recent years, artificial intelligence technology, as a new generation of technology that simulates the work of the human brain to solve complex problems, is increasingly being applied in production and daily life [
24]. The bottleneck in photovoltaic power generation efficiency caused by the instability of meteorological factors is also expected to be overcome through artificial intelligence algorithms. Owing to their strong adaptability, self-learning ability, and capacity to fit complex nonlinear relationships, artificial intelligence algorithms have begun to replace traditional time series models and regression analysis methods in complex photovoltaic power prediction. Such algorithms train a model that predicts future power from historical power measurements and the meteorological factors that affect them, without requiring explicit mathematical expressions of their relationships. Abdullah Alfadda et al. [
25] proposed a support vector regression model to perform one-hour-ahead solar photovoltaic power prediction. Experiments showed that, compared to polynomial regression and Lasso, the SVR prediction model is superior in accuracy. However, SVR faces accuracy challenges in complex nonlinear scenarios and cannot capture deep long-term dependencies in photovoltaic time series data. Recently, deep learning, proposed by Hinton et al. [
26], has rapidly evolved and offered deeper and more powerful nonlinear network structures compared to traditional machine learning methods [
27]. Hossain and Mahmood [
28] proposed using Long Short-Term Memory (LSTM) to predict photovoltaic power generation, which can effectively capture temporal continuity and periodic dependence. Experimental results show that LSTM achieves the highest prediction accuracy compared to recurrent neural networks (RNNs), generalized regression neural networks (GRNNs), and extreme learning machines (ELMs). However, LSTM, being a recurrent model, requires long computation times and high computational performance, placing heavy demands on computer hardware. To address the computation-time problem of LSTM, Zhu R et al. [
29] proposed a prediction framework based on an improved Temporal Convolutional Network (TCN), designed specifically for time-sequence processing, to predict wind power. Through dilated causal convolutions and residual connections, this method addresses the long-term dependency problem and the performance degradation of deep convolutional models in sequence prediction. Experimental results show that the TCN achieves higher prediction accuracy than existing predictors (such as Support Vector Machines, Multilayer Perceptrons, Long Short-Term Memory networks, and Gated Recurrent Unit networks). However, to capture long-term dependencies, TCNs usually require a larger receptive field, which may increase computational complexity and decrease accuracy, especially for very long, highly complex time series with dynamically changing characteristics.
Currently, in order to further improve the prediction accuracy and avoid the shortcomings of each single model, more and more researchers are using model combinations to predict photovoltaic power. Elizabeth Michael N et al. [
30] proposed a prediction model based on CNN and LSTM, using an improved CNN layer to extract features, and the output results of the CNN were used to predict targets using a stacked LSTM network, achieving excellent prediction accuracy. Limouni T et al. [
31] proposed a new model using LSTM-TCN to predict photovoltaic power generation. This model combines the Long Short-Term Memory and Temporal Convolutional Network models, utilizing LSTM to extract temporal features from the input data and then feeding them to the TCN to establish the connection between features and output. Compared with LSTM and the TCN, it reduced the Mean Absolute Error as follows: autumn by 8.47% and 14.26%; winter by 6.91% and 15.18%; spring by 10.22% and 14.26%; and summer by 14.26% and 14.23%, respectively. Although the combined model further improves prediction accuracy, it also makes the model more complex and less interpretable, particularly as the data-shape transformations required by each sub-model become more complicated.
Recently, a new RNN variant, the Simple Recurrent Unit (SRU), has been applied to regression problems. By introducing parallel computation and GPU optimization, the SRU network avoids the long training times and heavy consumption of computing resources of models such as LSTM, the TCN, and combined models, without losing accuracy. Additionally, it captures long-period dependencies and handles nonlinearity better than TCNs. The SRU network has recently been applied to complex nonlinear classification and regression problems, achieving commendable predictive results in water quality prediction [
32], humidity in waterfowl breeding environments [
33], remaining useful life prediction of bearings [
34], and spatiotemporal traffic speed prediction in urban road networks [
35].
Some scholars have also used outlier processing on datasets to reduce the complexity of the model and improve the prediction accuracy. In experiments conducted by Alimohammadi H et al. [
36], assessing the performance of time series outlier detection techniques, the K-Nearest Neighbors (KNN) and Fulford–Blasingame methods performed best among the 17 methods evaluated. Therefore, this study uses KNN for outlier processing of the photovoltaic power time series.
Furthermore, some researchers have introduced intelligent optimization algorithms to enhance the performance of the proposed models, such as Particle Swarm Optimization (PSO) [
37], the Grey Wolf Optimizer (GWO) [
38], and the Cuckoo Search (CS) [
39]. Experimental results indicate that the use of optimization algorithms can improve the predictive capability of the proposed models. At the same time, quite a few optimization algorithms have been applied to the photovoltaic field to solve corresponding problems, such as the parameter identification of solar cells using the improved Archimedes Optimization Algorithm [
40], parameter extraction for photovoltaic models with tree seed algorithm [
41], and parameter extraction of solar photovoltaic models using queuing search optimization and differential evolution [
42], all of which have achieved some success. Recently, a new population-based optimization algorithm called the Hunter–Prey Optimizer (HPO) was proposed by Naruei et al. [
43] in 2022. By simulating the animal hunting process, it offers fast convergence and strong optimization ability. Many scholars have applied the HPO in their research to avoid the uncertainty of manual, empirical parameter tuning and to improve prediction accuracy and effectiveness, for example in short-term power-load forecasting based on an HPO-LSTM model [
44], early warning of coal and gas outbursts based on an HPO-BiLSTM model [
45], and machine vision-based recognition of elastic abrasive tool wear and its influence on machining performance [
46]. Given the established foundation of intelligent optimization algorithms in the photovoltaic field and the achievements of the HPO in other fields, this paper uses the HPO algorithm to optimize the model parameters, thereby improving the accuracy and effectiveness of predictions.
In summary, in view of the shortcomings of the existing research, this article proposes an ultra-short-term prediction algorithm for photovoltaic power based on the HPO-KNN-SRU model, and compares the HPO-KNN-SRU model with SVR, LSTM, the TCN, and the SRU in experiments. Extensive ablation experiments are conducted to validate the effectiveness of the integrated model, KNN outlier handling, and the HPO algorithm. Our main contributions can be summarized as follows:
For data-anomaly handling, KNN is used to process outliers in the data, and the HPO algorithm is used to optimize the KNN parameters. The ablation experiments show that KNN can solve the data-anomaly problem and improve prediction accuracy.
Utilizing the preprocessed data, relevant prior knowledge and the efficient, parallel-computable network SRU, we construct and train the HPO-KNN-SRU prediction model for photovoltaic power prediction. By comparing with SVR, LSTM, the TCN and the SRU, this method can achieve higher prediction accuracy.
The ablation experiments confirm that the advanced intelligent optimization algorithm HPO, applied to optimize KNN parameters and SRU parameters, not only improves the accuracy but also solves the randomness and subjectivity in parameter setting.
The remainder of this paper is organized as follows:
Section 2 introduces the methods and theories used. In
Section 3, we analyze and discuss the experimental results.
Section 4 concludes our work and illustrates future work.
2. Materials and Methods
2.1. Data Description and Preprocessing
The power generation data of the 5.3 kW Canadian Solar polysilicon stationary photovoltaic plant were obtained from the Desert Knowledge Australia Solar Centre (DKASC) as the experimental data for this study. The data span from 1 January 2020 to 31 December 2022 at a 5 min interval, giving 288 data records per day. The dataset includes timestamp, Active_Energy_Delivered_Received, Current_Phase_Average, Active_Power, Performance_Ratio, Wind_Speed, Weather_Temperature_Celsius, Weather_Relative_Humidity, Global_Horizontal_Radiation, Diffuse_Horizontal_Radiation, Wind_Direction, Weather_Daily_Rainfall, Radiation_Global_Tilted, Radiation_Diffuse_Tilted, etc.
Data inspection revealed that the power values at night are 0 or negative and that some rows contain vacant values. These rows were therefore deleted, along with the timestamp column and the empty Wind_Speed column, which have little relevance to the experiment. Finally, 130,941 sets of data were left, and the dataset was divided into a training set (104,752 sets of data) and a test set (26,179 sets of data) at a ratio of 8:2.
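A minimal sketch of this preprocessing pipeline (the DataFrame below is hypothetical toy data; the column names follow the dataset description, and the filtering thresholds mirror the rules stated above):

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the DKASC export; all values are hypothetical.
df = pd.DataFrame({
    "timestamp": pd.date_range("2020-01-01", periods=8, freq="5min"),
    "Active_Power": [0.0, -0.1, 1.2, np.nan, 2.5, 3.1, 2.8, 0.0],
    "Wind_Speed": [np.nan] * 8,  # empty column, dropped below
    "Global_Horizontal_Radiation": [0, 0, 310, 420, 600, 640, 590, 0],
})

# Drop low-relevance / empty columns first, then rows with vacant values,
# then nighttime records where power is zero or negative.
df = df.drop(columns=["timestamp", "Wind_Speed"])
df = df.dropna()
df = df[df["Active_Power"] > 0]

# Chronological 8:2 split into training and test sets.
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]
print(len(df), len(train), len(test))
```

Splitting chronologically (rather than shuffling) preserves the temporal order that the forecasting models rely on.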
The data information of the power time series training set and test set is shown in
Figure 1. The corresponding statistical information, including mean, minimum, maximum, standard deviation, skewness, and kurtosis are shown in
Table 1.
2.2. Sliding Time Window Selection
The sliding time window is a common method for handling time series data. A fixed-size subsequence (window) is selected on the time series and then slid forward step by step over time. By inputting the past data within the window to predict one or more future data points, the method captures the data characteristics of different time periods.
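The windowing step can be sketched as follows (a minimal sketch; `make_windows` is a hypothetical helper, and the series is a toy stand-in for the 5 min power data):

```python
import numpy as np

def make_windows(series, window, horizon=1):
    """Slide a fixed-size window over a 1-D series to build
    (input, target) pairs for supervised forecasting."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])          # past `window` points
        y.append(series[start + window + horizon - 1])  # point `horizon` steps ahead
    return np.array(X), np.array(y)

power = np.arange(100, dtype=float)  # toy stand-in for the power series
X, y = make_windows(power, window=38)
print(X.shape, y.shape)  # (62, 38) (62,)
```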
The ACF (autocorrelation function) measures the correlation between a time series and itself at different lags, and its range is between −1 and 1. It can be used to determine whether a time series is autocorrelated, that is, whether an observation is related to one or more previous observations within a lag period. The PACF (partial autocorrelation function) measures the correlation between a time series and its observations at a specific lag after eliminating the influence of the intermediate lags, and it can be used to determine the lag order of the AR (autoregressive) part of the series. In time series analysis, ACF and PACF plots are used to help determine the parameters of the ARIMA (autoregressive integrated moving average) model, namely the order p of the autoregressive (AR) term and the order q of the moving average (MA) term. p and q represent how far back historical information should be considered when predicting the current value, which is what we call the time window. If the ACF starts to decrease at a certain lag and is close to 0 or insignificant at subsequent lags, that lag may be a suitable time window size; likewise, if the PACF drops sharply at some lag and approaches 0 or becomes insignificant thereafter, that lag is also a candidate window size.
The ACF of the data in this study is shown in
Figure 2 and the PACF is shown in
Figure 3 as follows.
It can be seen from the ACF and PACF graphs that lag 38 is significant in the ACF, and lags 5 and 38 are significant in the PACF. To further determine the time window value, window values from 4 to 54 were compared experimentally in this study by training the SRU model and selecting the window with the lowest corresponding Mean Absolute Error (MAE). The final experimental results are shown in
Figure 4.
The experimental results indicate that a time window value of 38 has the lowest MAE value of 0.16. Consequently, this time window value of 38 is selected in subsequent experiments in this study.
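The ACF diagnostic used for window selection can be reproduced with a short computation. A minimal sketch (not the study's code), using a toy series with an artificial period of 38 samples so that the ACF peaks again near lag 38:

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation function (biased estimator, as used
    in standard ACF plots)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / var
                     for k in range(nlags + 1)])

# Toy periodic series: the ACF should rise again near the period lag.
t = np.arange(500)
series = np.sin(2 * np.pi * t / 38)
rho = acf(series, nlags=50)
print(rho[0], rho[38])  # lag 0 is exactly 1; lag 38 is close to 1
```

In practice, a lag where the ACF is still clearly significant (here, 38) is a natural candidate for the window size, matching the choice above.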
2.3. Feature Selection
In this study, meteorological factors were used as feature data, and the final power data were predicted from historical meteorological data. In the field of artificial intelligence, too many feature dimensions introduce redundant information, prolong training time, and increase modeling difficulty, while insufficient feature dimensions prevent ideal prediction results. When selecting input features, factors with a strong correlation with power should therefore be chosen.
Before selecting input factors, we first conduct a significance test to examine the statistical significance of the correlations between variables. Based on two-tailed tests using the t-distribution, the p-values for the correlation between each variable and Active_Power are all 0.00, meaning that each correlation is statistically significant and that each variable is related to Active_Power in the population; correlation tests can therefore be performed to identify strongly correlated factors.
The Pearson correlation coefficient is used to measure the correlation between two variables,
X and
Y [
47]. In this study, it was used to measure the degree of correlation between meteorological factors and power generation. The formula of the Pearson correlation coefficient is:
where
X represents a value of a meteorological factor, and
represents its mean;
Y represents a value of photovoltaic power generation, and
represents its mean; and n represents the total number of data points for a meteorological factor.
For the Pearson correlation coefficient, when |r| ≥ 0.8, the two variables can be considered highly correlated; when 0.5 ≤ |r| < 0.8, moderately correlated; and when 0.3 ≤ |r| < 0.5, weakly correlated. If |r| < 0.3, the two variables can be considered essentially unrelated [
48].
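A minimal sketch of the Pearson computation (the radiation and power values below are purely illustrative, not taken from the dataset):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between a meteorological
    factor x and the power series y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc))

# Toy values: power rises roughly in proportion to radiation.
radiation = np.array([0.0, 100, 300, 600, 800, 950])
power = np.array([0.0, 0.4, 1.4, 2.9, 4.0, 4.7])
r = pearson_r(radiation, power)
print(r)  # close to 1, i.e. a high correlation by the |r| >= 0.8 rule
```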
The Pearson correlation coefficient between photovoltaic power and various meteorological factors is shown in
Figure 5.
From the analysis in
Figure 5, it can be concluded that the absolute values of the correlation coefficients between Active_Power and each of Current_Phase_Average, Performance_Ratio, Weather_Relative_Humidity, Global_Horizontal_Radiation, and Radiation_Global_Tilted are greater than 0.3. Therefore, these factors are ultimately selected as the final meteorological feature factors. The detailed information is shown in
Table 2.
2.4. HPO Optimization Algorithm
The Hunter–Prey Optimizer (HPO) is a new swarm-intelligence-based optimization algorithm proposed by Naruei et al. [
43] in 2022. The core concept of the HPO is as follows: hunters attack individuals far away from the prey group and constantly adjust their position to chase the prey. At the same time, the prey is also dynamically adjusting its position in an attempt to escape to a safe area to evade the hunter’s attack. These two processes involve the update of the hunter’s position and the prey’s location, thereby completing the whole search process. The safe place is the global optimal position. When the prey reaches the safe position, the hunter gives up the current prey and chooses new prey, and the current prey survives.
First, the initial population is randomly initialized as and the objective function of all its members is , where m indexes the population members and , are the position and fitness value of the m-th member, respectively. Using the rules and strategies of the algorithm, the population is guided and controlled in the search space: the positions of the hunters and prey are constantly updated, it is determined whether the hunter is chasing the prey and whether the prey escapes the pursuit, and the fitness function dynamically evaluates whether a new position is a global optimal solution. This process gradually refines the solution to the problem with each iteration.
The key to the HPO algorithm is to select hunters and prey. The corresponding selection mechanism is:
where
is a random number in the range of [0, 1],
is an adjustment parameter with a value of 0.1. If
, the search agent is regarded as the hunter, and the upper part of Equation (
2) is used to update the next position. If
, the search agent is regarded as the prey, and the lower part of Equation (
2) is used to update the next position.
is the position of the hunter/prey at time
t.
is the position of the hunter/prey at time
t + 1.
is the nth dimensional position of the prey.
C is the balance parameter and its value decreases from 1 to 0.02 during the iteration process.
is the optimum global position.
Z is the adaptive parameter.
is the average value of all positions. The calculation formulas of
C,
Z,
, and
are, respectively, as follows:
where
i is the current number of iterations.
is the maximum number of iterations.
and
are random vectors in the range [0, 1].
is a random number.
P is the index value of
.
L is the index value of vector
that satisfies the condition (
P == 0).
In this paper, the fitness function of the HPO uses the Mean Absolute Error (MAE) predicted by relevant model training. The specific calculation formula is as follows:
where
and
represent the observed and simulated power at point i, respectively.
and
represent the average of the observed and simulated power time series, respectively. n is the length of the time series.
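The search loop described above can be illustrated on a toy objective. This is a simplified sketch, not the authors' implementation: the sphere function stands in for the model-training MAE used as the fitness in this paper, the adaptive parameter Z is reduced to a uniform random vector, and the hunter/prey branch is selected with the threshold 0.1 from the description above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    """Toy objective; the paper's fitness is the MAE of a trained model."""
    return float(np.sum(x ** 2))

def hpo_sketch(fobj, dim=5, n_pop=30, max_iter=200, lb=-10.0, ub=10.0, beta=0.1):
    pop = rng.uniform(lb, ub, size=(n_pop, dim))
    best = min(pop, key=fobj).copy()
    best_fit = fobj(best)
    for it in range(max_iter):
        C = 1 - it * 0.98 / max_iter       # balance parameter: decreases 1 -> 0.02
        mu = pop.mean(axis=0)              # mean position of all search agents
        for i in range(n_pop):
            z = rng.uniform(0, 1, dim)     # stand-in for the adaptive parameter Z
            if rng.uniform() < beta:       # agent acts as the hunter
                pop[i] += 0.5 * ((2 * C * z * best - pop[i]) +
                                 (2 * (1 - C) * z * mu - pop[i]))
            else:                          # agent acts as the prey
                r4 = rng.uniform()
                pop[i] = best + C * z * np.cos(2 * np.pi * r4) * (best - pop[i])
            pop[i] = np.clip(pop[i], lb, ub)
            f = fobj(pop[i])
            if f < best_fit:               # safe position = global best so far
                best_fit, best = f, pop[i].copy()
    return best, best_fit

best, best_fit = hpo_sketch(sphere)
print(best_fit)  # far below the initial random fitness values
```

In the full model, `fobj` would train the SRU (or run KNN outlier removal) with the candidate parameters and return the resulting MAE.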
2.5. HPO-KNN Outlier Detection
The K-Nearest Neighbors algorithm (KNN) is a method for outlier detection that calculates distances between different samples [
49]. The core idea of KNN outlier detection is that outliers are sample points that lie far away from the majority of normal points.
The distance between two samples is calculated as follows: in n-dimensional space, for two samples $a = (a_1, a_2, \ldots, a_n)$ and $b = (b_1, b_2, \ldots, b_n)$, the KNN distance between them is the Euclidean distance
$$d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}.$$
The outlier score of a sample is the average distance to its k nearest neighbors:
$$\text{score} = \frac{1}{k} \sum_{i=1}^{k} D_i,$$
where $D_i$ is the distance between the current node and its i-th nearest node.
The KNN outlier detection algorithm is shown in
Figure 6.
On the left side of the figure, the average distance of the three neighbors is calculated as (3 + 4 + 3)/3 = 3.33. Conversely, on the right side, the average distance of the three neighbors is (7 + 9 + 5)/3 = 7. Clearly, the second point is more anomalous compared to the first point.
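The average-distance rule illustrated in the figure can be sketched as follows (a minimal sketch on illustrative data; `n_neighbors` and `contamination` play the same roles as the KNN parameters optimized later by the HPO):

```python
import numpy as np

def knn_outlier_scores(X, n_neighbors=3):
    """Mean distance from each sample to its k nearest neighbors;
    large scores indicate likely outliers."""
    X = np.asarray(X, float)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # pairwise distances
    np.fill_diagonal(d, np.inf)                                  # exclude self-distance
    return np.sort(d, axis=1)[:, :n_neighbors].mean(axis=1)

def knn_outliers(X, n_neighbors=3, contamination=0.1):
    """Flag the top `contamination` fraction of samples as outliers."""
    scores = knn_outlier_scores(X, n_neighbors)
    return scores > np.quantile(scores, 1 - contamination)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (99, 2)), [[10.0, 10.0]]])  # one injected outlier
mask = knn_outliers(X, n_neighbors=3, contamination=0.01)
print(mask.sum(), mask[-1])  # the single injected point is flagged
```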
In order to improve the prediction accuracy, the KNN is used to process data outliers. However, when processing outliers with the KNN, its important parameters are usually set manually and somewhat arbitrarily, which brings great uncertainty to obtaining the optimal parameter values. In this study, the HPO algorithm is utilized to intelligently optimize the n_neighbors and contamination parameters of the KNN. The SRU is used as the control experimental model. The fitness function is set to the MAE between the predicted and observed values. The n_neighbors and contamination values corresponding to the minimum MAE are chosen as the optimal parameters for this experiment.
The detailed steps of the HPO-KNN are as follows:
Collect historical photovoltaic power data and perform corresponding preprocessing.
Parameter initialization: Initialize the parameters of the HPO algorithm, including the number of search agents N and the maximum number of iterations T, and set the upper and lower boundaries of the HPO algorithm, mapping them to the upper and lower bounds of the KNN parameters n_neighbors and contamination.
Obtain the initial optimal fitness value through SRU training and prediction.
Adjust the positions of the hunters and prey according to the rules of the HPO, simultaneously updating the fitness values of members whose positions have been adjusted.
Obtain the best solution to the problem and output the optimal parameters for the KNN. Use these optimal parameters for outlier detection.
2.6. HPO-SRU Training
The SRU is a deep learning model proposed by Lei et al. [
50] based on the research on LSTM, GRU, and other models. It introduces parallel processing to reduce training time and complexity while maintaining the accuracy. The structure of the SRU model is shown in
Figure 7.
The network structure of SRU is as follows:
As can be seen from the above formulas, does not rely on , so the program can be parallelized. The last two formulas can perform calculations very quickly and concisely. Their operations are all between corresponding elements.
Simultaneously, the matrix multiplications can be batch-processed, which significantly improves computational intensity and GPU utilization. In the above formulas, the three weight matrices can be merged into one large matrix:
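The merged-matrix idea can be sketched numerically (a simplification, not the paper's implementation: the highway connection uses x_t directly, so the input and hidden sizes are taken equal here):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_forward(X, W, Wf, Wr, bf, br):
    T, d = X.shape
    # Merged matrix multiplication: one batched product covers all three
    # projections for every time step at once.
    U = X @ np.hstack([W, Wf, Wr])                   # shape (T, 3d)
    xt, f_in, r_in = U[:, :d], U[:, d:2 * d], U[:, 2 * d:]
    c, H = np.zeros(d), np.empty_like(X)
    for t in range(T):                               # only elementwise ops remain here
        f = sigmoid(f_in[t] + bf)                    # forget gate
        r = sigmoid(r_in[t] + br)                    # reset gate
        c = f * c + (1 - f) * xt[t]                  # internal state
        H[t] = r * np.tanh(c) + (1 - r) * X[t]       # highway output
    return H

rng = np.random.default_rng(0)
d, T = 4, 10
W, Wf, Wr = (rng.normal(scale=0.5, size=(d, d)) for _ in range(3))
H = sru_forward(rng.normal(size=(T, d)), W, Wf, Wr, np.zeros(d), np.zeros(d))
print(H.shape)
```

Because the expensive matrix products are hoisted out of the recurrence, only cheap elementwise operations remain sequential, which is what enables the SRU's parallel speedup.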
In order to improve the accuracy of the SRU in photovoltaic power prediction, the four main parameters of the optimal SRU should be sought, namely hidden size, learning rate, network layers, and batch size. The HPO algorithm is used to optimize the four parameters of the SRU model. The fitness function is set as the MAE between the predicted and observed values. The steps for photovoltaic power prediction based on the HPO-SRU are as follows:
Standardize the data processed with outliers, normalize the entire dataset to [0, 1], and divide it into a training set and a test set according to the ratio of 8:2.
Parameter initialization: Initialize the parameters of the HPO algorithm, including the number of search agents N and the maximum number of iterations T, and set the upper and lower boundaries of the HPO algorithm and map them to the upper and lower bounds of the SRU parameters’ hidden size, learning rate, network layers, and batch size.
Obtain the initial optimal fitness value through SRU training and prediction.
Adjust the positions of the hunters and prey according to the rules of the HPO, simultaneously updating the fitness values of members whose positions have been adjusted.
Obtain the best solution to the problem and output the optimal parameters of the SRU. Build a model using the optimal parameters for prediction.
2.7. HPO-KNN-SRU Construction of the Predictive Model
According to the above description of the HPO-KNN and HPO-SRU, the implementation process for the proposed HPO-KNN-SRU model is as follows. The dynamic optimization process of the hunter/prey position in the HPO algorithm is used to achieve efficient outlier processing by optimizing the KNN parameters and then optimizing the SRU model parameters to improve prediction accuracy.
The structural framework of the HPO-KNN-SRU prediction model is shown in
Figure 8.
The HPO-KNN-SRU algorithm mainly comprises four modules: the HPO module, SRU module, KNN module, and Data module. The HPO module describes the detailed process of the hunter/prey optimization algorithm. The KNN module describes the detailed algorithm for K-Nearest Neighbors outlier handling. The SRU module describes the detailed algorithm for the SRU network. The Data module serves to supply the raw data.
The main steps of the HPO-KNN-SRU model for ultra-short-term photovoltaic power prediction are as follows:
Initialize the HPO algorithm population.
Determine the n_neighbors and contamination of the KNN and the hidden size, learning rate, network layers, and batch size of the SRU, which need to be solved by the HPO algorithm.
Train and test the HPO-KNN, using different parameters to perform outlier processing on the data. The SRU model is used as the control experiment, and the Mean Absolute Error (MAE) is returned to the HPO to update the optimal solution of the population. Finally, the minimum fitness value of the HPO-optimized model is obtained, and the corresponding parameters are used to process the data outliers.
Train and test the HPO-SRU model, using different parameters to train and validate on the outlier-processed data. The MAE is returned to the HPO as the fitness value to update the best solution of the population. Finally, the HPO-optimized model with the optimal parameter combination is obtained.
Construct the HPO-KNN-SRU model for final prediction.
2.8. Parameter Configuration
The sliding time window value is determined through SRU model training and prediction. The SRU parameters are set as follows: the learning rate is 0.001, the hidden size is 64, the batch size is 128, the number of epochs is 500, the optimizer is Adam, the loss function is the MAE, the output size is 1, and the number of layers is 1.
KNN is used to process outliers in the data, the SRU training prediction verifies the outlier-processing effect, and the KNN parameters are optimized using the HPO algorithm, which requires setting the relevant parameters of the HPO and the SRU model. After many experiments, the relevant parameters of the HPO are set as follows: nPop is 60, T is 30, lb is 5, ub is 35, and dim is 2. Finally, the KNN performs its optimal parameter search within the ranges n_neighbors [5–35] and contamination [0.05–0.15]. The SRU parameters are as follows: the learning rate is 0.001, the hidden size is 64, the batch size is 128, the number of epochs is 500, the optimizer is Adam, the loss function is the MSE, the output size is 1, and the number of layers is 1.
The SRU model is used for training and prediction, and the HPO algorithm is used to optimize the parameters of the SRU model, which requires setting reasonable HPO parameters for the search. After many experiments, the relevant parameters of the HPO are set as follows: nPop is 60, T is 60, lb is 5, ub is 15, and dim is 4. The optimal SRU parameters are then searched within the ranges: learning rate [0.001–0.01], batch size [], hidden size [], and number of layers [1–5].
In order to demonstrate the effectiveness of the proposed HPO-KNN-SRU model, this study constructed various comparison models, namely SVR, LSTM, the TCN, and the SRU. The performance of SVR, LSTM, the TCN, the SRU, and the HPO-KNN-SRU was evaluated to highlight the performance of the complete HPO-KNN-SRU model. By introducing the HPO-KNN-SRU (optimizing only the KNN), model ablation experiments were carried out, and comparison with the full HPO-KNN-SRU demonstrated the effectiveness of the HPO algorithm in searching for the optimal parameters of the SRU. To further verify the effectiveness of the KNN in processing outliers in photovoltaic power data, we constructed the KNN-SRU and KNN-SVR models to conduct ablation experiments and compared them with the SRU and SVR models. Simultaneously, comparing the experimental results of the KNN-SRU and the HPO-KNN-SRU (optimizing only the KNN) demonstrates that the HPO algorithm is effective in searching for the optimal parameters of the KNN. Because a new dataset is used, grid-search optimization was performed on each model's parameters based on previous experience, and the final parameters were determined as follows:
SVR: C = 10, gamma = 0.01, kernel = rbf
LSTM: learning rate = 0.005, hidden size = 32, batch size = 128, number of layers = 1, optimizer = Adam, loss function = MSE
TCN: channels = [32, 64, 8], kernel size = 3, dilation = [1, 2, 4], optimizer = Adam, loss function = MSE
SRU: learning rate = 0.001, hidden size = 64, batch size = 64, number of layers = 1, optimizer = Adam, loss function = MSE
For the KNN-SRU and KNN-SVR, the parameters for the KNN were determined through multiple experiments as follows: n_neighbors = 10, contamination = 0.1.
To measure prediction time, the start time is recorded before prediction begins, and the elapsed time when prediction ends is taken as the prediction time. To make the timing more accurate, each model's final prediction time is the average over three experiments, and no other tasks run on the device while each model is running.
2.9. Evaluation Metrics
To evaluate the performance of the proposed model, this study uses Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and coefficient of determination (
) as performance metrics to evaluate prediction accuracy. The RMSE, MAE and
formulas are:
where
and
represent the observed and simulated power at point i, respectively.
and
represent the average of the observed and simulated power time series, respectively. n is the length of the time series.
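The three metrics can be computed directly from their definitions; a minimal sketch on toy observed and simulated values:

```python
import numpy as np

def rmse(y, yhat):
    """Root Mean Square Error."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)))

def mae(y, yhat):
    """Mean Absolute Error."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(yhat))))

def r2(y, yhat):
    """Coefficient of determination."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    ss_res = np.sum((y - yhat) ** 2)           # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)       # total sum of squares
    return float(1 - ss_res / ss_tot)

obs = np.array([1.0, 2.0, 3.0, 4.0])
sim = np.array([1.1, 1.9, 3.2, 3.8])
print(rmse(obs, sim), mae(obs, sim), r2(obs, sim))
```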
To evaluate the performance differences between different models, the improvement percentage of three performance metrics, namely
,
and
, are used. The subscript 1 denotes the model with better performance, and the subscript 2 denotes the model with ordinary performance. M represents the value of an evaluation metric (
,
, and
) of the prediction model. The percent improvement metric between Model 1 and Model 2 is calculated as follows:
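A sketch of the improvement-percentage metric, assuming the common convention P = |M2 − M1| / M2 × 100 (the exact sign convention is an assumption here):

```python
def improvement_pct(m1, m2):
    """Percentage improvement of model 1 (better) over model 2
    (baseline), taken as |M2 - M1| / M2 * 100."""
    return abs(m2 - m1) / m2 * 100.0

# e.g. a baseline MAE of 0.20 reduced to 0.16 is a 20% improvement
print(improvement_pct(0.16, 0.20))
```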
2.10. Experimental Environment
The experimental environment used in this article is an Intel Core™ i7-7700HQ CPU @ 2.80 GHz (Intel, Santa Clara, CA, USA), 16 GB RAM, and an NVIDIA GeForce RTX 4090 (NVIDIA, Santa Clara, CA, USA), operating on Windows 11 (64-bit). The programming and network construction were conducted in a PyCharm 2021 and Anaconda environment using Python 3.9, PyTorch 2.0, and CUDA 11.8.