1. Introduction
Electricity price plays a crucial role as an economic lever in the electricity market [1,2], serving as the economic link between the electricity consumption and generation sides. Accurate electricity price prediction also has a major impact on the sustainability of the generation, transmission and distribution, and consumption sides.
With the advancement of electricity marketization reform in China, electricity trading has gradually moved to a bidding basis in the electricity market. Electricity prices are directly linked to the economic benefits of market participants [3,4,5]. Accurate electricity price prediction can therefore help market participants earn greater profits in the market.
However, in the actual market, electricity prices exhibit strong nonlinearity and volatility and are influenced by many factors. Technological, economic, environmental, policy, legal, and social considerations make electricity price prediction very complex. Furthermore, the ongoing global transition toward sustainable energy introduces additional complexity. As renewable energy sources such as wind and solar power gain prominence, their intermittent nature significantly affects electricity price fluctuations, introducing new dynamics that forecasting models must consider. To address the challenge of prediction accuracy, this work makes two main contributions: (1) an FC-SSA-LSTM prediction model, and (2) an error correction method for recursive prediction. The former optimizes the model and algorithm for electricity price prediction, and the latter provides an optimization scheme for the practical use of prediction methods.
The technical routes for prediction can generally be divided into two types: mathematical statistical methods and machine learning methods. Mathematical statistical methods are relatively traditional, mainly fitting the price curve with techniques such as multiple linear regression, autoregressive moving average models, and exponential smoothing [6,7,8]. However, due to the nonlinearity and volatility of the electricity price series, traditional mathematical statistical methods cannot sensitively capture the inherent connections in the historical time series.
Machine learning-based prediction methods have developed rapidly in recent years; algorithms such as support vector machines (SVMs), recurrent neural networks (RNNs), and random forests (RFs) have clear advantages in handling nonlinear and highly volatile data [9,10,11,12]. When trained on long sequences, RNNs may encounter vanishing or exploding gradients; long short-term memory (LSTM) neural networks are a special type of RNN that performs better on longer time series.
The prediction of electricity prices cannot simply rely on assorted input feature data. Electricity prices reflect the supply–demand relationship in the electricity market; in the short term, they are also affected by weather, bidding behavior, and market supply and demand. Therefore, feature extraction and feature selection must be performed before predicting electricity prices, as the accuracy of the prediction results depends heavily on the quality of the input information. Electricity prices have certain characteristics, such as volatility and periodicity, which can be analyzed to reduce the impact of distractors. Chen et al. extracted and divided data into individual, spatial, and temporal features to ensure the accuracy of user load prediction in large-scale WiFi systems; semantic information can be injected into each time step of the prediction to reduce accumulated prediction errors [13]. Cao et al. proposed using the random forest algorithm to analyze feature correlation and obtain the optimal input feature combination, and validated the influence of feature selection on the model through ablation experiments [14]. Zhang et al. constructed features based on the aerodynamic performance of wings and the energy relationships of hybrid power systems when predicting ship fuel consumption, overcoming the tendency of traditional prediction methods to ignore feature information when feature subsets are specified directly [15]. Li et al. proposed a combination of a two-step hybrid architecture and an autoencoder hybrid architecture for feature selection, and concluded that different feature selection algorithms produce different subsets, affecting the prediction accuracy of LSTM. Using a hybrid model to process market information is therefore an effective way to obtain accurate prediction results [16].
Modal decomposition of data before prediction is also effective. For non-stationary sequences, the signal is decomposed into components reflecting the trend in the original sequence and noise components, and the sequences needed for reconstruction are selected to denoise the dataset. Empirical mode decomposition (EMD) is a common decomposition method; derivative variants such as ensemble empirical mode decomposition (EEMD), CEEMD, and CEEMDAN have subsequently been proposed. He et al. proposed an electricity price prediction model based on bivariate EMD denoising (BED), which serves as a feature tool to identify and eliminate noise and further uses its error entropy as a criterion to determine the optimal level of contraction in EMD [17]. Zhang et al. combined VMD and EEMD to improve the prediction accuracy of electricity prices: the residual term after VMD decomposition was further decomposed by EEMD, and an extreme learning machine optimized by a differential evolution algorithm was used to predict the electricity price sequence [18].
At present, the main research direction is to combine different neural networks and analyze different data separately to maximize the advantages of each algorithm. Kim et al. used 1D-CNN and BiLSTM models to predict power loads and system marginal prices; BiLSTM replaces traditional RNN cells with LSTM cells in the BRNN structure, and these LSTM units solve the basic long-term dependency problem of traditional RNNs [19]. Pan et al. used the artificial bee colony algorithm to optimize support vector regression (SVR) neural networks, with quantum algorithms finding the best solution in a shorter time and accelerating neural network training [20]. Chaâbane used the autoregressive fractionally integrated moving average (ARFIMA) model and a feedforward neural network to predict electricity prices; however, linear models still have limitations in electricity price prediction, and neural networks are gradually replacing traditional methods owing to their flexible nonlinear fitting ability [21]. Xiong et al. used Bayesian optimization with Hyperband (BOHB) to optimize the hyperparameters of LSTM, which affect the final prediction results, reducing hyperparameter uncertainty and the risk of unexpected predictions caused by unreasonable settings [22]. Qiu et al. used a composite algorithm consisting of empirical mode decomposition (EMD), kernel ridge regression (KRR), and support vector regression (SVR) [23]. The basic idea is to decompose the electricity price signal into several components through EMD, use KRR to predict each component separately, and finally combine the predicted results of all components with SVR to obtain the predicted electricity price. A summary of past research is shown in Table 1.
According to current research, the vast majority of researchers use machine learning algorithms for predictive fitting. Therefore, the model proposed in this paper uses the widely applicable and technically mature LSTM algorithm. However, some problems in traditional machine learning methods still need to be solved: firstly, the performance of machine learning algorithms is closely tied to the input data; secondly, the nonlinearity of the input data must be handled. To further improve the accuracy of electricity spot price prediction, we propose a prediction and error correction method based on feature construction–singular spectrum analysis–LSTM (FC-SSA-LSTM).
First, we construct features under the theoretical guidance of the power system: input features with practical physical meanings are constructed from existing publicly available market data, and the correlation between the constructed features and electricity prices is verified to ensure that they explain changes in the electricity price sequence. Next, the SSA algorithm is used to decompose the electricity price sequence, the decomposed components are predicted separately, and the predictions are finally combined to obtain the final electricity price prediction. Traditional recursive prediction of future data suffers from error accumulation, which degrades the predictive performance of the model. This article therefore provides an error correction method that, based on the historical error distributions of the training and testing sets, gradually corrects each step of the predicted value during recursive prediction.
The rest of the paper is organized as follows. Section 2 describes the model and prediction method. Section 3 presents a case study based on actual market data and demonstrates the effectiveness of the proposed model and method through comparison of results. Section 4 concludes this article.
2. Materials and Methods
2.1. Model of FC-SSA-LSTM-EC
The electricity price prediction model and process in this paper are shown in Figure 1. The overall structure follows the “Construction–Filter–Decompose–Error Correction–Predict” approach, as follows:
Step 1: This paper uses data from the provincial electricity market of a certain province in China, and proposes constructing features with practical physical meanings from publicly available market data, generating new feature quantities based on existing features while considering the essence of the features and the structure of the data, so as to increase the nonlinear information in the overall input features.
Step 2: Calculate the Pearson correlation coefficient between the constructed features and existing market feature data and day-ahead electricity prices, and select features with high correlation as inputs to the model.
Step 3: Use the SSA algorithm to decompose the historical sequence of spot electricity price data and obtain multiple subsequences.
Step 4: Group the decomposed subsequences and merge each group into a component, and then normalize the data of each component.
Step 5: Input each component into the LSTM model for training and testing, and obtain the predicted results for each component.
Step 6: Apply inverse normalization to the predicted results and merge them to reconstruct the electricity price prediction results.
Step 7: Compare the predicted results with the actual electricity price data and evaluate the model accuracy.
Step 8: Filter similar days for the forecast day in the historical data, calculate the errors in the training and testing sets for similar days, and then divide the errors into 96 sequences based on 96 points within a day. The median value in each sequence is the correction value at the corresponding time point, forming a 96-point error correction sequence.
Step 9: In recursive prediction, gradually correct the prediction results according to the data at the corresponding time point, and repeat this process until a complete round of prediction is completed to obtain the final prediction result.
2.2. Evaluation Indicators for Prediction Results
In regression prediction analysis, common evaluation indicators include the coefficient of determination R2, mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE) [24,25,26,27].
MSE amplifies the impact of large errors by squaring them, making it suitable for scenarios sensitive to extreme errors; however, its results are in squared units and are hard to interpret. RMSE preserves the dimension of the target variable by taking the square root, making it easier to understand and suitable when errors must be represented intuitively and outliers matter. MAE is more robust to outliers and directly reflects the average deviation, but it is not as smooth as MSE during optimization; it suits data containing outliers. MAPE expresses error as a relative proportion, which makes it suitable for comparing datasets of different scales, but it is sensitive to actual values close to zero. R2 measures the explanatory power of a model for fluctuations in the target variable, making comparison with a baseline model easier, although it may understate model effectiveness for nonlinear problems. This article compares four indicators (RMSE, MAE, MAPE, and R2) to evaluate model performance from multiple perspectives: RMSE emphasizes the punishment of large errors, helping to reduce extreme prediction errors; MAE provides an intuitive, outlier-robust measure of average error; MAPE measures the relative proportion of errors and suits datasets of different sizes; and R2 demonstrates the model’s ability to capture fluctuations in the target variable from the perspective of explanatory power. Combining these indicators diagnoses the strengths and weaknesses of the model more accurately, avoids the misleading effects of a single indicator, and provides more comprehensive guidance for model optimization and improvement.
The above four indicators are used to evaluate the prediction model, and their calculation formulas are as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the sequence average. The range of values for $R^2$ is [0,1]: a result of 0 indicates a poor fit, and a result of 1 means there is no error in the fitting result. Generally, the closer the coefficient is to 1, the better the model fit; note, however, that its value is also related to the number of samples, and a large amount of data inevitably inflates it. The closer the MAE is to 0, the smaller the model error. The MAPE formula is similar to MAE, but because the actual values appear in the denominator, it is unsuitable when those values are 0.
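For concreteness, the four indicators can be computed with a minimal NumPy sketch (the function and variable names are ours, not from the paper):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return (R2, MAE, MAPE in %, RMSE) for one prediction sequence."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    r2 = 1.0 - np.sum(residual**2) / np.sum((y_true - y_true.mean())**2)
    mae = np.mean(np.abs(residual))
    mape = 100.0 * np.mean(np.abs(residual / y_true))  # undefined if y_true contains zeros
    rmse = np.sqrt(np.mean(residual**2))
    return r2, mae, mape, rmse
```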
2.3. Feature Construction
The original dataset used in this article is public data from the electricity market of a province in China. When conducting research on electricity price prediction, feature construction and feature selection are crucial [28]. Input features directly affect the performance and prediction accuracy of the model. By selecting and constructing appropriate features, the complexity of the model can be reduced, its generalization ability improved, and overfitting and distortion avoided. Constructing meaningful features also improves the interpretability of the model and reflects the actual situation more accurately.
However, the constructed features should have certain characteristics so as to better explain the formation or fluctuation of electricity prices [29,30]. When constructing features, this paper considers both the correlation between features and electricity prices and interpretability: features need to capture the factors that affect electricity prices and explain the relationship between the two. Feature construction should therefore be considered from the perspective of how features affect electricity demand and supply.
2.4. Feature Selection
Pearson correlation analysis is often used to measure the strength and direction of the linear relationship between two variables, with the degree of correlation represented by the Pearson correlation coefficient [31]. The Pearson correlation coefficient takes values between −1 and 1: a coefficient of 1 indicates a complete positive correlation between the two variables, 0 indicates no correlation, and −1 indicates a complete negative correlation. The calculation formula for the Pearson correlation coefficient is as follows:

$$\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y}$$

Formula (5) is the definitional form of the Pearson correlation coefficient, where cov is the covariance and $\sigma$ is the standard deviation. In practical applications, the expectation, variance, and covariance are estimated from samples:

$$\mathrm{cov}(X,Y) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)$$

$$\sigma_X = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad \sigma_Y = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

By substituting (6)–(8) into (5), it can be obtained that

$$r = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}}$$

where $x_i$ and $y_i$ are the values of the two variables at the i-th observation point, $\bar{x}$ and $\bar{y}$ are the means of the two variables, and n is the number of samples. The correspondence between the Pearson correlation coefficient and correlation strength [15] can be found in Table 2.
The choice of Pearson correlation analysis for feature selection is justified by its ability to identify linear relationships between input features and the target variable, which is crucial for selecting relevant predictors in time series forecasting. Unlike more complex feature selection methods, such as mutual information or model-based feature importance (e.g., random forest), Pearson correlation provides a computationally efficient and interpretable approach. Given the high-dimensional nature of electricity price data and the need for quick insights into feature relevance, Pearson correlation was deemed appropriate.
Moreover, while nonlinear dependencies may also exist, the proposed framework compensates for this limitation in subsequent steps by employing SSA and LSTM, which are better equipped to capture nonlinearity in the data. This hybrid approach ensures that both linear and nonlinear relationships are effectively addressed, striking a balance between computational simplicity and predictive performance.
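A minimal sketch of this selection step, assuming the market data sit in a pandas DataFrame with a `price` column (the column name and the 0.3 cutoff are illustrative assumptions, not the paper's exact settings):

```python
import pandas as pd

def select_features(df: pd.DataFrame, target: str = "price",
                    threshold: float = 0.3) -> list:
    """Rank features by |Pearson r| against the target and keep those
    above the threshold (cf. the strength bands in Table 2)."""
    corr = df.corr(method="pearson")[target].drop(target)
    keep = corr[corr.abs() >= threshold]
    return keep.sort_values(key=lambda s: s.abs(), ascending=False).index.tolist()
```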
2.5. Singular Spectrum Analysis
SSA is commonly used to process nonlinear time series data with significant fluctuations, and can effectively achieve data analysis and denoising. The SSA algorithm decomposes and reconstructs data sequences to extract long-term trends, seasonal trends, noise, etc. The steps of singular spectrum analysis usually include trajectory matrix construction, singular value decomposition (SVD), reconstruction, and component extraction [32].
The basic idea of singular spectral decomposition is to decompose time series data into a set of orthogonal components, each corresponding to a specific structure or frequency in the original data. Singular value decomposition makes this decomposition possible, and by appropriately selecting singular values, important components can be extracted. This makes SSA perform well in time series analysis, especially for the decomposition of nonlinear and non-stationary sequences.
The process systematically separates meaningful patterns from noise and randomness, ensuring that critical features of the time series are preserved for subsequent modeling and analysis.
The analysis object of SSA is a one-dimensional time series $Y = (y_1, y_2, \ldots, y_N)$, where N is the sequence length. Firstly, an appropriate window length L is selected to lag the original sequence and obtain the trajectory matrix:

$$X = \begin{pmatrix} y_1 & y_2 & \cdots & y_K \\ y_2 & y_3 & \cdots & y_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ y_L & y_{L+1} & \cdots & y_N \end{pmatrix}, \qquad K = N - L + 1$$
Next, singular value decomposition is applied to the trajectory matrix:

$$X = U \Sigma V^{T}$$

In (11), $U$ is called the left singular vector matrix; $\Sigma$ has values only on its main diagonal (the singular values), with all other elements zero; and $V$ is called the right singular vector matrix. In addition, $U$ and $V$ are both unit orthogonal matrices. Using the results of the singular value decomposition, the reconstruction matrix is obtained by truncation:

$$\tilde{X} = U_r \Sigma_r V_r^{T}$$

where $U_r$, $\Sigma_r$, and $V_r$ represent the truncated left singular vector, singular value, and right singular vector matrices.
SSA is particularly suited for electricity price forecasting, as it addresses both noise and broader sources of uncertainty, such as irregular market behaviors and renewable energy integration. Its ability to isolate and analyze key components of the data ensures more robust and accurate predictions.
This article first decomposes the time series into several components through SSA, including trends, periodic components, and random noise. Then, LSTM is used to separately model and predict each component, capturing its unique dynamic characteristics. The LSTM model is a nonlinear model that can learn complex time series patterns.
In the component reconstruction stage, we use a simple addition method to synthesize the predicted results of each component into the final predicted value. This is based on the linear superposition property of SSA decomposition: the original time series is equal to a linear combination of all components.
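The decomposition-and-reconstruction logic can be sketched in a few lines of NumPy (the window length and grouping below are illustrative; Section 3.3 uses 10 modes grouped as the first five and the last five):

```python
import numpy as np

def ssa(y, L, groups):
    """Decompose series y with window length L and return one
    reconstructed series per group of eigentriple indices."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    K = N - L + 1
    X = np.column_stack([y[i:i + L] for i in range(K)])  # L x K trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)      # SVD of the trajectory matrix
    series = []
    for idx in groups:
        idx = list(idx)
        Xg = (U[:, idx] * s[idx]) @ Vt[idx, :]            # grouped rank-k part
        comp, counts = np.zeros(N), np.zeros(N)
        for j in range(K):                                # diagonal (Hankel) averaging
            comp[j:j + L] += Xg[:, j]
            counts[j:j + L] += 1
        series.append(comp / counts)
    return series

# e.g., trend, noise = ssa(price, L=96, groups=[range(5), range(5, 10)])
```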
2.6. LSTM Network
Recurrent neural networks are commonly used to predict data with strong temporal correlation, but RNNs inevitably encounter the problem of vanishing gradients when dealing with large amounts of data; the LSTM algorithm overcomes this defect [33].
An LSTM cell has three gates: the forget gate, input gate, and output gate, which correspond to the three boxes labeled with σ. The output of the output gate is not the final output of the LSTM cell; the final outputs are the hidden state $h_t$ and the cell state $C_t$ shown in Figure 2. The forget gate is denoted as $f_t$, the input gate as $i_t$, and the output gate as $o_t$.
At time t, the formulas of the LSTM neural network are as follows:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$

$$h_t = o_t \odot \tanh\left(C_t\right)$$

In addition to the $f_t$, $i_t$, $o_t$, $C_t$, and $h_t$ mentioned above, $W$ (with bias $b$) represents the recursive connection weights of the corresponding gates, while sigmoid ($\sigma$) and tanh are two types of activation functions.
In this study, the LSTM network was implemented using the Python Keras library. The architecture of the model included multiple LSTM layers combined with fully connected dense layers for final predictions. To ensure optimal performance, hyperparameter optimization was performed during training. The range of hyperparameters, such as the number of LSTM units, learning rate, batch size, and dropout rate, was predefined.
During the training process, the grid search approach was employed to test combinations of hyperparameters, and the model performance was evaluated using a validation dataset. The hyperparameter settings yielding the best validation performance were selected as the final configuration. This automatic tuning ensured the robustness of the model and its ability to generalize well to unseen data.
Key hyperparameters optimized include the following (a minimal tuning sketch follows the list):
Number of LSTM units: determining the capacity of the model to capture temporal features.
Learning rate: balancing convergence speed and stability during optimization.
Batch size: controlling the number of samples per gradient update.
Dropout rate: mitigating overfitting by randomly deactivating units during training.
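The sketch below illustrates this tuning loop with Keras; the layer layout and grid values are illustrative assumptions rather than the exact configuration in Table 4, and `X_train`, `y_train`, `X_val`, and `y_val` are assumed to be prepared already:

```python
import itertools
from tensorflow import keras
from tensorflow.keras import layers

def build_model(units, dropout, lr, n_steps=96, n_features=6):
    """One LSTM layer plus a dense head, compiled for regression."""
    model = keras.Sequential([
        layers.LSTM(units, input_shape=(n_steps, n_features)),
        layers.Dropout(dropout),
        layers.Dense(1),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr), loss="mse")
    return model

# Grid search over predefined ranges (values here are illustrative)
grid = {"units": [32, 64, 128], "dropout": [0.1, 0.2],
        "lr": [1e-3, 1e-4], "batch_size": [32, 64]}
best_cfg, best_loss = None, float("inf")
for units, dropout, lr, batch_size in itertools.product(*grid.values()):
    model = build_model(units, dropout, lr)
    hist = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                     epochs=50, batch_size=batch_size, verbose=0)
    val_loss = min(hist.history["val_loss"])
    if val_loss < best_loss:
        best_cfg, best_loss = (units, dropout, lr, batch_size), val_loss
```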
In traditional recursive prediction methods, the one-step model is applied repeatedly to predict forwards, with the prediction from the previous time step used as input for the next time step. With this method, prediction errors accumulate as the time range expands, and the predictive accuracy of the model may decline rapidly [34].
To address this limitation, this study incorporates an error correction mechanism, dynamically adjusting the recursive prediction process. By leveraging historical error distributions observed during training and validation, the model refines each predicted value iteratively, ensuring greater robustness and accuracy in long-term forecasts.
2.7. Selection of Similar Days
Before calculating the error correction sequence, similar days to the forecast day are filtered from the historical data, searching for electricity price data and prediction errors that reflect the same supply and demand conditions, thereby improving the pertinence and effectiveness of the error correction.
Using the discrete Fréchet distance as the criterion, the overall similarity between N historical days and the forecast day is calculated, and the M days with the highest similarity are taken as the similar-day samples [35].
The discrete Fréchet distance between two discrete sequences P and Q is calculated over the sequence point pairs formed by a coupling of their points, as follows:

$$\delta_{dF}(P, Q) = \min_{L} \ \max_{(p_{a_k},\, q_{b_k}) \in L} d\left(p_{a_k}, q_{b_k}\right)$$

where $p_{a_k}$ and $q_{b_k}$ are the elements in sequences P and Q paired by the coupling L, the maximum is taken over the longest link among the sequence point pairs of L, and $\delta_{dF}$ is the discrete Fréchet distance.
Calculate the discrete Fréchet distance of each input feature between the historical day and the forecast day according to the above formula, obtain the distance matrix, and then calculate the overall similarity.
After obtaining the overall similarity through multiplication, select the samples with higher similarity from historical days as similar day samples.
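A compact dynamic-programming implementation of the discrete Fréchet distance follows, together with one plausible reading of the multiplicative overall similarity (the exact combination formula, (19)–(23), is not reproduced in the text, so `overall_similarity` should be read as an assumption):

```python
import numpy as np

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between two 1-D sequences
    (Eiter-Mannila dynamic programming)."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    m, n = len(P), len(Q)
    ca = np.zeros((m, n))
    ca[0, 0] = abs(P[0] - Q[0])
    for i in range(1, m):
        ca[i, 0] = max(ca[i - 1, 0], abs(P[i] - Q[0]))
    for j in range(1, n):
        ca[0, j] = max(ca[0, j - 1], abs(P[0] - Q[j]))
    for i in range(1, m):
        for j in range(1, n):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]),
                           abs(P[i] - Q[j]))
    return ca[-1, -1]

def overall_similarity(hist_day, forecast_day):
    """Combine per-feature distances multiplicatively; both arguments
    map feature name -> intra-day sequence (an assumed formulation)."""
    sim = 1.0
    for name, seq in forecast_day.items():
        sim *= 1.0 / (1.0 + discrete_frechet(hist_day[name], seq))
    return sim
```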
2.8. Error Correction Method Based on Historical Error Distribution Results of Training and Testing Sets
Based on the distribution fitting of the error data from the training set and test set of similar days, the distribution function of the error sequence can be obtained. It can be used as a reference to correct the predicted results to a certain extent. The fitting diagram of the error distribution of all data in the test set and training set is shown in Figure 3.
Using a single numerical value to correct the result during the recursive process may introduce larger errors and make it difficult to correct the value in the right direction. Instead, the error distribution is analyzed at each of the 96 points of the day to be predicted, and the resulting 96-point residual sequence is used to correct the recursive prediction results directly. A correction method that ignores the accumulation of errors during recursive prediction, however, applies insufficient correction in later time periods and in periods with large fluctuations. To eliminate the impact of accumulated errors, we correct the prediction results step by step during the recursive prediction process and use the latest corrected values for the next prediction, which greatly reduces the impact of accumulated errors. The revised formula can be expressed as follows:
$$\hat{y}_t^{c} = \hat{y}_t + \Delta_t, \qquad \Delta_t = \mathrm{median}\left(e_{1,t}, e_{2,t}, \ldots, e_{M,t}\right), \qquad t = 1, 2, \ldots, N$$

where $\hat{y}_t^{c}$ is the corrected predicted value, $\hat{y}_t$ is the original prediction result of the model at the t-th time point, and $\Delta_t$ is the corresponding corrected value in the correction sequence. $e_i = (e_{i,1}, \ldots, e_{i,N})$ is the sequence of prediction error vectors on day i, N is the number of prediction points in a day, and M is the number of days counted in the training and testing sets.
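As a sketch, the correction sequence and its application can be written as follows (errors are taken as actual minus predicted, matching the additive correction above):

```python
import numpy as np

def correction_sequence(errors):
    """errors: (M, N) array of prediction errors (actual - predicted)
    for M similar days at N = 96 intra-day points; the per-point
    median is the correction value (cf. Step 8 in Section 2.1)."""
    return np.median(np.asarray(errors, dtype=float), axis=0)

def apply_correction(y_hat, corr_seq, t):
    """Correct the model output at intra-day time point t."""
    return y_hat + corr_seq[t]
```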
3. Case Study
In this section, we use publicly available market data for feature construction and feature selection in feature engineering, and demonstrate the effectiveness and efficiency of our method through ablation experiments. The experimental results were obtained on a PC with an Intel Core i5-8300 processor (2.3 GHz, four cores), 16 GB of memory, and a GTX 1050 Ti GPU. The software and frameworks used were Python 3.11, TensorFlow 2.12, and Keras 2.12.
3.1. Dataset
To validate the predictive performance of the FC-SSA-LSTM model proposed in this paper, the study used data from a provincial electric power market with a time span from 1 December 2021 to 3 November 2022. The data include publicly available data from the provincial electricity market and the installed capacity of various types of energy in the province. The market is cleared every 15 min, giving 96 time points per day. A total of 32,448 rows of data were used as experimental data, with the first 80% as the training set and the last 20% as the testing set. Some of the electricity price data are shown in Figure 4 and exhibit a certain degree of regularity.
Based on publicly available market data, we can analyze the relationship between the electricity price sequence and other influencing factors. The market data used in this study include reference prices, total wind power, total photovoltaic power, proportion of renewable energy output, direct load regulation, total power generation of local power plants, tie-line load, total nuclear power, wind power flexibility, photovoltaic flexibility, flexibility ramp-up, coal price index, maximum operating ratio, and clearing ratio.
Conventionally, the difference between day-ahead and real-time electricity prices in the spot market changes with the variation in the bidding space difference, which is usually analyzed during daily declaration. For market participants, price differences across seasons can directly affect decision-making behavior: trading decisions are adjusted based on price difference fluctuations to maximize profits or reduce costs.
According to the information in Figure 5 and Figure 6, electricity prices are not highly similar under the same supply and demand relationship in different seasons. Therefore, in electricity price prediction, we try to reduce the size of the training set. In this study, we used quarterly data for training to better capture the characteristics of factors such as climate, holidays, and policy adjustments, without being disturbed by noisy data from other quarters.
3.2. Feature Construction and Feature Selection
According to publicly available market data, provincial direct load regulation data, the tie-line load, and the total output of renewable energy such as wind and solar can be obtained. The processes of generating electricity from wind, solar, and hydropower are usually carbon-free, and the main cost of most renewable energy is concentrated in the construction stage; the marginal cost of subsequent operation and power generation is relatively low. Therefore, to promote the transformation of the energy structure, reduce dependence on traditional energy sources, and promote environmental sustainability, renewable energy in the electricity market is usually cleared with priority.
This paper aims to construct new features that have practical significance in the electricity market, rather than adding useless input features to increase the complexity of the algorithm. When constructing input features for electricity price prediction models, the main consideration is whether the constructed features will affect the formation or fluctuation of electricity prices. Therefore, from the aspects of market supply and demand, power generation capacity, and unit equipment, the bidding space for thermal power is proposed, and the minimum technical output of units is used for feature construction.
When considering the bidding space for thermal power in the market, based on the principle of prioritizing the clearance of wind and solar power generation and externally transmitted power [36,37,38], thermal power is responsible for filling the load gap. Under the condition of supply–demand balance, the formula for the bidding space for thermal power can be obtained:

$$P_{\mathrm{bid}} = P_{\mathrm{load}} - P_{\mathrm{tie}} - P_{\mathrm{wind}} - P_{\mathrm{pv}} - P_{\mathrm{nuc}} - P_{\mathrm{self}}$$

where $P_{\mathrm{bid}}$ represents the bidding space for thermal power; $P_{\mathrm{load}}$ represents direct load regulation; $P_{\mathrm{tie}}$ represents the tie-line load; $P_{\mathrm{wind}}$ represents the total amount of wind power; $P_{\mathrm{pv}}$ represents the total amount of photovoltaics; $P_{\mathrm{nuc}}$ represents the total amount of nuclear power; and $P_{\mathrm{self}}$ represents the total output of self-owned units.
The bidding space for thermal power reflects the ability of the market to adjust based on the supply–demand situation, and is closely related to the formation of market prices. Using the bidding space for thermal power as an input feature for electricity price prediction can explain the fluctuations in market prices.
In the electricity market, the minimum active power required for steady-state operation of a generator set (power plant) is called the minimum technical output [39,40]. The minimum technical output reflects the operating and fuel costs of power generation companies in the market, directly affecting the electricity price level; it also reflects the flexibility of generating units, thereby directly affecting their participation strategies in the market.
After the implementation of the thermal power capacity pricing mechanism in China, the flexibility requirements for thermal power will be higher, and the minimum technical output of units will also face corresponding requirements. The proposed formula is as follows:

$$F_{\mathrm{flex}} = P_{\mathrm{bid}} - k_{\min} \cdot C_{\mathrm{thermal}}$$

where $k_{\min}$ is the minimum technical output of the unit, expressed in percentage form; $P_{\mathrm{bid}}$ represents the bidding space for thermal power; and $C_{\mathrm{thermal}}$ represents the installed capacity of thermal power. In the Guangdong spot market of China, it is stipulated that the first output segment in the quotation is the minimum technical output, to solve the problem of reduced optimization space that may occur during clearance. This formula reflects whether the flexibility of thermal power in the market is sufficient and is related to the level of unit pricing.
By using optimization algorithms to tune the minimum technical output setting, it is found that the correlation between the constructed feature and the electricity price is strongest, and the best fit is achieved, when the minimum technical output is set to 0.6. Meanwhile, the minimum technical output of thermal power units is generally about 60–70% of the rated capacity, so this setting is in line with the actual situation.
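Assembled as code, the two constructed features might look as follows; the column names, the sign convention for the tie-line term, and the fixed ratio k = 0.6 follow the reconstruction above and should be read as assumptions:

```python
def construct_features(df, k_min=0.6):
    """Add the thermal bidding space and the flexibility margin
    implied by the minimum technical output (ratio k_min)."""
    df["bidding_space"] = (df["direct_load"] - df["tie_line_load"]
                           - df["wind_total"] - df["pv_total"]
                           - df["nuclear_total"] - df["self_owned_total"])
    df["thermal_flexibility"] = df["bidding_space"] - k_min * df["thermal_capacity"]
    return df
```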
In this paper, we evaluated the linear relationship between features and targets by calculating the Pearson correlation coefficient between the features and the target variable. Reducing redundant and irrelevant features also improves prediction accuracy and reduces the running time of the LSTM algorithm. The input features selected in this study and their Pearson correlation coefficients with electricity prices are shown in Table 3.
3.3. Singular Spectrum Analysis Results
The singular spectrum analysis algorithm is used to decompose the electricity price data, and the components are then predicted separately. The number of decomposed modes is set to 10, and the electricity price sequence is decomposed into long-term trend signals, periodic signals, and noise signals. The results obtained are shown in Figure 7.
As can be seen in Figure 7, the first sequence broadly reflects the trend in the electricity price sequence, while the later sequences can almost be regarded as noise. Here, we combine the first five sequences into one reconstructed component and the last five sequences into another, as shown in Figure 8.
Before using the LSTM algorithm for training, we need to process the input data uniformly; the method used here is normalization. Normalization of data can unify raw data of different dimensions into the same range, avoiding the impact of different dimensions on predictive performance. We scale each feature value using the following formula:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is the feature value in the original data, $x_{\min}$ is the minimum value of that feature in the sample, and $x_{\max}$ is the maximum value of that feature in the sample. This formula maps all input features to the interval [0,1]. Normalization makes the gradient easier to backpropagate, alleviating the problem of vanishing or exploding gradients, and it accelerates training convergence while reducing instability in the training process.
3.4. Testing Set Results
After completing data preprocessing, LSTM is used to predict the two components separately. In this paper, historical data of the input features over 96 time steps were used to predict the electricity price at the 97th point. The first 80% of the dataset is used as the training set, and the last 20% as the testing set. The input dimension is (96,6), and the corresponding output is (1,1). The model parameter settings in the experiment are shown in Table 4.
To verify the effectiveness of the FC-SSA-LSTM model proposed in this article, the electricity price prediction results were compared with the prediction results of a single LSTM model, as well as the FC-LSTM model, BP neural network, and support vector machine (SVM) regression prediction models.
LSTM is suitable for time series data and can effectively capture long short-term dependencies. It is particularly adept at handling temporal tasks such as speech recognition, natural language processing, and stock prediction; SVM has powerful classification and regression capabilities by maximizing the classification interval, especially suitable for small-sample high-dimensional data such as text classification and gene data analysis; BPNN, on the other hand, can handle complex nonlinear problems through feedforward and feedback structures, and is widely used in tasks such as function approximation and image recognition.
The predicted results are partly shown in Figure 9, and the test set error results for the different models are shown in Table 5.
As shown in Table 5, the differences in evaluation indicators between the model proposed in this paper and the other models can be compared visually. The correlation index R2 of the five models is not significantly different, and each model fits well. The MAE of feature construction–SSA-LSTM is 8.894, the MAPE is 9.99%, and the RMSE is 11.472. Compared with the feature construction–LSTM model, the MAE decreased by 8.785, the MAPE by 28.477 percentage points, and the RMSE by 17.194. This shows that the SSA algorithm successfully performs data dimensionality reduction and denoising in the deep learning prediction model and extracts effective feature information.
Compared with the STAM-LSTM proposed by Cao et al. [14], the FC-SSA-LSTM proposed in this paper has significant advantages in both MAPE and R2, proving that feature construction and SSA can effectively improve model performance. However, as in most studies in this field, lightweighting the model remains an unresolved issue. The results also indicate that different accumulation techniques have limited impact on predictive performance: with simple addition, the prediction errors of each component are directly superimposed, yet the overall trend and dynamic changes in the result are captured well.
From Figure 9 and Table 5, it can be seen that the BP neural network performs worst. Compared with the feature construction–LSTM hybrid model and the ordinary LSTM model, the fitting effect of feature construction–SSA-LSTM is better. The prediction accuracy of the BP neural network and the SVM regression model at peak and valley values is lower than that of the method proposed in this paper. By adjusting the LSTM network structure and the early data processing, model performance can be effectively improved and the prediction of nonlinear, non-stationary time series data enhanced.
3.5. Error Correction and Recursive Prediction
Time series prediction models based on the LSTM neural network usually use recursive prediction when forecasting future data. By training the LSTM model with historical data and adjusting its weights and biases, better predictive performance can be achieved.
After completing model training, the model is used for rolling recursive prediction: after each prediction, the window slides and the latest prediction result is added to the input. This step is repeated until the Nth step is predicted.
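A sketch of this rolling loop, combined with the step-wise correction of Section 2.8 (we assume the price occupies the last feature column and, for simplicity, hold the other features at their last observed values):

```python
import numpy as np

def recursive_forecast(model, window, corr_seq, n_steps=96):
    """window: (1, lookback, n_features) model input; corr_seq: the
    96-point correction sequence. Returns corrected predictions."""
    preds = []
    for t in range(n_steps):
        y_hat = float(model.predict(window, verbose=0)[0, 0])
        y_corr = y_hat + corr_seq[t]            # step-wise error correction
        preds.append(y_corr)
        new_step = window[:, -1:, :].copy()     # reuse last step's features
        new_step[0, 0, -1] = y_corr             # feed corrected price back in
        window = np.concatenate([window[:, 1:, :], new_step], axis=1)
    return np.array(preds)
```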
We selected the last day of the testing set, 30 October 2022, as the day to be predicted. There are a total of 96 electricity price datapoints for the forecast day. The electricity price and publicly available market data from the day before the day to be predicted are used as inputs for the first step of prediction, and the recursive prediction is carried forwards. The results are shown in Figure 10.
To overcome the impact of accumulated errors on the prediction of future data, this paper proposes an error correction method based on error backpropagation, which updates the model and input data after each prediction to reduce the accumulation of errors. An error term is introduced into the LSTM neural network model and added to the loss function; backpropagation is then used to update the weights of the LSTM model, so that the model can adjust itself using historical error data to obtain more accurate prediction results.
Firstly, construct a sample set of similar days: calculate the overall similarity of all historical data according to (19)–(23) and sort the results in descending order of similarity. The similarity calculation results for the historical data are shown in Table 6.
The electricity price curves for different historical days are shown in Figure 11. From the above results, it can be seen that the correlation between the electricity price curves of closely spaced dates is higher; as the time distance increases, the overall similarity decreases rapidly. Therefore, only the 30 historical days with the highest similarity are selected as the similar-day sample set.
During the model training process, the error data of the training and testing sets are statistically analyzed. The histogram of the deviation between the predicted results of the testing set and the actual values is shown in Figure 12.
The shape of the histogram visually shows that the error distribution approximates a normal distribution. By comparing the errors between the training and testing sets and aggregating them over the 96 points in a day, the recursive prediction results are corrected accordingly. The histogram of the error distribution at different time points is shown in Figure 13.
The median was extracted from each time-period sequence separately, and the 96 values obtained form the error correction sequence. Using the median ensures, to the greatest extent possible, a correct correction direction and a reasonable correction magnitude. After assembling the correction sequence, we gradually correct the latest prediction results in recursive prediction and use the latest corrected values for the next prediction. The predicted results at the 96 points of the day to be predicted are shown in Figure 14.
By using corrected values for prediction, the impact of accumulated errors is minimized as much as possible. The corrected mean absolute percentage error decreased from 11.11% to 5.01%. The results show that the error correction method based on backpropagation of the training and testing set errors can effectively improve the prediction accuracy and performance of the model in recursive prediction, including in the periods of large fluctuation and at the large step sizes where traditional recursive prediction methods perform poorly.
The process of gradually correcting the prediction results can make the model more versatile, better adapt to the changes and complexity of the data, and reduce the errors caused by the model.
4. Conclusions
This paper proposes an electricity spot price prediction model based on FC-SSA-LSTM and an error correction method based on backpropagation of training and testing set errors.
Firstly, through feature construction, this model incorporates essential information on the formation and fluctuation of electricity prices, considering not only traditional market factors but also the impact of renewable energy integration. The inclusion of renewable energy sources such as wind and solar power introduces volatility and nonlinearity into the price formation process, making accurate prediction more challenging. By providing a more comprehensive set of input features, the model captures these complexities, ensuring more robust predictions.
Secondly, using singular spectrum analysis (SSA) in data preprocessing, we decomposed the original time series into multiple components with different frequencies and amplitudes. The SSA algorithm performs data dimensionality reduction and denoising and extracts effective feature information, enhancing the model’s sensitivity to the intrinsic structure of the data, including seasonal variations and the volatility associated with renewable energy generation. The experimental results show that the feature construction–SSA-LSTM model proposed in this paper achieves higher prediction accuracy than traditional LSTM, and that feature construction further reduces prediction errors, verifying the effectiveness and accuracy of the model.
In addition, the results in recursive prediction are gradually corrected by statistically analyzing the errors in the training and testing sets. The experimental results show that this error correction method can effectively reduce the impact of error accumulation in recursive prediction and improve the predictive performance of the model.
In the context of the continuous promotion of electricity marketization reform, the ability to predict electricity prices with higher accuracy is essential. The proposed model provides a valuable tool for energy market participants, enabling them to adapt to market fluctuations driven by the integration of sustainable energy sources. It not only addresses traditional market dynamics but also incorporates the emerging complexities of sustainable energy transitions, ensuring more accurate and actionable electricity price predictions; it also guides quoting strategies in the spot market on the power generation side and electricity consumption planning on the power consumption side. At the same time, error correction can be used to improve the prediction accuracy for future data, and error analysis of the training and testing sets can guide market participants in making decisions under different risk preferences.
However, the proposed model is complex and requires a large amount of data for training and statistical analysis of errors for the test and training sets. Future work will focus on lightweighting the model and improving versatility.