1. Introduction
As a clean and efficient secondary energy source, electricity will play an increasingly important role in meeting people's energy demands and in building a clean, low-carbon, safe, and efficient energy system. Electric-load forecasting refers to forecasting future electricity demand and load trends [1]. Efficient and accurate load forecasting is increasingly urgent: it optimizes the planning and scheduling of power-system operation, delivers economic and social benefits, and supports the stable transformation of the power industry.
Load forecasting is commonly divided into short-term, medium-term, and long-term forecasting horizons [2]. Short-term forecasting usually covers horizons from a few hours to a few weeks and supports the coordination of power generation and the development of reasonable scheduling plans. Medium-term load forecasting usually covers horizons from a few months to a few years; it helps ensure the supply of electricity for industrial production and residential use, and it supports reasonable operation and maintenance decisions for the power system. Long-term load forecasting usually covers horizons of three years or more and serves the planning of the power industry.
Nonlinearity and temporality are two major characteristics of electric loads [3]. Electric-load forecasting algorithms fall into two major categories: traditional algorithms and artificial-intelligence algorithms [4]. Traditional algorithms are represented by time-series methods such as Fourier expansion and multiple linear regression [5]. These algorithms fully consider the temporal nature of electric-load data, but their regression ability is weak and they require the time series to be stationary [6], so they cannot accurately forecast data with nonlinear relationships. As electric-load forecasting has grown more complex, such statistical methods cannot effectively predict nonlinear load data, resulting in significant prediction errors; they are also extremely sensitive to abnormal load values and cannot effectively predict sudden changes and peak loads. Artificial-intelligence algorithms fit nonlinear data better. According to the relevant literature [7,8], the back-propagation (BP) neural network has been commonly used for load forecasting, but its learning ability is relatively poor and its forecasting accuracy needs improvement. Other studies [9,10] have used fuzzy-inference algorithms, but their calculation speed is slow and their accuracy is low. Other researchers [11,12] applied support vector regression (SVR) to load forecasting, and still others [13] used decision trees. However, these artificial-intelligence algorithms do not account for the temporal nature of electric loads, and time features must be added manually to ensure accuracy to a certain extent [14].
The development of the artificial neural network (ANN) has led to its various models and variants being widely applied to load forecasting, the most representative being the back-propagation (BP) neural network [15]. Some researchers [16,17] addressed the tendency of the traditional BP algorithm to fall into local minima by optimizing the network structure and connection weights, improving the forecasting accuracy of gradient descent. The researchers in [18] carried out point and interval forecasting of electricity-consumption data; their algorithms, constructed with the wavelet transform and improved by particle swarm optimization, both outperformed the traditional BP network.
The recurrent neural network (RNN) overcomes the ANN's inability to exploit temporal dependencies by building the temporal structure of the data into the network design [19,20]. However, when dealing with nonlinear data over long time spans, the RNN suffers from vanishing and exploding gradients. Hochreiter and Schmidhuber [21] proposed the long short-term memory (LSTM) neural network, which effectively solves the problem of long-term temporal dependency in the data. The researchers in [22,23,24,25] adopted deep-learning frameworks such as a double-layer LSTM network, an LSTM output layer combined with a fully connected layer, and a hybrid of support vector regression (SVR) with LSTM, making different improvements to the construction of the LSTM model to obtain more accurate forecasts. The researchers in [26] improved the LSTM input by fusing multi-scale feature vectors through a convolutional neural network (CNN). The researchers in [27,28] used particle swarm optimization (PSO) to tune the LSTM network parameters; the results showed significant improvements over manually set parameters and higher forecasting accuracy than previous LSTM algorithms [29]. Because the LSTM network accounts for both the temporal and the nonlinear nature of the data and achieves high forecasting accuracy, it is widely used in electric-load forecasting [30]. However, the parameters of the LSTM and other neural networks are difficult to determine and are often selected from human experience; the fitting ability and prediction performance vary greatly with the parameters, and optimizers with weak global search ability and slow convergence leave the LSTM prone to being trapped in local optima.
This research processes the 15 min electric-load data of a regional power grid to obtain the daily maximum (minimum) load and its peak (valley) arrival time. Temporal and weather features are extracted, and a correlation test screens out the feature set whose correlation exceeds a threshold. A numerical forecasting model is established, a time-segment classification model is further built, and the parameters are tuned. Based on the load and weather characteristics of the previous two years, the model forecasts the maximum (minimum) daily load and its arrival time for the next three months, and the forecasting accuracy is analyzed.
Categorical variables with no order relation are one-hot encoded, which avoids imposing a spurious ordering on their values and expands the feature set. When screening features, the Pearson correlation coefficient quantifies linear correlation and the random forest measures nonlinear correlation; the indicators that reach the threshold form the final feature set. This double screening not only mines the feature information fully but also avoids the curse of dimensionality and multicollinearity.
For the medium-term forecasting of daily peak and valley loads, this research combines a deep-learning model with long short-term memory (LSTM) and the sparrow search algorithm (SSA) to accelerate model convergence. The SSA, proposed in 2020, simulates the foraging and anti-predation behavior of a sparrow population [31]. This intelligent optimization algorithm is novel, has strong optimization ability, and can greatly improve the efficiency of the forecasting model [32]. The SSA can search for the global optimum of the load-forecasting objective and effectively prevents the best value found by the algorithm from remaining a local extremum [33].
This paper proposes the LSTM-SSA-RF algorithm for the first time and applies it to medium-term load forecasting. The innovations of this paper have two aspects. First, traditional regression algorithms and neural networks have not achieved good results in medium-term load forecasting, whereas the LSTM-SSA-RF algorithm achieves greater accuracy; novel time-series forecasting algorithms and novel intelligent optimization algorithms have typically been used in short-term rather than medium-term load forecasting. Second, this paper adopts a new feature-selection process with nonlinear correlation analysis.
This paper is organized as follows.
Section 2 describes the considered methods and the algorithm framework.
Section 3 provides the data processing, the characteristic engineering, and the forecasting results, which are reported and compared.
Section 4 discusses the forecasting results.
Section 5 draws some conclusions.
3. Analysis and Results
3.1. Data Preprocessing
3.1.1. Handling Outlier Data
The existing methods for handling electric-load data can generally be divided into three categories: statistical models, clustering models, and classification models. Statistical models describe the patterns and distributions of outlier data, compare similarities, and use outlier indicators or criteria to construct one or more combined probability models. Clustering models obtain classification results from differences in load characteristics, effectively reflecting the overall shape of the load curve and detecting anomalies. Classification models often require a large amount of labeled data, and in practice abnormal samples are far fewer than normal samples, which leads to an imbalanced sample distribution.
This research used regional 15 min load data, industrial daily load data, and meteorological data. First, the standard-deviation (k-sigma) method was used to test the regional 15 min load data and the industrial daily load data for outliers and to set them to blank. Under the assumption of a normal distribution (large samples can be treated as approximately normal), the k-sigma principle states that the probability of a value lying more than k standard deviations from the mean is very small. With k = 3, the check on the regional 15 min load data flagged 332 abnormal records and set them to null.
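The k-sigma screen described above can be sketched in a few lines. The column name and the toy series below are illustrative assumptions, not the paper's data set; note that on a tiny sample a spike can survive a k = 3 threshold because it inflates the standard deviation itself, so the example uses k = 2 (the paper used k = 3 on the much larger real data).

```python
import numpy as np
import pandas as pd

def flag_k_sigma_outliers(series: pd.Series, k: float = 3.0) -> pd.Series:
    """Set values farther than k standard deviations from the mean to NaN."""
    mu, sigma = series.mean(), series.std()
    return series.mask((series - mu).abs() > k * sigma)

# toy 15 min load series with one obvious spike; k = 2 flags it here
df = pd.DataFrame({"load_kw": [50.0, 52.0, 51.0, 49.0, 500.0, 53.0, 48.0]})
df["load_kw"] = flag_k_sigma_outliers(df["load_kw"], k=2.0)
```

The flagged records become NaN, matching the paper's choice to null outliers so they can be filled in the next step.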
3.1.2. Filling in Missing Values
Even though some of the techniques used in this research can tolerate missing values, different models handle them differently, so to ensure the accuracy of the solution it was still necessary to fill them in. As shown in Figure 4, a search of the data set found that only the regional 15 min load data and the industrial daily load data had missing values (including the abnormal values that had been set to null). For this kind of numerical data, this research applied linear interpolation to preserve the local linear trend.
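A minimal sketch of the linear-interpolation fill, on an illustrative series rather than the paper's data:

```python
import numpy as np
import pandas as pd

# two gaps: one single missing value, one run of two
s = pd.Series([10.0, np.nan, 14.0, np.nan, np.nan, 20.0])
filled = s.interpolate(method="linear")
# each gap is filled on the straight line between the nearest known points
print(filled.tolist())  # [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
```

Because the fill follows the line between the neighbouring observations, the local linear trend of the load curve is preserved, which is exactly the property the text cites.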
3.1.3. Data Processing by Type
Data exploration found that the "weather" feature of the meteorological data had the format "weather 1/weather 2", which could not be applied directly to numerical analysis, and expanding it with one-hot encoding would easily lead to the curse of dimensionality. A custom weather dictionary was therefore built, mapping each weather type to an illumination and a precipitation value, which was convenient for model training. Similarly, the "wind and direction" feature had as many as 39 distinct values and was not directly encoded; only the wind-force characteristics were retained.
Seasonal features were added to the basic data. Because the values of "season" have no order or magnitude, this research used dummy variables, expanding the feature with one-hot encoding.
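The one-hot (dummy) expansion of the unordered "season" variable can be sketched as follows; the frame and values are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"season": ["spring", "summer", "autumn", "winter", "summer"]})
# one indicator column per season, replacing the categorical column
dummies = pd.get_dummies(df["season"], prefix="season")
df = pd.concat([df.drop(columns="season"), dummies], axis=1)
```

Each row now carries four 0/1 indicators, so no artificial ordering between seasons is introduced into the regression.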
3.2. Characteristic Engineering
3.2.1. Creating an Alternative Feature Set
First, it was necessary to establish a sufficiently rich set of alternative features. According to a large body of existing research, the main factors influencing medium- and short-term load are time-series factors, meteorological factors, and random interference factors, together with the forward dependence of load on historical load data. Because random interference factors are unforecastable, this research did not select them as separate features; this is explained in the section on mutation-point analysis and policy-effect evaluation.
- (1)
Time-series factors included the month, the day of the month, the hour, the day of the year, the week of the year, the working day, holidays, the time-period ordinal (divided by 3 h), the season (one-hot encoded), the beginning of the month, the end of the month, etc.
- (2)
Meteorological factors included the maximum temperature, the minimum temperature, the temperature difference, the wind force, illumination, precipitation, etc.
- (3)
Trend factors (historical load data) included the maximum (minimum) load of the previous day and the maximum (minimum) load of the previous week.
Considering the continuity of the monthly and hourly data, direct encoding is not suitable. For example, hour 23 and hour 0 are close in a practical sense, but direct encoding places them 23 h apart. To avoid this discrepancy between the encoded meaning and the actual meaning, in the regression analysis this research applied a cosine transformation to the monthly data (Formula (1)) and the hourly data (Formula (2)).
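A plausible form of this cyclic encoding is cos(2πm/12) for the month and cos(2πh/24) for the hour; the exact expressions are those in Formulas (1) and (2) of the paper, so the functions below are an assumption that merely reproduces the stated goal of bringing hour 23 and hour 0 close together.

```python
import math

def month_cos(m: int) -> float:
    # month mapped onto a 12-step cycle
    return math.cos(2 * math.pi * m / 12)

def hour_cos(h: int) -> float:
    # hour mapped onto a 24-step cycle
    return math.cos(2 * math.pi * h / 24)

# hours 23 and 0 are now near neighbours instead of 23 units apart
print(abs(hour_cos(23) - hour_cos(0)))
```

In practice a matching sine term is often added alongside the cosine so that symmetric hours (e.g., 6 and 18) remain distinguishable; the paper's formulas may or may not include it.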
3.2.2. Feature Selection
Feature selection (FS) is an important problem in feature engineering. By eliminating redundant features and searching for the optimal feature subset, the efficiency of model solving is ultimately improved. This research mainly used the filter algorithm to filter the characteristics according to the correlation indicators in various statistical tests.
- (1)
Linear correlation analysis: Pearson correlation coefficient
The Pearson correlation coefficient is a typical indicator of the linear relationship between two variables. Its calculation is relatively simple, and the larger its absolute value, the stronger the linear correlation.
- (2)
Nonlinear correlation analysis: random forest
Both the maximal information coefficient (MIC) and the Gini coefficient can be used to measure nonlinear correlation. In this research, a feature-selection algorithm based on the random forest was used, selecting features by the mean decrease in impurity of each feature: information gain for classification and variance for regression.
The nonlinear correlation of the regression model was thus calculated by the random forest, and double screening was carried out.
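The double screening can be sketched as follows: keep a feature if its absolute Pearson correlation with the target exceeds a linear threshold or its random-forest impurity importance exceeds a nonlinear threshold. The feature names, synthetic data, and thresholds (0.3 and 0.1) are illustrative assumptions, not the paper's values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "temp_max": rng.normal(30, 5, n),   # linearly related to the target
    "temp_min": rng.normal(20, 5, n),   # nonlinearly related to the target
    "noise": rng.normal(0, 1, n),       # unrelated distractor
})
# target: linear in temp_max, oscillatory (nonlinear) in temp_min
y = X["temp_max"] + 5.0 * np.sin(X["temp_min"]) + rng.normal(0, 1, n)

pearson = X.corrwith(y).abs()                       # linear screen
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
importance = pd.Series(rf.feature_importances_, index=X.columns)  # nonlinear screen

selected = X.columns[(pearson > 0.3) | (importance > 0.1)]
print(list(selected))
```

The sine-linked feature fails the Pearson screen but passes the random-forest screen, which is exactly why the paper pairs the two tests.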
The results of the feature screening in Figure 5 were as follows:
- (1)
Peak value forecasting: The difference between the load peak and valley of the previous day, the maximum temperature, the minimum temperature, the season, the illumination, the precipitation, the weekend/workday/holiday indicator, the peak value of the previous day, the peak value of the previous 2 days, the peak value of the previous 5 days, the peak value of the previous 6 days, the peak value of the previous 7 days, and the peak value of the previous 30 days.
- (2)
Valley value forecasting: The difference between the load peak and valley of the previous day, the maximum temperature, the minimum temperature, the season, the valley value of the previous day, the valley value of the previous 2 days, the valley value of the previous 3 days, the valley value of the previous 7 days, and the valley value of the previous 30 days.
- (3)
Forecasting of peak time: The forecasted peak, the maximum temperature, the minimum temperature, the illumination, the precipitation, whether it was a holiday, the season, the peak time point of the previous day, the peak time point of the previous 2 days, the peak time point of the previous 5 days, the peak time point of the previous 6 days, the peak time point of the previous 7 days, and the peak time point of the previous 30 days.
- (4)
Forecasting of valley time: The valley time point of the previous day, the valley time point of the previous 2 days, the valley time point of the previous 5 days, the valley time point of the previous 6 days, the valley time point of the previous 7 days, and the valley time point of the previous 30 days.
3.3. Data Process Results of the SSA-LSTM-RF Algorithm
This research selected four key parameters that affect the performance of the LSTM model as the optimization objects of the SSA: the number of neurons in the first hidden layer L1, the number of neurons in the second hidden layer L2, the number of iterations iter, and the learning rate lr.
The SSA-LSTM-RF optimization process in Figure 6 was as follows:
- (1)
The number of neurons in the first hidden layer L1, the number of neurons in the second hidden layer L2, the number of iterations iter, and the learning rate lr were taken as the optimization objects, and the parameter-optimization ranges were constructed.
- (2)
The individual fitness of each sparrow was determined, with the MSE used as the fitness-evaluation function, producing the fitness-curve value of the SSA.
- (3)
The position of each sparrow individual was calculated and updated to obtain its fitness value. If the result was the global optimal fitness, the position of that individual in the current sparrow population was saved; if not, the sparrow position was updated.
- (4)
It was determined whether the number of iterations had reached the upper limit. If so, the optimization process exited and the optimal solution was returned and saved; otherwise, the process returned to step (3).
- (5)
The optimized L1, L2, iter, and lr were substituted into the LSTM model.
- (6)
The optimized model was used for forecasting.
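Steps (1)-(6) can be sketched as a compact SSA loop. Training a real LSTM per fitness call is expensive, so the fitness below is a cheap stand-in (a quadratic whose minimum is placed, for illustration, near the optimum reported in Table 1: lr = 0.0060 and 175 iterations; L1 = 128 and L2 = 64 are assumptions); in practice the fitness would build and train the LSTM and return its validation MSE. The update rules are a simplified version of the published SSA, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# step (1): optimization ranges for [L1, L2, iter, lr] (illustrative)
lower = np.array([16.0, 16.0, 50.0, 1e-4])
upper = np.array([256.0, 256.0, 500.0, 1e-1])

def fitness(x: np.ndarray) -> float:
    """Stand-in for: build LSTM(L1, L2), train `iter` epochs at rate lr,
    return validation MSE (step (2))."""
    target = np.array([128.0, 64.0, 175.0, 6e-3])
    return float(np.sum(((x - target) / (upper - lower)) ** 2))

def ssa_minimize(n_pop=30, n_iter=100, producer_frac=0.2):
    pop = rng.uniform(lower, upper, size=(n_pop, 4))
    fit = np.array([fitness(p) for p in pop])
    best = pop[fit.argmin()].copy()
    n_prod = max(1, int(producer_frac * n_pop))
    for _ in range(n_iter):                      # step (4): iteration limit
        pop = pop[np.argsort(fit)]               # best sparrows first
        # step (3): producers explore around their current positions
        for i in range(n_prod):
            pop[i] = pop[i] + rng.normal(0.0, 0.01, 4) * (upper - lower)
        # scroungers move toward the best producer
        for i in range(n_prod, n_pop):
            pop[i] = pop[i] + rng.uniform(0.0, 1.0, 4) * (pop[0] - pop[i])
        # a few scouts jump randomly (anti-predation behaviour)
        for i in rng.choice(n_pop, size=2, replace=False):
            pop[i] = rng.uniform(lower, upper)
        pop = np.clip(pop, lower, upper)
        fit = np.array([fitness(p) for p in pop])
        if fit.min() < fitness(best):            # keep the global best
            best = pop[fit.argmin()].copy()
    return best

best = ssa_minimize()
L1, L2, n_epochs, lr = best   # step (5): hand the tuned values to the LSTM
```

The returned vector would then parameterize the final LSTM, which is used for forecasting (step (6)).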
As shown in Table 1, after 175 iterations the learning rate reached 0.0060 and the hyperparameters reached their optimized values.
As shown in Figure 7, the optimization results of the SSA-LSTM-RF algorithm (Figure 7a) were significantly better than those of the PSO-LSTM-RF algorithm (Figure 7b), with higher convergence accuracy and relatively fewer iterations. These results also provide sufficient support for using the forecasting model established in this article for medium- and short-term power-load forecasting.
3.4. Forecast Results of Minimum and Maximum Daily Loads
This research used the SSA-RF-LSTM model to evaluate the forecast results for the minimum daily load, experimenting on the 15 min data collected for certain areas. Daily peaks and peak-valley times were forecast from the historical electric-load data. Scenario forecasting was carried out based on the load and weather characteristics of the previous two years; forecasts of the minimum and maximum daily loads and their arrival times were produced for the next 300 days, and the forecasting accuracy was analyzed.
The experiments used the Python language, with the sklearn packages to produce the forecasts and the Matplotlib packages to visualize the results. The experimental results are shown in Figure 8 and Figure 9.
In Figure 8, the forecast minimum daily load closely approximates the true minimum daily load over the 300 days, and the forecast maximum daily load closely approximates the true maximum. Relatively speaking, the maximum daily load forecast tracks its true values more closely than the minimum daily load forecast does.
Figure 9 shows a well-behaved loss curve. At the beginning of training, the loss decreased rapidly, indicating that the learning rate was appropriate and gradient descent was proceeding; after a certain stage, the curve flattened and the loss changed much less than at the start. The burrs in the curve are related to the batch size: the larger the batch size, the smaller the burrs.
3.5. Forecast Results of Daily Load with Different Algorithms and Steps
The LSTM network achieved its best performance when the hyperparameters were optimized through the SSA and RF. The MAPE, RMSE, and MSE of the forecast values for the last two months of 2019 verified the accuracy and stability of the model's regression-fitting ability. See the annex for the results of scenario forecasting in Table 2.
The collected data were used in experiments with five models: LSTM, RF-BP, RF-LSTM, RF-PSO-LSTM, and RF-SSA-LSTM. The error comparison of the different forecasting models over 60 days is shown in Figure 10 and Table 3.
From the results of the experiment, the smaller the IMAE, the better the forecasting effect; the smaller the IRMSE, the better the forecasting effect; and the larger the IR2, the better the forecasting effect. From the error comparison of the forecasting models, the RF-SSA-LSTM model performed better than the LSTM, RF-BP, RF-LSTM, and RF-PSO-LSTM models.
The collected data were experimented with at horizons of 60 days, 120 days, 180 days, 240 days, and 300 days. The error comparison of the different forecasting horizons is shown in Figure 11 and Table 4.
Again, the smaller the IMAE and the IRMSE and the larger the IR2, the better the forecasting effect. From the error comparison of the different forecasting horizons, the best result was the 60-day forecast; the accuracy decreased as the horizon lengthened.
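The three indices compared above correspond to the standard MAE, RMSE, and R2 and can be computed with scikit-learn as below; the two small arrays are illustrative, not the paper's forecasts.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([100.0, 110.0, 105.0, 120.0, 115.0])
y_pred = np.array([102.0, 108.0, 107.0, 118.0, 116.0])

mae = mean_absolute_error(y_true, y_pred)          # smaller is better
rmse = np.sqrt(mean_squared_error(y_true, y_pred)) # smaller is better
r2 = r2_score(y_true, y_pred)                      # closer to 1 is better
print(mae, rmse, r2)  # → 1.8  ~1.844  0.932
```

Taking the square root of the MSE keeps the code portable across scikit-learn versions.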
From the experimental results shown in Figure 12 and Table 5, which compare the IMAE, IRMSE, and IR2 of the different forecasting models at horizons of 60 days, 120 days, 180 days, 240 days, and 300 days, the RF-SSA-LSTM model performed better than the RF-LSTM and RF-PSO-LSTM models. Across the different forecasting horizons, the best result was again the 60-day forecast, and the effect worsened as the horizon lengthened.
3.6. Forecast Evaluation of Minimum and Maximum Daily Load in the Future
This research used the historical load data from 2018 to forecast the minimum and maximum daily loads for the following 90 days, as shown in Figure 13.
This research used the historical load data from 2019 to forecast the minimum and maximum daily loads for the following 90 days, as shown in Figure 14.
3.7. Forecast Evaluation of Minimum and Maximum Daily Load in Great Industry
Generally speaking, the classification of industries for electric-load forecasting by electricity-consumption characteristics includes great industries, non-general industries, general industries, and commerce. The electric-load trends of different industries differ. This research forecasts and evaluates the minimum and maximum daily loads of a great industry for the following 90 days.
In Figure 15 and Figure 16, the forecast results show the real value, the forecast value, the 15% upper limit, and the 15% lower limit. As the results show, holidays had an important effect: during the holiday period in October, the forecast curve deviated from the original curve.
The forecast results closely approximate the real minimum and maximum daily loads of the great-industry load over the following 90 days, which demonstrates the effectiveness of the forecasting.
4. Discussion
This research first used the Pearson correlation coefficient and a random forest model to select features; it then proposed the RF-SSA-LSTM daily peak-valley forecasting model. The model took the target value, the climate characteristics, the time-series characteristics, and the historical trend characteristics as input to the LSTM network to obtain the daily peak and valley loads. The hyperparameters of the LSTM network were optimized by the SSA, and the global optimal solution was obtained.
This research provides daily peak-valley electric-load forecasting based on RF, the SSA, and LSTM. The results show that the RF-SSA-LSTM algorithm delivers a considerable improvement, as shown in Figure 10 and Figure 11 and Table 2, Table 3 and Table 4. Additionally, the accuracy of the daily peak-valley forecasts was assessed by comparing the fitness curves of RF-SSA-LSTM and RF-PSO-LSTM, as shown in Figure 7, and by comparing the performance of LSTM, RF-BP, RF-LSTM, RF-PSO-LSTM, and RF-SSA-LSTM. The forecasts of the minimum and maximum daily loads for the following 90 days were also evaluated. It was demonstrated that the RF-SSA-LSTM algorithm has higher accuracy and greater stability than the other algorithms, such as RF-PSO-LSTM, and that the SSA searches for global optimal solutions better than the PSO algorithm.
As the forecasting horizon increases, the deviation between the forecast results and the real values becomes more obvious, and the overall forecasting effect worsens. The 300-day forecasts were far inferior to the 60-, 120-, 180-, and 240-day forecasts: the longer the horizon, the worse the MAE, the RMSE, and the R2. However, the accuracy of this algorithm can be further improved, for example by using more precise data-collection techniques.
Finally, the forecast peak and valley values were also input into the random forest as features to obtain the peak and valley times. The MAPE of the SSA-LSTM-RF forecasting model was 1.5%, and its fitting ability was also good.
5. Conclusions
In summary, this research optimizes the LSTM forecasting model with the SSA and RF algorithms to establish a preliminary model for electric-load forecasting. The conclusions are as follows:
The environmental variables in electric-load forecasting are complex and nonlinear. This paper used the RF algorithm to weight the environmental characteristic variables that affect the load; analyzing and selecting the feature variables with higher weights reduced the computational burden of the forecasting model and helped improve its accuracy. The optimal parameters of the LSTM model were found with the SSA. Compared with the commonly used grid search and the PSO algorithm, the sparrow search algorithm has a simple structure and a high convergence rate, combining global optimization ability with local search ability in a complementary way. Experimental comparison shows that the RF-SSA-LSTM electric-load forecasting model proposed in this article has higher forecasting accuracy and performs well at different forecasting horizons.
With the continued development of machine-learning methods, future research could replace the RF method with alternatives such as XGBoost and TCN. In addition to short- and medium-term electric-load forecasting for a region, medium- and long-term forecasting focused on specific industries is extremely important for power-system planning and operation. These points form the basis of our recommendations for future research.