**1. Introduction**

The daily energy generated by a photovoltaic power station affects the power consumption of the local area [1–3], and photovoltaic generation in turn depends on environmental factors such as sunshine duration and temperature. Predicting the generated energy therefore helps the local power grid improve foreseeability and create a proper generation schedule [4–7]. Since the main facilities of a photovoltaic power station operate outdoors, environmental factors affect the working state of the devices, which makes this effect worth studying. For example, the effect of temperature changes on the quality of the output current of solar power plants has been studied in Indonesia [8]. From a global viewpoint, temperature and sunshine duration vary between countries, so the generation characteristics of solar plants differ as well. Predicting generation from environmental variation is thus a research focus.

Prediction is essentially a regression problem whose purpose is to build the relationship between environmental factors and generating energy. Hence, machine learning-based methods have been widely used for prediction tasks in power systems, such as outage forecasting, wind power prediction, stability forecasting, and peak load prediction.

Machine learning algorithms can process big data efficiently [9]: they can obtain the optimal parameters of power generation prediction models (PGPMs) from a large amount of historical data and then predict the generating energy with the trained model. Recently, PGPMs based on machine learning have been proposed for different types of power stations, such as wind, thermal, solar, and nuclear power stations. Moreover, to predict the daily generating energy of power stations accurately, the input data sets of existing machine learning-based PGPMs usually include all the environmental parameters that affect power generation, which makes the computational complexity of such PGPMs very high.

**Citation:** He, B.; Ma, R.; Zhang, W.; Zhu, J.; Zhang, X. An Improved Generating Energy Prediction Method Based on Bi-LSTM and Attention Mechanism. *Electronics* **2022**, *11*, 1885. https://doi.org/10.3390/electronics11121885

Academic Editors: Luis Hernández-Callejo and Javid Taheri

Received: 18 April 2022 Accepted: 13 June 2022 Published: 15 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

A PGPM based on the support vector machine (SVM), one of the most commonly used machine learning algorithms, was proposed in ref. [10]. It applied an improved grid search method to optimize the parameters C and g and thereby improve the accuracy of wind power forecasting. The experimental results showed that the model could predict real-time (15 min) wind power with an accuracy of up to 78.49%. However, because the computational complexity is very high with larger training sets, the SVM-based prediction model is only suitable for small-sample scenarios in which the globally optimal parameters can be obtained.

To overcome the limitations of the SVM-based PGPM under large-sample conditions, a lightweight PGPM based on an ensemble decision tree was proposed in ref. [11], which can predict a power system's operating states in a real-time, online environment. In the proposed solution, an ensemble security predictor (ENSP) was developed and trained to predict and classify the power system's dynamic operating states into secure, insecure, and intermediate transitional classes. The performance was evaluated with two case studies on the IEEE 118-bus and IEEE 300-bus test systems, and the experimental results showed that the prediction accuracy was up to 94.4%. However, for the ensemble decision tree model it can be challenging to find appropriate pruning schemes to counter overfitting. An overfitted model is optimized only for the existing data and is therefore not well suited to new, unseen data.

Moreover, to improve on the decision tree-based power generation prediction model, a random forest-based PGPM [12] was developed to forecast medium- and long-term power load. In this model, the total load is decomposed into a basic load affected by the economy and a meteorologically sensitive load affected by meteorological factors, and the prediction results are corrected by a wavelet neural network algorithm. The experimental results showed that the mean absolute percentage error (MAPE) of the random forest-based PGPM reached 1.43%, which is much better than the decision tree-based model proposed in ref. [11]. However, a random forest is equivalent to running multiple decision trees at the same time, which inevitably has higher computational complexity than a single decision tree.

Apart from the above-mentioned statistical learning methods, the artificial neural network (ANN), which is loosely modeled on the human brain, has been widely used in power generation prediction in recent years [13]. To improve power production prediction for solar power stations, a PGPM based on optimized and diversified artificial neural networks was proposed in ref. [14]. The method is optimized in terms of the number of hidden neurons and improved by using diverse training datasets to build the ANNs. The simulation results showed that the proposed approach outperformed three benchmark models, with a performance gain of up to 11% in the root-mean-square error (RMSE) metric and a confidence level of up to 84%. However, such methods employ classical neural networks, which may not be suitable for time-varying sequence data of environmental factors.
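For reference, the two error metrics quoted in this review are commonly defined as follows. The symbols are assumptions introduced here for illustration and are not taken from the cited works: $y_i$ is the measured value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.

$$
\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
$$

MAPE is a relative, unitless error and is undefined when any $y_i = 0$, whereas RMSE carries the unit of the predicted quantity (for example, kW). Values of the two metrics are therefore not directly comparable with each other.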

Generally, for time-varying sequence data, models based on the recurrent neural network (RNN) can provide higher prediction accuracy [15]. Long Short-Term Memory (LSTM) [16], an improved RNN, can solve the problems of gradient disappearance and gradient explosion that arise when an RNN is trained on long sequences, making it superior in time sequence forecasting problems [17]. The LSTM network has a strong memory function that can establish the correlation between earlier and later data, thereby improving the prediction accuracy. Based on these advantages of LSTM, a PGPM based on the high-performance K-Means long short-term memory (K-Means-LSTM) network was proposed to predict the power point of wind power in ref. [18]. The simulation results showed that the prediction error (RMSE) of the proposed PGPM reached 62 kW, achieving higher accuracy than RNN-based methods.

However, the LSTM-based PGPM can only capture the data features of the preceding part of the time sequence, which limits the performance of such methods in some scenarios. As an improved version of LSTM, the Bidirectional LSTM (Bi-LSTM) achieves better performance by adding a reverse-calculation module. Hence, a Bi-LSTM-based PGPM, which is used to predict abnormal electricity consumption in power grids, was proposed in ref. [19]. In the Bi-LSTM-based PGPM, the TensorFlow framework was used to perform feature extraction and power generation prediction. The final experimental results showed that the accuracy of the Bi-LSTM-based PGPM reached up to 96.1%, which is better than that of the LSTM-based PGPM proposed in ref. [18] (94.5%).

Generally, the Bi-LSTM model can enhance the mining of correlation information in time series features to some extent. However, it can only extract local features and has difficulty capturing global correlation, so feature correlation information is lost. In addition, such a model only focuses on the inherent relationship between the input features and the target feature, so the input features at every time step are assigned the same weight. Yet the correlation between the input and target characteristics of electricity consumption varies with time, which places higher requirements on mining the time series correlation of the input features.

Hence, to improve the performance of Bi-LSTM-based PGPMs, this paper proposes an Attention-Bi-LSTM PGPM that combines the advantages of the attention mechanism and the Bi-LSTM network. The main contribution of this paper is the way in which the attention mechanism is introduced. To this end, appropriate attention layers have to be selected and designed to utilize historical data efficiently.
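The idea of weighting time steps unequally can be illustrated with a minimal dot-product attention sketch in plain Python. This is a generic illustration, not the attention layer designed in this paper: the function names, the `query` vector (which stands in for a learned parameter), and the toy hidden states are all assumptions introduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Dot-product attention over time steps.

    hidden_states: list of T vectors (e.g. the per-step outputs of a Bi-LSTM).
    query: vector of the same length (hypothetical stand-in for a learned parameter).
    Returns the attention weights and the weighted sum (context vector).
    """
    # One relevance score per time step.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: weighted average of the hidden states.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Hypothetical toy hidden states for three time steps (not real data).
hidden_states = [[0.1, 0.0], [0.9, 0.2], [0.3, 0.8]]
query = [1.0, 0.0]
weights, context = attention_pool(hidden_states, query)
```

With these toy inputs the second time step receives the largest weight, whereas an all-zero query reduces to the equal weighting described above for the plain Bi-LSTM.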

Moreover, existing machine learning-based PGPMs usually use all environmental parameters that affect power generation as input data, which inevitably increases the computational burden. To improve computational efficiency, a feature selection algorithm based on Pearson correlation theory [20] is applied before constructing the proposed PGPM.
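As a rough illustration of correlation-based feature selection, the sketch below computes the Pearson coefficient between each candidate environmental factor and the generated energy, and keeps the factors whose absolute coefficient reaches a threshold. The threshold of 0.5, the feature names, and the toy values are assumptions made for this example, not values from this paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear information
    return cov / math.sqrt(var_x * var_y)


def select_features(features, target, threshold=0.5):
    """Keep features whose |r| with the target is at least `threshold`.

    `threshold` is an assumed cut-off chosen for illustration.
    """
    scores = {name: pearson(values, target) for name, values in features.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}


# Hypothetical toy data: five days of environmental readings (not real data).
features = {
    "sunshine_hours": [2.0, 4.0, 6.0, 8.0, 10.0],
    "temperature_c": [12.0, 15.0, 14.0, 20.0, 22.0],
    "wind_speed_ms": [3.0, 1.0, 4.0, 1.0, 3.0],
}
energy_kwh = [11.0, 19.0, 32.0, 41.0, 48.0]

selected = select_features(features, energy_kwh, threshold=0.5)
```

On this toy data, sunshine duration and temperature are retained and wind speed is dropped, so a downstream model would train on two inputs instead of three.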

The remainder of this paper is organized as follows. Section 2 details the principle of the environmental factor selection method based on Pearson coefficient theory. Section 3 presents the methodology of the prediction method. Section 4 elaborates on the data processing procedures. Section 5 shows the experimental layout and the related results. Section 6 concludes the paper and looks forward to future work.
