1. Introduction
Small hydropower is a globally recognized renewable and clean energy source, and its rational development and utilization are of great significance for China in achieving its “carbon peak and carbon neutrality” goals. According to the definition of the relevant authorities in China, hydropower stations with an installed capacity of less than 50 MW are regarded as small [1]. By the end of 2019, China had built 45,445 small hydropower stations with an installed capacity of 81,442 MW (equivalent to about 3.5 times the installed capacity of the Three Gorges hydropower station), accounting for 22.9% of China’s installed hydropower capacity and 4.1% of China’s total installed power capacity. In 2019, China’s small hydropower generation reached 253 million MWh, accounting for 19.5% of total hydropower generation and 3.5% of total power generation [2]. At the latest coal consumption standard of 308 g/kWh, the power generated by small hydropower stations in 2019 was equivalent to saving approximately 78 million tons of standard coal and reducing carbon dioxide emissions by approximately 195 million tons and sulfur dioxide emissions by approximately 1.01 million tons. Small hydropower stations in China are mainly distributed in remote mountainous areas, where they are an important part of the local power supply. They are characterized by decentralized development, local network formation, and local power supply, and they offer low construction cost, a short construction period, and a fast return on investment; they are natural supplements to the main power grid and have irreplaceable advantages [3].
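The coal-equivalent figure quoted above can be checked with a few lines of arithmetic. The sketch below uses only the two numbers given in the text (253 million MWh and 308 g/kWh); the emission factors behind the carbon dioxide and sulfur dioxide figures are not stated in the text, so they are not reproduced here.

```python
# Arithmetic check of the coal-equivalent saving quoted in the text.
generation_mwh = 253e6        # small hydropower generation in 2019, MWh (from the text)
coal_rate_g_per_kwh = 308.0   # coal consumption standard, g/kWh (from the text)

generation_kwh = generation_mwh * 1_000                    # 1 MWh = 1000 kWh
coal_tonnes = generation_kwh * coal_rate_g_per_kwh / 1e6   # 1 tonne = 1e6 g
coal_million_tonnes = coal_tonnes / 1e6

print(f"{coal_million_tonnes:.1f} million tonnes of standard coal")
```

The result, about 77.9 million tonnes, agrees with the rounded value of approximately 78 million tons.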
Large-scale investment in small hydropower has contributed greatly to China’s clean energy supply; however, most small hydropower stations are run-of-the-river stations that are widely scattered and have no regulation capability. In practice, such stations operate “blindly”: they generate power when water is available and shut down when it is not. Their operation and output are dominated by rainfall and are therefore subject to frequent fluctuations and high uncertainty. During heavy rain, the power generation of small hydropower stations increases sharply, and the resulting large, uncoordinated influx of small hydropower strongly affects the main grid, upsets its power balance, and endangers the safety of the power system [4]. To realize optimal dispatching and resource allocation and to ensure the safe, stable, and economic operation of the power system, a reasonable small hydropower generation plan is needed. It is therefore of great significance to accurately predict the generation of small hydropower stations and to provide a reference for the power-dispatching department in coordinating multiple power sources. In contrast to the well-developed forecasting models and methods for large and medium-sized hydropower stations, research on small hydropower generation forecasting started late and faces a shortage of data (especially the meteorological and runoff data most relevant to small hydropower generation), a large number of stations, and poor model versatility. As a result, the mature hydrological and power generation forecasting methods of large and medium-sized hydropower stations are often difficult to transfer.
In view of the above problems, a few scholars have carried out relevant research. Paper [5] took an entire small hydropower region as the study object and assimilated snow observations to improve forecasts of small hydropower generation in snow-rich areas. Paper [6] decomposed the generation load of small hydropower into a meteorological load component and a long-term trend load component, and predicted the generation load by establishing a regression relationship between meteorological factors and the meteorological load together with a prediction model of the long-term trend load. Paper [7] introduced the concept of regional synchronization characteristics of small hydropower generation to model small hydropower output based on the differences in climatic conditions between regions. Paper [8] analyzed the influencing factors and characteristics of small hydropower generation and found that small hydropower output is strongly correlated with long-term precipitation but only very weakly correlated with temperature and wind. When rainfall is plentiful in summer, the output of small hydropower fluctuates greatly, and the change exhibits a lag. Paper [9] fed different combinations of rainfall and power generation for the $t$ days before the forecast day into an echo state network (ESN), obtained the prediction results of each combination, and compared them to obtain the optimal result of the model. Papers [10,11] found that large and small hydropower stations in the same basin share similar hydrological and meteorological conditions. Therefore, the power generation of small hydropower stations can be predicted by using the predicted inflow of large and medium-sized hydropower stations.
In summary, the key to improving the forecast accuracy of small hydropower generation lies in obtaining richer precipitation and historical operating data and in adopting appropriate methods to fully extract the characteristic information they contain. Because small hydropower stations are scattered, their rain collection areas are wide, and the spatial distribution of precipitation is extremely uneven, the differences in the temporal and spatial distribution of precipitation need to be considered when forecasting the short-term power generation of small hydropower stations. To the best of our knowledge, however, this aspect has not yet been studied. Therefore, with the help of the precipitation distribution field obtained from meteorological satellite remote sensing and the partial mutual information method, this paper includes the differences in the temporal and spatial distribution of precipitation in a short-term power generation forecast for small hydropower stations for the first time. First, the spatial distribution of precipitation observed by satellite remote sensing and the actual precipitation observed at ground meteorological stations are used to generate a precipitation grid covering the region of the group of small hydropower stations. Then, the partial mutual information method is used to select the precipitation time scale that has the most significant impact on changes in short-term power generation. Finally, combined with the recent general trend in the historical power generation data, a model is built to forecast the short-term power generation of the group of small hydropower stations.
To fully exploit the characteristic information contained in the temporal and spatial distribution of precipitation and in the general trend of recent historical power generation, this paper proposes a multimodal deep learning network based on a convolutional neural network and a multilayer perceptron (CM-MDLN). The convolutional neural network (CNN) is good at processing data in the form of multiple arrays and can extract the spatial feature information contained in the gridded precipitation data. The multilayer perceptron (MLP) has a simple structure and strong adaptive ability; in theory it can approximate any non-linear function, and it can learn representative trend characteristics of recent changes from historical power generation data. The effectiveness of the proposed method is verified with approximately 3 years of real data from a group of small hydropower stations in southern China, and the results are compared with those of other forecasting models.
3. Data Preprocessing
3.1. Data Description
We use approximately 3 years of daily power generation data and precipitation data from small hydropower stations in Hechi city (HC) and Guilin city (GL) in southern China to verify the effectiveness of the proposed CM-MDLN method based on a CNN and an MLP. HC and GL have large rain collection areas and many large and small rivers, and their average annual rainfall is between 1200 and 1600 millimeters. Both are typical regions with abundant hydropower resources.
3.2. Grid Division of the Spatial Distribution of Precipitation
The quantitative precipitation estimation (QPE) data used in this paper are from the Global Precipitation Measurement (GPM) satellite constellation. The GPM Core Observatory was launched by NASA and JAXA in 2014 to measure rainfall and snowfall on Earth with the advanced radar and radiometer it carries [23]. The core platform carries the first satellite-borne Ku/Ka-band dual-frequency precipitation radar (DPR) and a multi-channel GPM microwave imager (GMI). The minimum threshold of precipitation measurement is 0.5 mm/h, which allows precipitation to be measured well.
In HC and GL, there are 8 and 12 ground meteorological stations, respectively; their locations are shown by the orange dots on the left side of Figure 3 and Figure 4, respectively. Compared with a rain collection area of tens of thousands of square kilometres, the meteorological stations are very sparse. Using a grid of 0.3° × 0.3° (latitude and longitude), we generate a precipitation grid covering each of the HC and GL regions, shown by the blue squares on the right side of Figure 3 and Figure 4, respectively. From the spatial distribution of precipitation observed by the GPM satellites and the actual precipitation observations at the ground meteorological stations, combined with the method described in Section 2.1, gridded data sets of the spatial distribution of precipitation are generated for the HC and GL areas.
3.3. Calculation of the Lag Time of Daily Generating Capacity
In Section 3.2, a precipitation grid data set covering HC at 34 points and a precipitation grid data set covering GL at 28 points are generated. For each region, the total daily precipitation on the current day and on each of the previous 15 days (the sum of the daily precipitation values over all grid points, e.g., 34 points for HC) is taken as the candidate input variable set, and the increment in power generation between the current day and the previous day is taken as the candidate output variable set, denoted as $\{P_t, P_{t-1}, P_{t-2}, P_{t-3}, \ldots, P_{t-15}\}$ and $\{\Delta E_t\}$, respectively, where $t-i$ represents the $i$-th day before date $t$. Algorithm 1 in Section 2.2 is used to calculate the correlation between the candidate input and output variable sets; the change in the AIC value during the calculation is shown in Figure 5 and Figure 6, respectively. The figures show that the AIC value for HC declines continuously and reaches its minimum when the 6th variable is selected, while the AIC value for GL reaches its minimum when the 4th variable is selected.
The above results show that the fluctuation of the daily power generation of the group of small hydropower stations in HC is most significantly affected by a set of six total daily precipitation variables, whereas that of the group in GL is most significantly affected by a set of four total daily precipitation variables. Therefore, in the subsequent forecasts of the daily power generation of the groups of small hydropower stations in HC and GL, the precipitation grid data of the 6 days and 4 days before the forecast date, respectively, are used as the forecast input variables.
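The selection loop with an AIC stopping rule can be illustrated with a simplified sketch. Note that this is not Algorithm 1: partial mutual information is replaced here by partial correlation (a linear stand-in), the AIC is computed from an ordinary least-squares fit, and the data are synthetic. The names `select_lags`, `aic`, and `residual` are hypothetical.

```python
import numpy as np

def _design(X):
    """Design matrix with an intercept column followed by the columns of X."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def aic(y, X):
    """AIC of a least-squares fit of y on X: n*ln(RSS/n) + 2k."""
    A = _design(X)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return len(y) * np.log(rss / len(y)) + 2 * A.shape[1]

def residual(v, X):
    """Part of v that is not linearly explained by the selected columns X."""
    A = _design(X)
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef

def select_lags(P, dE):
    """Greedy forward selection; stops as soon as the AIC no longer decreases."""
    selected = []
    best_aic = aic(dE, P[:, selected])
    while len(selected) < P.shape[1]:
        S = P[:, selected]
        r_y = residual(dE, S)
        # Score each remaining candidate by its partial correlation with dE.
        scores = {j: abs(np.corrcoef(residual(P[:, j], S), r_y)[0, 1])
                  for j in range(P.shape[1]) if j not in selected}
        j_best = max(scores, key=scores.get)
        new_aic = aic(dE, P[:, selected + [j_best]])
        if new_aic >= best_aic:
            break
        selected.append(j_best)
        best_aic = new_aic
    return selected

# Synthetic example: column i holds the total precipitation on day t-i.
rng = np.random.default_rng(0)
n = 500
rain = rng.gamma(0.5, 10.0, size=n + 15)
P = np.column_stack([rain[15 - i: 15 - i + n] for i in range(16)])
dE = 2.0 * P[:, 0] + 1.5 * P[:, 1] + 1.0 * P[:, 2] + rng.normal(0.0, 1.0, n)

selected = select_lags(P, dE)
print("selected lags (days before t):", selected)
```

In this toy setting the generation increment depends only on lags 0 to 2, so those lags should be among the selected columns; the AIC rule may occasionally admit an extra, weakly related lag.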
3.4. Filtering the Power Generation Data
Most small hydropower stations have essentially no regulation capacity, and their power generation depends on the runoff of the river. However, small hydropower stations are isolated and scattered, and information about the rivers on which they are located is difficult to obtain. Therefore, the current level of power generation capacity of a group of stations cannot be evaluated directly from the river flow. Instead, we use a Gaussian weighted moving average filter to smooth the historical power generation data, filter out the frequent fluctuations caused by precipitation, and obtain the general trend of recent changes in daily power generation. The filtered values of power generation on the 6 days and 4 days before the forecast date are then selected to represent the general trend of the daily power generation of the groups of small hydropower stations in HC and GL, respectively.
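A Gaussian weighted moving average can be sketched as follows. The window length and standard deviation used in the paper are not stated here, so `window=7` and `sigma=1.5` are illustrative assumptions, as is the choice of a centered window with renormalized weights at the ends of the series.

```python
import numpy as np

def gaussian_moving_average(series, window=7, sigma=1.5):
    """Smooth a 1-D series with a centered Gaussian-weighted moving average.

    `window` (odd) and `sigma` are illustrative assumptions. Near the ends of
    the series the weights that fall outside the data are dropped and the
    remaining weights are renormalized.
    """
    x = np.asarray(series, dtype=float)
    half = window // 2
    offsets = np.arange(-half, half + 1)
    weights = np.exp(-0.5 * (offsets / sigma) ** 2)
    weighted_sum = np.convolve(x, weights, mode="same")
    weight_total = np.convolve(np.ones_like(x), weights, mode="same")
    return weighted_sum / weight_total

# Toy series: a slow trend plus rapid day-to-day fluctuations.
days = np.arange(60)
raw = 100.0 + 0.5 * days + 10.0 * (-1.0) ** days
smooth = gaussian_moving_average(raw)
print(raw[:5], smooth[:5])
```

The smoothed series keeps the slow trend while damping the alternating component, which is the role the filter plays for the historical generation data.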
3.5. Vectorization of Sample Data
Before model training, the sample data should be vectorized, and the corresponding label value should be set for each group of feature vectors.
Mathematically, forecasting the daily power generation of a group of small hydropower stations amounts to finding a mapping function $f$ that satisfies $y = f(x)$. Suppose the power generation in HC to be forecasted on day $t$ is $E_t$; then, $E_t = f(X_t)$. According to the analysis in Section 3.3 and Section 3.4, precipitation and power generation in the 6 days before the forecasted date $t$ should be selected as predictors, which are denoted as:

$$X_t = \{\mathbf{P}_{t-1}, \mathbf{P}_{t-2}, \ldots, \mathbf{P}_{t-6};\ E_{t-1}, E_{t-2}, \ldots, E_{t-6}\}$$

where $t-m$ ($m$ = 1, 2, 3, 4, 5, 6) represents the $m$-th day before the forecasted date $t$, and $\mathbf{P}_{t-m}$ represents the spatial distribution of precipitation on the previous $m$-th day. Equation (11) arranges $\mathbf{P}_{t-m}$ as a two-dimensional array whose structure corresponds to the precipitation grid on the right side of Figure 3. In Equation (11), $p_j$ ($j$ = 1, 2, 3, ..., 34) represents the precipitation value of the $j$-th precipitation node (corresponding to the precipitation grid point on the right side of Figure 3). In addition, 0 means that the point is not in the HC region, so the influence of its precipitation value is not considered. Finally, the input feature vector is $X_t$ and the corresponding label is $E_t$.
Similarly, we can also obtain the vectorized sample data of GL city.
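The vectorization step can be sketched as a sliding window over the daily records. The function name `build_samples`, the array layout, and the 5 × 8 grid in the example are hypothetical; cells outside the region are assumed to be already set to 0 in the gridded precipitation.

```python
import numpy as np

def build_samples(precip_grids, generation, n_lags=6):
    """Build (grid input, generation input, label) samples.

    precip_grids: array of shape (n_days, rows, cols), 0 outside the region.
    generation:   array of shape (n_days,), daily (filtered) power generation.
    For each day t, the inputs are days t-1, ..., t-n_lags and the label is day t.
    """
    precip_grids = np.asarray(precip_grids, dtype=float)
    generation = np.asarray(generation, dtype=float)
    X_grid, X_gen, y = [], [], []
    for t in range(n_lags, len(generation)):
        lags = [t - m for m in range(1, n_lags + 1)]
        X_grid.append(precip_grids[lags])   # shape (n_lags, rows, cols)
        X_gen.append(generation[lags])      # shape (n_lags,)
        y.append(generation[t])
    return np.stack(X_grid), np.stack(X_gen), np.array(y)

# Toy data: 10 days on a hypothetical 5 x 8 grid.
rng = np.random.default_rng(0)
grids = rng.random((10, 5, 8))
gen = np.arange(10, dtype=float)
X_grid, X_gen, y = build_samples(grids, gen, n_lags=6)
print(X_grid.shape, X_gen.shape, y.shape)
```

For GL the same function would be called with `n_lags=4`, following the lag times selected in Section 3.3.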
4. Calculation Results and Discussion
4.1. Evaluation Metrics
To evaluate the forecasting method more intuitively and effectively, we choose four indicators, namely accuracy ($AC$), mean absolute percentage error ($MAPE$), root mean square error ($RMSE$), and goodness of fit ($R^2$), as the evaluation basis [24]. The expressions are:
where $y_i$ and $\hat{y}_i$ are the true and forecasted daily power generation values, respectively, and $n$ is the total number of days used for testing the model results.
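As a sketch, the four metrics can be computed as below. MAPE, RMSE, and R² follow their standard textbook definitions. The definition of AC is an assumption: it is taken here as 1 − MAPE, which may differ from the expression used in the paper.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return AC, MAPE, RMSE and R2 for true and forecasted daily generation."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mape = float(np.mean(np.abs(err / y_true)))   # requires y_true != 0
    rmse = float(np.sqrt(np.mean(err ** 2)))
    r2 = float(1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    ac = 1.0 - mape   # ASSUMPTION: accuracy defined as 1 - MAPE
    return {"AC": ac, "MAPE": mape, "RMSE": rmse, "R2": r2}

metrics = evaluate([100, 200, 300, 400], [110, 190, 300, 420])
print(metrics)
```

MAPE and the assumed AC are returned as fractions; multiply by 100 to report them as percentages.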
4.2. Validity Analysis Considering the Spatial Distribution of Precipitation
To verify the effectiveness of considering the spatial distribution of precipitation, the forecasting results of the proposed CM-MDLN model with and without the spatial distribution of precipitation are compared. We divide the data into two categories. The first category considers the spatial distribution of precipitation, that is, the gridded precipitation data generated by combining satellite remote sensing data in Section 3.2. The second category does not consider the spatial distribution of precipitation; that is, only the precipitation data observed at the original meteorological stations are used. In this subsection, we take the data from June to August 2018 as an example. This period falls in the rainy season in HC and GL, when the spatial distribution of precipitation is highly variable, so it is a representative period for verifying the impact of the spatial distribution of precipitation on the power generation forecast.
Figure 7 compares the average daily precipitation in HC from June to August with the forecasting results of daily power generation with and without the spatial distribution of precipitation in the model. Table 2 shows the evaluation metrics of the forecasting results in HC for the two cases. Some of the peak and valley values in the forecast curves are enlarged in Figure 8 and Figure 9, respectively. The evaluation metrics and the magnified views of local peaks and valleys together show that the forecast that considers the spatial distribution of precipitation (green line, accuracy of 94.52%) fits better than the forecast that does not (purple line, accuracy of 88.72%), both in the trend of the curve and at the peak and valley values.
Figure 10 compares the average daily precipitation in GL from June to August with the forecasting results of daily power generation with and without the spatial distribution of precipitation in the model. Figure 11 and Figure 12 show enlarged views of some peak and valley values in the forecast curves. Table 3 shows the evaluation metrics of the forecasting results in GL for the two cases. The results in Figure 10, Figure 11, Figure 12, and Table 3 also clearly show that the forecast that considers the spatial distribution of precipitation is better than the forecast that does not.
4.3. Comparison of Forecasting Models
To verify the effectiveness and universality of the proposed CM-MDLN forecasting method, we compare the forecasting results of different methods on the same data set in this subsection. The following six methods are compared: support vector regression (SVR), gradient boosting regression tree (GBRT), random forest (RF), long short-term memory network (LSTM), and, separately, MLP and CNN. All comparison methods are implemented in the Python development environment: the neural network models are built with the Keras deep learning framework, and the SVR, GBRT, and RF models are implemented by calling the Sklearn machine learning library. The SVR model takes the radial basis function (RBF) as the kernel function, and the penalty coefficient C is set to 10. The GBRT and RF models take default parameter values. The LSTM model is a three-layer network with 64, 32, and 16 neurons in the three layers, respectively. The separate MLP and CNN models have the same network parameter settings as the proposed model. Since SVR, GBRT, RF, LSTM, and the separate MLP cannot directly process the spatial distribution data of precipitation, the two-dimensional spatial distribution data of precipitation must be reduced to one-dimensional vector data before these methods are used.
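The three scikit-learn baselines with the stated settings can be sketched as follows. The data are random placeholders, the helper `flatten_features` is hypothetical, and concatenating the flattened precipitation grids with the lagged generation values is an assumption about how the one-dimensional input is formed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.svm import SVR

def flatten_features(X_grid, X_gen):
    """Reduce (n, lags, rows, cols) grids to 1-D vectors and append X_gen."""
    n = X_grid.shape[0]
    return np.hstack([X_grid.reshape(n, -1), X_gen])

baselines = {
    "SVR": SVR(kernel="rbf", C=10),        # RBF kernel, penalty coefficient C = 10
    "GBRT": GradientBoostingRegressor(),   # default parameters
    "RF": RandomForestRegressor(),         # default parameters
}

# Placeholder data: 40 samples, 6 lag days, hypothetical 5 x 8 grid.
rng = np.random.default_rng(0)
X_grid = rng.random((40, 6, 5, 8))
X_gen = rng.random((40, 6))
y = rng.random(40)

X_flat = flatten_features(X_grid, X_gen)
predictions = {
    name: model.fit(X_flat[:30], y[:30]).predict(X_flat[30:])
    for name, model in baselines.items()
}
print({name: p.shape for name, p in predictions.items()})
```

Flattening discards the neighborhood structure of the grid, which is the information the CNN branch of the proposed model is intended to use.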
To verify the universality of the proposed method, the HC data set is randomly divided into a training set, a validation set, and a testing set in proportions of 80%, 10%, and 10%, respectively. Because non-precipitation periods make up a larger share of the year than precipitation periods, random division keeps the data from precipitation and non-precipitation periods relatively evenly represented in each subset, which makes the calculated results more convincing. The training set is used for model training, the validation set is used for tuning model parameters, and the testing set is used to test the forecasting performance of the model.
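A random 80%/10%/10% split can be sketched as below. The function name `random_split`, the seed, and the sample count of 1060 are illustrative assumptions; 1060 is chosen only because it yields a 106-sample testing set.

```python
import numpy as np

def random_split(n_samples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return shuffled index arrays for the training, validation and testing sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    # The testing set takes whatever remains after training and validation.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = random_split(1060)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the indices are shuffled before splitting, each subset draws days from all seasons rather than from one contiguous block of the record.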
The forecasted and true values of each model in HC and GL are compared in Figure 13 and Figure 14 (the three straight lines in each figure represent $y = 1.1x$, $y = x$, and $y = 0.9x$ from top to bottom). In the results for both HC and GL, almost all the points of the proposed model fall between the two lines $y = 1.1x$ and $y = 0.9x$ and are closer to the diagonal line $y = x$, which indicates that the forecasted values of this model are closer to the true values and that its forecasting accuracy is higher than that of the other models.
The evaluation metrics of the 106-day power generation forecasts in the testing sets of HC and GL are shown in Table 4 and Table 5, and the comparison between the forecasted values and the true values of each model is shown in Figure 8. The $AC$, $MAPE$, $RMSE$, and $R^2$ of the proposed multimodal deep learning model based on the fusion of a CNN and an MLP are significantly better than those of the SVR, GBRT, RF, LSTM, and separate MLP and CNN models. In particular, the percentage of days with an absolute percentage error ($APE$) of less than 10% or 5% is much higher than that of the other models. This indicates that the proposed model is more effective and universal and is better suited to forecasting the daily power generation of practical small hydropower stations.
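The share of days within a given APE threshold can be computed with a short sketch. A forecast with APE below 10% is one that lies strictly between the lines y = 0.9x and y = 1.1x in a plot of forecasted against true values. The function name `share_within` and the sample values are hypothetical.

```python
import numpy as np

def share_within(y_true, y_pred, tol):
    """Fraction of days whose absolute percentage error is below `tol`."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ape = np.abs((y_pred - y_true) / y_true)   # requires y_true != 0
    return float(np.mean(ape < tol))

y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [108.0, 230.0, 310.0, 398.0]
share_10 = share_within(y_true, y_pred, 0.10)
share_5 = share_within(y_true, y_pred, 0.05)
print(share_10, share_5)
```

In this toy example three of the four days are within 10% and two are within 5%.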