3.1. Prediction Skill of the Models
To make the differences in forecasting skill between the models easier to compare, Figure 3 shows the distribution of each model's forecasting skill across China. Figure 3a–d show the temporal correlation coefficients (TCCs) between the multi-year hindcasts and the observed precipitation for the BCC, ECMWF, NCEP, and JMA models, respectively. Comparing Figure 3a–d shows that forecasting skill varied considerably between regions. BCC (Figure 3a) had high skill in the southwest, the middle reaches of the Yangtze River, northern Northeast China, and northern Northwest China. ECMWF (Figure 3b) had relatively large areas of high skill, extending from the southwest to northern China and western Inner Mongolia, whereas its skill was low south of the Yangtze River, in the northeast, and in parts of the northwest. The distribution of NCEP's skill (Figure 3c) was comparable to that of ECMWF, with the highest skill mainly from the southwest to the Huang-Huai region; the largest differences from ECMWF occurred in Jiangnan and the eastern part of Northwest China, where ECMWF's skill was the opposite of NCEP's. Figure 3d shows that JMA had high skill in the region between the two river basins but low skill south of the Yangtze River and in the northeast.
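As a minimal sketch of the TCC used in Figure 3, the function below correlates a model's hindcasts with the observations over the years at every grid point. It assumes both fields are NumPy arrays of shape (years, lat, lon) on a common grid; the name `tcc_map` and the synthetic data are illustrative, not the study's code. Passing two models' hindcasts instead of a model and the observations gives inter-model TCC maps of the kind discussed for Figure 4.

```python
import numpy as np

def tcc_map(forecast, observed):
    """Temporal correlation coefficient (Pearson r over years) at each grid point.

    forecast, observed: arrays of shape (n_years, n_lat, n_lon).
    Returns an (n_lat, n_lon) array; grid points with zero variance give NaN.
    """
    f = forecast - forecast.mean(axis=0)   # anomalies about each grid point's time mean
    o = observed - observed.mean(axis=0)
    num = (f * o).sum(axis=0)
    den = np.sqrt((f ** 2).sum(axis=0) * (o ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den

# Usage with synthetic data: 30 "years" on a 4 x 5 grid.
rng = np.random.default_rng(0)
obs = rng.normal(size=(30, 4, 5))
model = 0.6 * obs + rng.normal(size=obs.shape)   # a skilful but noisy synthetic model
print(tcc_map(model, obs).round(2))
```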
Figure 4a–f presents the TCCs between each pair of the four models, which show where the models' forecasts are highly correlated. According to Figure 4a, the TCC between BCC and ECMWF was high in North China, the middle and lower Yangtze River, and South China, where it was significant at the 5% level, indicating that the two models' forecasts were consistent in these areas; in Southwest China, the differences were larger, and the forecasts often contradicted each other. Figure 4b shows high TCCs between BCC and NCEP, mainly in the middle and upper reaches of the Yangtze River, northern China, and the eastern part of Northwest China. Figure 4c shows that the regions with higher TCCs between BCC and JMA were primarily South China, Central China, and North China. Figure 4d shows that high TCCs between ECMWF and NCEP were primarily located in the middle reaches of the Yangtze River and the areas to the south. Figure 4e shows a generally strong correlation between ECMWF and JMA, despite lower TCCs in eastern Yunnan, southern Northeast China, and the southeast coast. Figure 4f shows that the TCC between NCEP and JMA was low in most areas; in the eastern part of Northwest China, western Yunnan, and parts of the middle and lower reaches of the Yangtze River, most TCCs were negative, indicating that the two models made opposite predictions over time.
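The section does not state how the 5% significance test for the TCCs was performed. A common choice, assumed here, is the two-sided Student's t-test for a Pearson correlation r computed over n years, which can also be expressed as a critical correlation r_c:

```latex
t = r\sqrt{\frac{n-2}{1-r^{2}}}, \qquad \nu = n-2, \qquad
|r| > r_c = \frac{t_c}{\sqrt{n-2+t_c^{2}}}
```

Here t_c is the two-sided 5% critical value of the t distribution with ν degrees of freedom. For illustration only, if n = 30, then t_c ≈ 2.048 and r_c ≈ 2.048/√(28 + 4.194) ≈ 0.36.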
3.2. Parameter Optimization
The DT, RF, and AB algorithms have various hyperparameters, including the feature selection criterion, the split-point selection strategy, the maximum depth, the minimum number of samples per leaf node, the minimum impurity for node splitting, and the maximum number of leaf nodes. Different hyperparameter configurations have a significant effect on model performance. In essence, model learning is the process of adjusting the model parameters so that the model predictions match the observations as closely as possible. By evaluating different hyperparameter configurations, the optimal configuration is identified to achieve the best prediction skill. The maximum depth of the tree is the key tuning parameter in CART because it determines the complexity of the model.
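The section does not name the library used for DT, RF, and AB. Assuming scikit-learn (our assumption), the hyperparameters listed above correspond roughly to the keyword arguments in the sketch below, which also enumerates a small search grid of the kind used for tuning. The grid values are hypothetical placeholders, not the study's settings (those are listed in Table 2).

```python
from itertools import product

# Hyperparameters named in the text and the scikit-learn keyword arguments they
# roughly correspond to. ASSUMPTION: the study's library is not stated.
HYPERPARAMETER_NAMES = {
    "feature selection criterion": "criterion",       # split-quality measure, e.g. "squared_error"
    "split-point selection strategy": "splitter",     # "best" or "random"; single trees only
    "maximum depth": "max_depth",
    "minimum samples per leaf": "min_samples_leaf",
    "minimum impurity for node splitting": "min_impurity_decrease",  # newer name; older releases had min_impurity_split
    "maximum number of leaf nodes": "max_leaf_nodes",
}

# Hypothetical tuning grid; the values are placeholders, not the study's settings.
GRID = {
    "max_depth": [1, 2, 3, 4, 5, 6],
    "min_samples_leaf": [1, 3, 5],
}

def candidate_configs(grid):
    """Yield every combination of grid values as a keyword-argument dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(candidate_configs(GRID))
print(len(configs), configs[0])
```

If scikit-learn is used, each dict could be unpacked into an estimator, for example `DecisionTreeRegressor(**cfg)`, and scored by cross-validation.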
As shown in Figure 5, an MME prediction model with a maximum depth of three was constructed for summer precipitation in East China. The model used JMA as the root node, splitting on whether its predicted precipitation anomaly percentage was greater than −9.36%. ECMWF and NCEP served as the decision nodes of the second layer, and ECMWF, NCEP, and JMA as the decision nodes of the third layer. When the JMA prediction was greater than −9.36%, the case entered the right-hand branch of the tree, where the next split depended on whether the NCEP prediction was greater than 7.91%; following the splits down to a terminal node gave the corresponding precipitation prediction. The left-hand branch worked in the same way, with splits on the JMA and ECMWF predictions leading to the corresponding precipitation predictions. BCC was not used anywhere in the tree, so it had relatively low reference value for East China. ECMWF, NCEP, and JMA also appeared as nodes a different number of times: JMA was the root node of the whole tree, ECMWF served as a splitting node several times, and NCEP served as a splitting node only twice. Therefore, JMA and ECMWF were relatively more important in this tree.
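Thresholds such as JMA > −9.36% come from CART's greedy split search. A minimal sketch of that search for a single node is shown below, assuming squared-error splitting and midpoint thresholds (the convention of scikit-learn's regression trees, assumed rather than confirmed by the text). `best_split` and the synthetic data are illustrative.

```python
import numpy as np

def best_split(X, y, feature_names):
    """Exhaustive CART-style search for the best single regression split.

    X: (n_samples, n_features) model predictions (anomaly percentages);
    y: observed anomaly percentages. Returns (feature, threshold, child SSE),
    where samples with X[:, j] <= threshold go to the left child.
    """
    best = (None, None, np.inf)
    for j, name in enumerate(feature_names):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for thr in (values[:-1] + values[1:]) / 2:
            left, right = y[X[:, j] <= thr], y[X[:, j] > thr]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (name, float(thr), float(sse))
    return best

# Usage: synthetic data in which the "observations" depend on the JMA column.
rng = np.random.default_rng(0)
names = ["BCC", "ECMWF", "NCEP", "JMA"]
X = rng.normal(0, 20, size=(30, 4))   # 30 synthetic years x 4 model predictions
y = np.where(X[:, 3] > -10, 15.0, -15.0) + rng.normal(0, 2, 30)
print(best_split(X, y, names))
```

A full tree repeats this search recursively in each child node until the maximum depth is reached.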
To investigate the effect of the maximum tree depth on prediction skill, the DT, RF, and AB algorithms were cross-validated using data from the last 30 years. Figure 6a–d show how the root mean square error (RMSE) varies with increasing maximum depth for South China, East China, North China, and Northeast China, respectively. For South China (Figure 6a), the RF algorithm had the lowest RMSE of the three ML algorithms at every maximum depth. The DT algorithm had a lower RMSE than the AB algorithm when the maximum depth was less than 6, whereas AB had a lower RMSE than DT at greater depths. The RMSE of both RF and DT reached its minimum at a maximum depth of 2, whereas that of AB reached its minimum at a maximum depth of 1; hence, the optimal depth for South China is 2 for the DT and RF algorithms and 1 for the AB algorithm. For East, North, and Northeast China, the RMSE likewise depended on the maximum depth. Across all regions, the cross-validated RMSEs of the three algorithms first decreased and then increased with maximum depth, and RF had the smallest RMSE. The RMSE reached its global minimum at a maximum depth of 2 for East and North China and at a maximum depth of 1 for Northeast China. In general, each algorithm minimized the RMSE at a maximum depth of 2 or less, a consequence of the small amount of model data. Consequently, the maximum depth of the tree should not exceed two when developing the prediction models.
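The depth-selection procedure can be sketched with a small, self-contained regression tree and k-fold cross-validation. The sketch assumes squared-error CART splits, contiguous folds, and at least two samples per leaf; the study's fold scheme is not stated, and `fit_tree`, `predict`, and `cv_rmse` are hypothetical helpers, not the study's code. The data are synthetic.

```python
import numpy as np

def fit_tree(X, y, max_depth, min_leaf=2):
    """Grow a squared-error regression tree; leaves store the mean target."""
    if max_depth == 0 or len(y) < 2 * min_leaf:
        return float(y.mean())
    best = None
    for j in range(X.shape[1]):
        vals = np.unique(X[:, j])
        for thr in (vals[:-1] + vals[1:]) / 2:
            mask = X[:, j] <= thr
            if mask.sum() < min_leaf or (~mask).sum() < min_leaf:
                continue  # respect the minimum leaf size
            sse = ((y[mask] - y[mask].mean()) ** 2).sum() + ((y[~mask] - y[~mask].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, thr, mask)
    if best is None:
        return float(y.mean())
    _, j, thr, mask = best
    return (j, float(thr),
            fit_tree(X[mask], y[mask], max_depth - 1, min_leaf),
            fit_tree(X[~mask], y[~mask], max_depth - 1, min_leaf))

def predict(tree, x):
    """Follow the splits from the root to a leaf for one sample x."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if x[j] <= thr else right
    return tree

def cv_rmse(X, y, max_depth, n_folds=5):
    """k-fold cross-validated RMSE (contiguous folds, e.g. blocks of years)."""
    idx = np.arange(len(y))
    errors = []
    for test_idx in np.array_split(idx, n_folds):
        train_idx = np.setdiff1d(idx, test_idx)
        tree = fit_tree(X[train_idx], y[train_idx], max_depth)
        errors.extend(predict(tree, X[i]) - y[i] for i in test_idx)
    return float(np.sqrt(np.mean(np.square(errors))))

# Usage: synthetic "30 years" of 4 model predictions and observations.
rng = np.random.default_rng(0)
X = rng.normal(0, 20, size=(30, 4))
y = 0.5 * X[:, 3] + rng.normal(0, 15, 30)
scores = {depth: cv_rmse(X, y, depth) for depth in range(1, 7)}
best_depth = min(scores, key=scores.get)
print(best_depth, {d: round(s, 1) for d, s in scores.items()})
```

With so few training samples per fold, deeper trees have leaves holding only a handful of years, which is one way to see why shallow trees tend to cross-validate best here.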
For the AB and RF algorithms, the number of trees is another important parameter affecting prediction skill. Figure 7a–d illustrate the effect of the number of trees in the AB and RF algorithms on prediction skill in South China, East China, North China, and Northeast China, respectively. In South China (Figure 7a), the RMSE of the AB and RF algorithms varied similarly as the number of trees increased. The cross-validated RMSE decreased rapidly while the number of trees was below 10; beyond 10 trees, the decline slowed markedly and the RMSE stabilized, essentially reaching its minimum at 20 trees. Adding more trees did not reduce the RMSE further. The three panels in Figure 7b–d show that the effect of the number of ensemble trees on RMSE in East China, North China, and Northeast China was essentially the same as in South China: the RMSE decreased rapidly at first, the rate of decline then slowed, and the RMSE stabilized once the ensemble reached about 20 trees. Accordingly, the optimal number of ensemble trees for both the AB and RF algorithms is about 20.
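The text identifies the point where RMSE stops decreasing by inspection. One simple way to automate that choice, which is our assumption rather than the study's procedure, is to take the smallest ensemble size whose cross-validated RMSE lies within a small relative tolerance of the best RMSE. `smallest_stable_size` is a hypothetical helper, and the RMSE curve in the usage lines is synthetic.

```python
import math

def smallest_stable_size(sizes, rmse, rel_tol=0.01):
    """Smallest ensemble size whose RMSE is within rel_tol of the best RMSE.

    sizes: increasing numbers of trees that were cross-validated;
    rmse:  the corresponding cross-validated RMSE values.
    """
    best = min(rmse)
    return next(n for n, r in zip(sizes, rmse) if r <= best * (1 + rel_tol))

# Usage with a synthetic RMSE curve that flattens as trees are added.
sizes = list(range(1, 51))
rmse = [30 + 10 * math.exp(-n / 4) for n in sizes]
print(smallest_stable_size(sizes, rmse))
```

Preferring the smallest adequate ensemble keeps training cost down without giving up skill once the curve has flattened.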
Table 2 lists the hyperparameters selected for the three ML algorithms in the independent prediction.
In tree-based models, the reduction in mean squared error achieved by the splits on each feature can be calculated, and the feature importance is this reduction normalized over all features. This allowed us to calculate the importance of each model in the DT, RF, and AB algorithms and to map its distribution across China, as shown in Figure 8.
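For a single regression tree, the importance described above can be sketched as follows: each split contributes the sample-weighted decrease in mean squared error that it achieves, and the totals are normalized to sum to one. This mirrors the mean-decrease-in-impurity importance of scikit-learn's tree models, assuming that is what the study used. A model never used for splitting, such as BCC in Figure 5, receives zero importance. The split records in the example are hypothetical numbers, not values from the study.

```python
def impurity_importance(splits, n_total, features):
    """Normalized squared-error-reduction importance for one regression tree.

    splits: one tuple per internal node,
            (feature, n_node, mse_node, n_left, mse_left, n_right, mse_right),
            where mse_* is the mean squared error about the node mean.
    n_total: number of training samples at the root.
    features: all candidate predictors, so unused ones get zero importance.
    """
    raw = {f: 0.0 for f in features}
    for feature, n, mse, n_l, mse_l, n_r, mse_r in splits:
        # Sample-weighted decrease in MSE produced by this split.
        raw[feature] += (n * mse - n_l * mse_l - n_r * mse_r) / n_total
    total = sum(raw.values())
    return {f: v / total for f, v in raw.items()} if total > 0 else raw

# Usage with two hypothetical splits (numbers are illustrative only).
splits = [("JMA", 30, 100.0, 20, 40.0, 10, 60.0),
          ("ECMWF", 20, 40.0, 10, 20.0, 10, 20.0)]
print(impurity_importance(splits, 30, ["BCC", "ECMWF", "NCEP", "JMA"]))
```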
Figure 8 shows the importance of each of the four models in the DT, RF, and AB algorithms. Figure 8a–c illustrate the importance of BCC in the three algorithms. Comparing the three panels shows that BCC's importance had similar distribution characteristics in all three algorithms, being higher in the Yangtze River basin and the northeast, although it was greater in the DT algorithm than in the other two. This was partly because RF and AB are both ensemble algorithms that build multiple decision trees in order to use as much valid information as possible from all models, rather than relying heavily on one model as an individual decision tree can. Accordingly, BCC's importance differed relatively little between the RF and AB algorithms. Figure 8d–f show the spatial distribution of ECMWF's importance in the three algorithms; ECMWF was highly important in the Yellow River basin, the eastern part of Northwest China, and the western part of Southwest China. According to Figure 8g–i, the importance of NCEP varied by region in the three algorithms, and the algorithms relied mainly on NCEP in most regions south of the Yangtze River, in northern China, and in western Northwest China. JMA had high importance mainly in the Huaihe River basin, Central China, and northern Northeast China, as shown in Figure 8j–l. In general, the importance of each model varied widely from region to region. All panels show that NCEP and JMA played a leading role in the ensemble forecasts for eastern China. Although BCC and ECMWF were less important than the other two models, they were of great importance in certain areas where prediction accuracy was low, such as South China and Northeast China. ML algorithms can therefore evaluate the importance of each model in different regions and combine the complementary strengths of multiple models, producing an improved MME prediction.
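The averaging argument above can be illustrated with a short sketch. Assuming scikit-learn-style behaviour (our assumption; the library is not named), a random forest's importance is the mean of its trees' normalized importances, while AdaBoost weights each tree by its boosting weight. The per-tree values below are hypothetical; they show how averaging spreads importance across models that an individual tree would concentrate on one.

```python
import numpy as np

def ensemble_importance(per_tree, weights=None):
    """Combine per-tree importance vectors into one ensemble importance vector.

    per_tree: (n_trees, n_models) array; each row is one tree's normalized importances.
    weights:  optional per-tree weights (e.g. boosting weights); None means equal
              weights, as for a random forest.
    """
    per_tree = np.asarray(per_tree, dtype=float)
    w = np.ones(len(per_tree)) if weights is None else np.asarray(weights, dtype=float)
    combined = w @ per_tree / w.sum()
    return combined / combined.sum()   # renormalize so the importances sum to one

# Usage: three hypothetical trees, each dominated by a different model
# (columns: BCC, ECMWF, NCEP, JMA).
trees = [[0.7, 0.1, 0.1, 0.1],
         [0.1, 0.7, 0.1, 0.1],
         [0.1, 0.1, 0.1, 0.7]]
print(ensemble_importance(trees).round(2))
```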
Comparing Figure 8 with Figure 3 shows that the spatial distribution of model importance broadly resembles that of TCC: areas where a single model has high skill also tend to be areas where that model has high importance in the MME. This is because the ML-based MME is essentially a regression on the multi-model predictions; if a model's prediction is more accurate in a particular region, the ML method tends to use that model's information more during training, which increases its importance. The biggest difference between Figure 3 and Figure 8 is that low predictive skill does not necessarily mean low importance in the ML-based MME. For example, a model's skill may be so low that its predictions are opposite to the observations in most years; this is still prediction information that an ML algorithm can exploit, achieving relatively high accuracy by effectively inverting that model's predictions. In such a training process, a model with low prediction skill can therefore have high importance.
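The argument that an anti-correlated model can still be important can be checked with a synthetic example. The sketch below builds a "model" whose predictions mostly have the opposite sign to the observations and fits an ordinary least-squares regression, used here only as a simple stand-in for the tree-based ML step. All data are synthetic, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years = 30
obs = rng.normal(0, 20, n_years)                   # synthetic observed anomalies (%)
inverted = -obs + rng.normal(0, 5, n_years)        # "low-skill" model: sign usually wrong
weak = 0.2 * obs + rng.normal(0, 20, n_years)      # weakly skilful model

tcc_inverted = np.corrcoef(inverted, obs)[0, 1]    # strongly negative TCC
X = np.column_stack([np.ones(n_years), inverted, weak])
coef, *_ = np.linalg.lstsq(X, obs, rcond=None)     # linear regression as a stand-in for ML
fitted = X @ coef
rmse_fit = np.sqrt(np.mean((fitted - obs) ** 2))
rmse_raw = np.sqrt(np.mean((inverted - obs) ** 2))
print(f"TCC={tcc_inverted:.2f}, coef={coef[1]:.2f}, "
      f"raw RMSE={rmse_raw:.1f}, fitted RMSE={rmse_fit:.1f}")
```

The fitted coefficient on the inverted model is negative, so the regression flips its sign; a tree-based learner can achieve a similar effect by placing high predicted values in low-precipitation leaves.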
The validation results of the three ML algorithms were relatively similar, and RF was generally the most stable of the three. Figure 9 compares the RF-based MME predictions with the observed summer precipitation in China for 2019–2021. According to Figure 9a, summer precipitation in China in 2019 was concentrated in two rain belts: a southern belt in Jiangnan and a northern belt in the northeast and the eastern part of Northwest China, whereas North China, the Huaihe River basin, and Southwest China received little rainfall. The MME prediction in Figure 9b captured the overall distribution of the two rain belts well; its predictions for Jiangnan and Northwest China were more accurate than those for Northeast China. According to Figure 9c, summer precipitation in China in 2020 was anomalous, with unusually high totals relative to the climatological average, especially in the Yangtze River basin; precipitation was below normal only in a few small areas of southern China, southern Northeast China, and central Northwest China. Figure 9d shows the general distribution of the predicted 2020 precipitation, which agreed well with the observations in most areas except parts of the northwest and northeast, where the predicted anomalies had the opposite sign. However, the predicted magnitudes differed considerably from the observed ones, and the observed precipitation anomalies were not well predicted. Figure 9e shows that summer precipitation in China in 2021 fell mainly in a rain belt extending from northeast to southwest, covering eastern Inner Mongolia, North and Central China, the Jianghuai basin, and the middle and lower reaches of the Yangtze River, with reduced precipitation in most other areas. In Figure 9f, the prediction reproduced this rain belt relatively well, although its extent was larger than observed. The forecasts and observations differed mainly in the eastern part of Northwest China, northern Northeast China, and the middle and lower reaches of the Yangtze River, and the predicted precipitation was smaller than observed. Overall, the MME predictions for 2019–2021 captured the general distribution of summer precipitation in China, but the forecast magnitudes were considerably smaller than the observed ones.
To evaluate forecasting skill quantitatively, the ACC and RMSE of the four models, the weighted-average MME, and the ML-based MME were computed, as shown in Figure 10. Comparing the ACCs in Figure 10a shows that the single models' ACCs were unstable, with large interannual variations, whereas both MME methods had higher and more stable ACCs. For 2019–2021, the mean ACC of the ML-based MME was 0.30, an improvement of 0.09 over that of the weighted-average MME (0.21). The ML-based MME thus improved markedly on the other methods. The RMSE comparison in Figure 10b gives results similar to those for ACC: the ML-based MME had the lowest RMSE, although its improvement in RMSE was smaller than its improvement in ACC.
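The ACC is not defined in this section. A minimal sketch, assuming the common centered definition (the Pearson correlation between forecast and observed anomaly fields across grid points for one year, then averaged over 2019–2021 as in the text), is given below together with a matching RMSE. Operational evaluations sometimes use uncentered or area-weighted variants instead; the helper names and the synthetic data are illustrative.

```python
import numpy as np

def _paired_valid(a, b):
    """Flatten two fields and keep grid points where both are finite."""
    a, b = np.ravel(a).astype(float), np.ravel(b).astype(float)
    keep = np.isfinite(a) & np.isfinite(b)
    return a[keep], b[keep]

def acc(forecast_anom, observed_anom):
    """Spatial anomaly correlation coefficient for one year (centered form)."""
    f, o = _paired_valid(forecast_anom, observed_anom)
    f, o = f - f.mean(), o - o.mean()
    return float(f @ o / np.sqrt((f @ f) * (o @ o)))

def rmse(forecast, observed):
    """Root mean square error over all valid grid points."""
    f, o = _paired_valid(forecast, observed)
    return float(np.sqrt(np.mean((f - o) ** 2)))

# Usage: synthetic anomaly-percentage fields for three years on a 10 x 12 grid.
rng = np.random.default_rng(0)
observed = rng.normal(0, 30, size=(3, 10, 12))
forecast = 0.4 * observed + rng.normal(0, 20, size=observed.shape)
yearly_acc = [acc(f, o) for f, o in zip(forecast, observed)]
print(np.mean(yearly_acc), np.mean([rmse(f, o) for f, o in zip(forecast, observed)]))
```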