Article

Research on a Clustering Forecasting Method for Short-Term Precipitation in Guangdong Based on the CMA-TRAMS Ensemble Model

1 Guangzhou Meteorological Observatory, Guangzhou 511430, China
2 Guangdong Meteorological Observatory, Guangzhou 510080, China
3 Guangdong Provincial Key Laboratory of Regional Numerical Weather Prediction, Guangzhou Institute of Tropical and Marine Meteorology, China Meteorological Administration, Guangzhou 510640, China
4 Plateau Atmospheric and Environment Laboratory of Sichuan Province, School of Atmospheric Sciences, Chengdu University of Information Technology, Chengdu 610225, China
* Author to whom correspondence should be addressed.
Atmosphere 2023, 14(10), 1488; https://doi.org/10.3390/atmos14101488
Submission received: 8 August 2023 / Revised: 18 September 2023 / Accepted: 19 September 2023 / Published: 26 September 2023
(This article belongs to the Special Issue Identification and Optimization of Retrieval Model in Atmosphere)

Abstract

Focusing on the 2020–2021 flood seasons in Guangdong, we comprehensively assessed the short-term precipitation forecasts of the ensemble prediction system (EPS) based on the China Meteorological Administration Tropical Regional Atmosphere Model for the South China Sea (CMA-TRAMS). We then applied four clustering strategies to the model's ensemble precipitation forecasts to evaluate their applicability to short-term precipitation forecasting in Guangdong. Our key findings were as follows. Precipitation during the 2020–2021 flood seasons in Guangdong exhibited distinct characteristics: the areas affected by frontal and subtropical-high-edge rainfall were relatively scattered, and this rainfall occurred predominantly in the evening and at night, whereas monsoon and return-flow precipitation were concentrated, with impacts lasting from early morning to evening. Forecasts based on the ensemble maximum and minimum had large errors, whereas those based on the ensemble mean and median had small errors, indicating that the model's short-term precipitation forecasts were highly stable. The vertical shear associated with the different precipitation types noticeably influenced the model's performance. The model consistently underestimated short-term precipitation in Guangdong, although this bias decreased with longer lead times, while the model's dispersion increased. In terms of the mean absolute error (MAE), primary-cluster forecasts performed similarly under the various strategies, whereas the “Ward” strategy performed well for sub-primary cluster forecasts, particularly for the areas and precipitation types for which the model performed poorly.
Although the clustering approach lagged behind the ensemble mean in rain/no-rain forecasts, it improved forecasts of short-term heavy rainfall events, for which the “complete” and “single” strategies consistently delivered the most accurate forecasts. Our study sheds light on the effectiveness of clustering methods in improving short-term precipitation forecasts for Guangdong, particularly in the regions and conditions where the model initially struggled. These findings contribute to the understanding of flood-season precipitation forecasting and can inform strategies for improving forecast accuracy in similar contexts.

1. Introduction

In recent years, with the development of the Guangdong–Hong Kong–Macao Greater Bay Area and the growth of the Greater Pearl River Delta urban agglomeration, the urban heat-island effect has contributed to an increasing tendency for short-term heavy precipitation to develop and concentrate in the region [1,2,3,4]. Under global warming, the hazard posed by short-term heavy precipitation in Guangzhou is expected to become more pronounced, and record-breaking short-term heavy rainfall events are expected to occur more frequently. Improving the accuracy of short-term heavy-precipitation forecasts is therefore of utmost importance.
Because the development of short-term heavy precipitation systems is highly uncertain, deterministic forecasting from a single initial state with a numerical weather model is insufficient. Ensemble forecasting addresses this by accounting for uncertainties in both the initial conditions and the model, thereby representing multiple possible future weather scenarios [5,6,7,8]. For short-term heavy precipitation caused by mesoscale systems in South China, ensemble forecasting systems are urgently needed to improve forecast accuracy and strengthen disaster prevention and mitigation.
The Guangzhou Institute of Tropical and Marine Meteorology of the China Meteorological Administration developed the China Meteorological Administration Tropical Regional Atmosphere Model for the South China Sea (CMA-TRAMS) on the basis of the China Meteorological Administration mesoscale model (CMA-MESO) [9,10]. An ensemble prediction system based on CMA-TRAMS, CMA-TRAMS (EPS), was subsequently established, and three years of batch tests showed that CMA-TRAMS (EPS) has advantages over the European Centre for Medium-Range Weather Forecasts ensemble prediction system (ECMWF-EPS) in forecasting heavy precipitation. In 2020, an upgraded version of CMA-TRAMS was implemented, and CMA-TRAMS (EPS) entered operational trial [11].
Cluster analysis is a valuable method of classifying different ensemble members when it is impractical to manually identify the similarities and differences among numerous forecast samples within a limited time frame. It provides insights into the spatial distribution of similarities and differences among the ensemble members. Forecasters can use cluster products to gain a broad understanding of the forecast classification of all ensemble members, enabling them to assess potential future developments of weather systems and to identify any anomalies more easily.
Several international meteorological agencies have adopted cluster analysis products in their operational forecasting. For instance, the ECMWF initially employed the Ward clustering method [12] and later adopted the K-means algorithm to classify its ensemble forecast products, providing the main forecast scenarios and extreme weather maps based on the members farthest from the ensemble mean [13]. The National Centers for Environmental Prediction (NCEP) uses anomaly correlation coefficient clustering to group 500 hPa geopotential height forecasts, while the French Meteorological Agency (Météo-France) uses the dynamic fuzzy clustering method proposed by Diday [14,15], in which weather types determine the initial cluster centers and clustering is performed with displacement and maximum-correlation methods.
In this study, we applied hierarchical clustering strategies to CMA-TRAMS (EPS) forecasts of short-term heavy precipitation and short-term precipitation in Guangdong. The objective was to compare and evaluate the applicability of hierarchical clustering to these forecasts.

2. Data and Methods

2.1. Study Area

As shown by the terrain distribution (Figure 1), the study area, Guangdong Province, spans a wide range of topographic heights. In general, terrain height decreases from north to south toward the coastline. In addition, the Pearl River Estuary features complex terrain, including a trumpet-shaped estuary and hilly mountains. This topographic complexity favors localized short-term precipitation, which numerical prediction models often simulate poorly, posing significant challenges for operational short-term precipitation forecasting. For these reasons, we selected this region as the study area for short-term precipitation.

2.2. Datasets

2.2.1. Forecasting Data

The forecast data used in this study were obtained from the CMA-TRAMS ensemble prediction system for the periods 1 April to 30 September 2020 and 1 April to 30 September 2021, with forecasts initialized at 20:00 Beijing time. The data comprise hourly precipitation forecasts out to a 24 h lead time. CMA-TRAMS (EPS) has a horizontal resolution of 9 km and 30 ensemble members. The members use different combinations of physical parameterization schemes: simplified Arakawa–Schubert (SAS) cumulus convection with the Medium-Range Forecast (MRF) boundary layer scheme (8 members), SAS cumulus convection with the Yonsei University (YSU) boundary layer scheme (8 members), Kain–Fritsch (KF) cumulus convection with the MRF boundary layer scheme (7 members), and KF cumulus convection with the YSU boundary layer scheme (7 members).
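The four physics combinations can be written as a small configuration table. The Python sketch below only encodes the member counts listed above to show that they form a 2 × 2 design totaling 30 members; the names `PHYSICS_COMBINATIONS` and `expand_members`, and the assignment of combinations to member indices, are hypothetical and not part of CMA-TRAMS (EPS).

```python
from collections import Counter

# Cumulus scheme / boundary-layer scheme combinations and their member counts,
# as listed in Section 2.2.1. The ordering of members is hypothetical.
PHYSICS_COMBINATIONS = {
    ("SAS", "MRF"): 8,
    ("SAS", "YSU"): 8,
    ("KF", "MRF"): 7,
    ("KF", "YSU"): 7,
}


def expand_members(combinations):
    """Return one (cumulus, pbl) tuple per ensemble member."""
    return [combo for combo, count in combinations.items() for _ in range(count)]


members = expand_members(PHYSICS_COMBINATIONS)
cumulus_counts = Counter(cumulus for cumulus, _ in members)
pbl_counts = Counter(pbl for _, pbl in members)

print(len(members), dict(cumulus_counts), dict(pbl_counts))
```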

2.2.2. Observation Data

The observational data used for verification were hourly precipitation records from automatic weather stations in Guangdong Province for the periods 1 April to 30 September 2020 and 1 April to 30 September 2021. These stations included national meteorological stations and regional stations. To match the model grid, the station observations were interpolated to the 9 km grid using bilinear interpolation.

2.2.3. Reanalysis Data

For further analysis, fifth-generation ECMWF reanalysis (ERA5) daily data with a spatial resolution of 0.25° × 0.25° were utilized to obtain the geopotential height field, the wind field, and the sea-level pressure field for corresponding precipitation events. Based on the analysis of synoptic patterns and the influencing weather systems during precipitation events in the 2020–2021 rainy season, the main types of precipitation affecting the South China region were categorized into four classes: frontal precipitation, monsoon precipitation, subtropical high-edge precipitation, and return-flow precipitation (Table A1).

2.3. Correction Methods

2.3.1. Evaluation Methods

Deterministic forecast verification: From the 30 ensemble members, we computed the ensemble mean (EM), median, minimum (Min), and maximum (Max). These deterministic forecasts were evaluated using the mean error (ME) and the mean absolute error (MAE), and the threat score (TS) was additionally used to assess the EM forecast. The ME, MAE, and TS are defined in Equations (1)–(3).
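A minimal NumPy sketch of these deterministic scores is given below. It assumes that the ensemble is stored as an array of shape (members, samples) and that an event is counted when precipitation is at or above the threshold; both the array layout and the handling of equality are assumptions, and the function names are illustrative.

```python
import numpy as np


def ensemble_statistics(members):
    """Deterministic forecasts derived from an ensemble of shape (n_members, n_samples)."""
    return {
        "EM": members.mean(axis=0),
        "Median": np.median(members, axis=0),
        "Min": members.min(axis=0),
        "Max": members.max(axis=0),
    }


def mean_error(forecast, observed):
    """ME, Equation (1): positive values indicate overestimation."""
    return float(np.mean(forecast - observed))


def mean_absolute_error(forecast, observed):
    """MAE, Equation (2)."""
    return float(np.mean(np.abs(forecast - observed)))


def threat_score(forecast, observed, threshold):
    """TS, Equation (3): hits / (hits + false alarms + misses)."""
    fc_yes = forecast >= threshold
    ob_yes = observed >= threshold
    hits = np.sum(fc_yes & ob_yes)
    false_alarms = np.sum(fc_yes & ~ob_yes)
    misses = np.sum(~fc_yes & ob_yes)
    total = hits + false_alarms + misses
    return float(hits / total) if total > 0 else float("nan")


# Toy example: 3 members at 3 grid points (mm per hour)
members = np.array([[0.0, 2.0, 5.0],
                    [1.0, 3.0, 7.0],
                    [2.0, 4.0, 9.0]])
obs = np.array([0.5, 3.0, 10.0])
stats = ensemble_statistics(members)
print(mean_error(stats["EM"], obs),
      mean_absolute_error(stats["EM"], obs),
      threat_score(stats["EM"], obs, 1.0))
```

The same functions apply unchanged to the Median, Min, and Max forecasts by passing the corresponding entry of `stats`.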
Probability forecast verification: The Talagrand distribution (rank histogram) was used to verify the model's probability forecasts. The 30 ensemble members were sorted in ascending order, which divides the range of values into 31 intervals. Over the verification period, the cumulative frequency of observed values falling within each interval was computed to assess the reliability of the ensemble.
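The rank-histogram computation can be sketched as follows, assuming members are stored with shape (n_members, n_samples). Because hourly precipitation contains many zeros, ties between the observation and the members are common; this sketch breaks ties at random, a widely used convention but an assumption here rather than the procedure stated in this study.

```python
import numpy as np


def talagrand_histogram(members, observed, rng=None):
    """Count how often the observation falls in each of the n_members + 1 intervals.

    members: array (n_members, n_samples); observed: array (n_samples,).
    Explicit sorting is unnecessary: the interval index (rank) equals the
    number of members lying below the observation.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_members = members.shape[0]
    below = np.sum(members < observed, axis=0)   # members strictly below the observation
    ties = np.sum(members == observed, axis=0)   # members equal to the observation
    # Place tied observations uniformly at random among the tied positions
    ranks = below + rng.integers(0, ties + 1)
    return np.bincount(ranks, minlength=n_members + 1)


# 30 members taking the values 0..29 at each of 3 samples
members = np.tile(np.arange(30.0)[:, None], (1, 3))
observed = np.array([-1.0, 100.0, 14.5])
counts = talagrand_histogram(members, observed)
print(counts)  # 31 intervals
```

With 30 members the histogram has 31 bins; a flat histogram indicates a reliable ensemble, while counts piling up in the last bin mean observations frequently exceed every member.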
Additionally, the relative operating characteristic (ROC) curve and the area under the ROC curve (ROCA) were used to evaluate the model's probabilistic forecast skill. Because the main focus of this study was the applicability of the clustering method to the model, the ROC evaluation of the original model's probability forecasts was based on multiple thresholds (0.1, 1.0, 3.0, 10.0, 20.0, 50.0, and 70.0 mm). The quantities used to construct the ROC curve and the ROCA are given by Equations (4) and (5).
Equations (4) and (5) give the hit rate (HR) and the false alarm rate (FAR). Plotting the FAR on the x-axis against the HR on the y-axis yields the ROC curve, and integrating this curve yields the ROCA. The ROC curve represents the trade-off between the HR and the FAR, and the ROCA provides an overall measure of the model's probability forecast skill.
A ROCA of 0.5 corresponds to the diagonal on which the HR equals the FAR, indicating that the model has no skill in probabilistic forecasting. As the ROCA rises above 0.5 toward 1, the model's probabilistic forecast skill increases.
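The ROC construction can be sketched in NumPy as below. As an assumption, the forecast probability categories are taken to be the number of members (0 to 30) at or above the threshold, O_i and NO_i are the counts of observed events and non-events in category i, and the ROCA is integrated with the trapezoidal rule between the (0, 0) and (1, 1) end points. The function name is illustrative.

```python
import numpy as np


def roc_from_ensemble(members, observed, threshold):
    """Return (FAR, HR, ROCA) for an ensemble of shape (n_members, n_samples)."""
    n_members = members.shape[0]
    k = np.sum(members >= threshold, axis=0)              # members forecasting the event
    event = observed >= threshold
    O = np.bincount(k[event], minlength=n_members + 1)    # observed events per category
    NO = np.bincount(k[~event], minlength=n_members + 1)  # observed non-events per category
    if O.sum() == 0 or NO.sum() == 0:
        raise ValueError("need both events and non-events to build a ROC curve")
    # Decision level n: forecast "yes" when at least n members reach the threshold.
    # n = n_members + 1 gives the (0, 0) point; n = 0 gives the (1, 1) point.
    levels = range(n_members + 1, -1, -1)
    hr = np.array([O[n:].sum() / O.sum() for n in levels])
    far = np.array([NO[n:].sum() / NO.sum() for n in levels])
    roca = float(np.sum(np.diff(far) * (hr[1:] + hr[:-1]) / 2.0))  # trapezoidal rule
    return far, hr, roca


# Perfectly discriminating toy ensemble: 30 members, 4 samples
members = np.array([[5.0, 5.0, 0.0, 0.0]] * 30)
observed = np.array([5.0, 5.0, 0.0, 0.0])
far, hr, roca = roc_from_ensemble(members, observed, 1.0)
print(roca)
```

An ensemble whose members all forecast the event everywhere, regardless of what is observed, collapses to the diagonal and yields a ROCA of 0.5, the no-skill value described above.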

2.3.2. Hierarchical Clustering Method

The concept of hierarchical clustering originates from the analysis of similarities and differences between biological organisms (Figure 2), where a hierarchical set of nested categories is used to describe the relationships between samples, resulting in a classification dendrogram. The essence of the entire hierarchical clustering process is to construct a classification tree of the samples. According to the principles of Johnson’s hierarchical clustering algorithm [16], given a set of N samples (or objects) and an N × N distance matrix (or similarity matrix) for clustering, the samples are initially considered as individual clusters, resulting in N classes, where each class contains only one sample, and the distance between classes is based on the distances between the samples they contain.
The specific steps of hierarchical clustering are as follows:
(1) Initially, each object is treated as an individual class, resulting in N classes, each containing only one sample; the distance between classes is given by the distances between the samples they contain.
(2) The two closest classes are identified and merged into one class, reducing the total number of classes by one.
(3) The distances between the newly merged cluster and all other existing clusters are recalculated.
(4) Steps 2 and 3 are repeated, merging the next closest clusters each time, until all samples are combined into a single class containing all N samples.
(5) Given the target number of clusters, n, the partition at the stage of the process where exactly n clusters remain is taken as the clustering result.
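Steps (1)–(5) can be sketched as a naive agglomerative clustering in NumPy. As an assumption, each sample is one ensemble member's precipitation field flattened to a vector and distances are Euclidean. Stopping once n clusters remain gives the same partition as building the full tree and cutting it at n clusters. This deliberately simple loop illustrates the procedure and is not the implementation used in the study.

```python
import numpy as np


def agglomerative_clusters(X, n_clusters, linkage="average"):
    """Cluster the rows of X (e.g., flattened member fields) into n_clusters groups."""
    linkage_fn = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]
    n_samples = X.shape[0]
    # N x N Euclidean distance matrix between samples
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = [[i] for i in range(n_samples)]          # step (1): one sample per class
    while len(clusters) > n_clusters:                   # steps (4)-(5): stop at n classes
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # step (3): inter-cluster distance recomputed from member-to-member distances
                d = linkage_fn(D[np.ix_(clusters[a], clusters[b])])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]         # step (2): merge the closest pair
        del clusters[b]
    labels = np.empty(n_samples, dtype=int)
    for label, idx in enumerate(clusters):
        labels[idx] = label
    return labels


X = np.array([[0.0], [0.1], [5.0], [5.2], [10.0]])
print(agglomerative_clusters(X, 3, "complete"))
```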
The clustering strategies differ in the inter-cluster distance criterion used in step 3, that is, in how the distance between two clusters is computed from the samples they contain (e.g., from their closest or farthest pair). When the within-cluster sum of squares (variance) is used as the criterion, the method is referred to as the “Ward” strategy, which at each step merges the pair of clusters whose union produces the smallest increase in the total within-cluster sum of squares. The “complete” (complete linkage) strategy defines the distance between two clusters as the maximum distance between their members and merges the pair for which this maximum is smallest. The “average” (average linkage) strategy uses the average distance over all pairs of members of the two clusters, and the “single” (single linkage) strategy uses the minimum distance between any pair of observations in the two groups.
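The four inter-cluster criteria can be written compactly as follows. This is an illustrative NumPy sketch with hypothetical function names. For Ward, it returns the increase in the total within-cluster sum of squares caused by merging clusters A and B, which equals n_A n_B / (n_A + n_B) times the squared distance between their centroids; the example checks this against a direct computation.

```python
import numpy as np


def cluster_distance(A, B, strategy):
    """Distance between clusters A (n_a, d) and B (n_b, d) under a linkage strategy."""
    if strategy == "ward":
        n_a, n_b = len(A), len(B)
        diff = A.mean(axis=0) - B.mean(axis=0)
        # Increase in the total within-cluster sum of squares if A and B are merged
        return n_a * n_b * float(diff @ diff) / (n_a + n_b)
    # All pairwise Euclidean distances between members of A and members of B
    pairwise = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))
    if strategy == "single":
        return float(pairwise.min())    # closest pair
    if strategy == "complete":
        return float(pairwise.max())    # farthest pair
    if strategy == "average":
        return float(pairwise.mean())   # mean over all pairs
    raise ValueError(f"unknown strategy: {strategy}")


def within_ss(X):
    """Within-cluster sum of squares about the centroid."""
    return float(((X - X.mean(axis=0)) ** 2).sum())


A = np.array([[0.0, 0.0], [2.0, 0.0]])
B = np.array([[10.0, 0.0]])
for s in ("single", "complete", "average", "ward"):
    print(s, cluster_distance(A, B, s))
# The Ward value equals the change in within-cluster sum of squares caused by the merge
print(within_ss(np.vstack([A, B])) - within_ss(A) - within_ss(B))
```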

2.4. Formatting of Mathematical Components

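$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left(f_i - o_i\right) \tag{1}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|f_i - o_i\right| \tag{2}$$
$$\mathrm{TS} = \frac{A}{A + B + C} \tag{3}$$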
where $n$ is the total number of samples, over all verification grid points, in the evaluation period, $f_i$ is the model forecast value, and $o_i$ is the corresponding observed value.
The TS is calculated from the statistics presented in Table 1 and Table 2; in Equation (3), A is the number of hits, and B and C are the numbers of false alarms and misses.
Here, N is the total number of samples within the evaluation period, including all grid points and forecast members; F is the number of forecast members in which the specific weather event occurred during the evaluation period, and NF is the number of forecast members in which it did not occur.
$$\mathrm{HR}_n = \frac{\sum_{i=n}^{N} O_i}{\sum_{i=1}^{N} O_i} \tag{4}$$
$$\mathrm{FAR}_n = \frac{\sum_{i=n}^{N} NO_i}{\sum_{i=1}^{N} NO_i} \tag{5}$$

3. Analysis of South China Precipitation during the Flood Season of 2020–2021

3.1. Average Daily Rainfall Distribution of Different Precipitation Types during the 2020–2021 Flood Season

Based on weather patterns, precipitation during the 2020–2021 flood seasons in South China can be classified into four types: frontal precipitation, monsoon precipitation, subtropical-high-edge precipitation, and return-flow precipitation. The average daily rainfall of these types (Figure 3) shows that frontal and subtropical-high-edge precipitation had more dispersed distributions. Of the two, frontal precipitation had a broad area of high rainfall exceeding 20 mm, whereas subtropical-high-edge precipitation was mainly concentrated in central and western Guangdong, with most other areas receiving less than 10 mm. In contrast, return-flow and monsoonal precipitation had more concentrated distributions of average daily rainfall. The areas of intense return-flow rainfall were located primarily in central and western Guangdong, while monsoonal precipitation mainly affected central and eastern Guangdong.
In summary, the daily rainfall from return-flow and monsoonal precipitation, which had a more concentrated distribution of intense rainfall, exerted a more significant impact than frontal precipitation or subtropical-high-edge precipitation, which showed a more dispersed distribution of average daily rainfall.

3.2. Average Daily Rainfall Variations of Different Precipitation Types during the 2020–2021 Flood Season

The average rainfall evolution of the different precipitation types (Figure 4) shows that each type had a different impact period and degree of influence. To analyze the impact period of each type, we computed the hourly distribution of average rainfall over the 24 h from 20:00 to 20:00 of the following day.
Monsoonal and return-flow precipitation had the most significant impact and the longest precipitation periods, starting in the early morning and lasting until the evening; this extended duration was one of the reasons for their higher daily rainfall. Frontal precipitation, by contrast, occurred mainly from the evening into the night, and its precipitation period was relatively concentrated. Frontal precipitation also produced the largest short-term rainfall among the four types. For subtropical-high-edge precipitation, hourly rainfall increased steadily, with precipitation occurring mainly in the evening and at night; daytime rainfall was less pronounced for this type.

4. Model Forecast Verification

During the 2020–2021 flood seasons, return-flow precipitation occurred relatively rarely, which reduced the stability of the verification results for this type. Given this limited sample, we placed less emphasis on return-flow precipitation and focused instead on assessing the applicability of hierarchical clustering to South China precipitation during the flood season.

4.1. Deterministic Forecast Verification

For deterministic forecast verification, simple ensemble statistics were used to extract forecast information from the 30 members of the CMA-TRAMS ensemble model. Specifically, we used the Max, the Min, the EM, and the Median as hourly precipitation forecasts for April to September of 2020 and 2021. In terms of the MAE and the ME of hourly precipitation forecasts (Figure 5), the Max consistently had larger errors at all lead times, and the discrepancy grew as the lead time increased. For the MAE, the errors of the Min, the EM, and the Median all increased with lead time, with the Min and the Median yielding the smallest MAE. For the ME, the Min showed a negative bias, indicating a tendency to underestimate precipitation amounts, whereas the other methods all showed a positive bias, indicating a tendency to overestimate. The Median yielded errors closest to zero, indicating the best forecast performance. Comparing the MAE and ME results, the Max (or the Min) showed a consistent overestimation (or underestimation) bias that increased with lead time, and the Median performed slightly better than the EM.
To analyze the model's deterministic forecasts for different precipitation types, we verified each type separately (Figure 6). For all types of flood-season precipitation, all ensemble statistics except the Max showed some forecast skill. For the Max, the MAE increased less with lead time for frontal and return-flow precipitation than for the other types. The Min consistently yielded smaller MAE values at all lead times but, in terms of the overall average error, was biased toward underestimation for the different precipitation types and for the flood season as a whole. At short lead times, the Min, the Median, and the EM performed similarly, with relatively small errors, but their errors grew significantly as the lead time increased. Over the flood season, the EM and the Median performed comparably; however, the errors of the EM increased noticeably with lead time, whereas the Median performed better and more stably across lead times.
Taking the widely used EM as representative, we further examined the model's deterministic forecasts using the threat score (TS), which is commonly used in operational precipitation verification (Figure 7). For the entire flood season and for each precipitation type, the TS decreased as the precipitation threshold increased. For the entire flood season, the EM showed some forecast skill, and the TS remained relatively stable, increasing slowly with lead time. This indicates that the model's short-term precipitation forecasts were stable within a 24 h lead time, and this stability was relatively high for all precipitation types. Among the types, the TS was lower for subtropical-high-edge precipitation, and its higher TS values occurred mainly at the 24 h lead time, suggesting that the model's forecast accuracy and stability for this type were poor. In contrast, the model performed well for monsoonal and frontal precipitation. The strong baroclinicity of these two types may have contributed to their better forecast performance; however, further validation is needed to determine whether the model indeed forecasts strongly baroclinic precipitation types more accurately.

4.2. Probability Forecast Verification

A good ensemble forecast system should have suitable dispersion: each member's forecast should be equally likely, so that the observed value behaves like any one of the ensemble members. To verify this, we analyzed the Talagrand statistics. The 30 members of the CMA-TRAMS ensemble forecast system were sorted in ascending order to form 31 intervals, and the cumulative frequency of observed values falling within these intervals was used to describe the reliability of the ensemble forecast system. For an ideal ensemble forecast system, the observed values fall into each interval with equal probability, producing a flat Talagrand distribution.
The Talagrand distributions for lead times of 1, 7, 13, and 19 h (Figure 8) have a reverse “L” shape, indicating insufficient dispersion, a common issue in many ensemble models. Moreover, most observed values fell to the right of the ensemble maximum, suggesting that the model tended to underestimate precipitation. As the lead time increased, this underestimation decreased and the probability of observed values falling within the ensemble range increased, indicating that the ensemble dispersion grew.
At every lead time, the model's ROC curves lay toward the upper-left corner (Figure 9), indicating that the model had some skill in forecasting hourly precipitation. Moreover, the curves for the different lead times (shown in different colors) nearly overlapped, indicating high stability: probability forecast performance did not decline with increasing lead time. The legend gives the ROCA for each lead time during the 2020–2021 flood seasons. The ROCA exceeded 0.5 at all lead times, indicating that the ensemble model had some skill in forecasting hourly precipitation in the Guangdong region, and it did not change significantly with lead time, suggesting that the model's probability forecasts remained stable, consistent with the earlier analysis. Notably, the ROC curve for the 1 h lead time was closest to the reference line and had the smallest ROCA, indicating that probability forecast performance was weakest at the 1 h lead time.
Overall, the model showed some forecast skill for all precipitation types, with ROCA values greater than 0.5 at all lead times. However, the probability forecast skill of the CMA-TRAMS ensemble model differed among precipitation types (Figure 10). As the lead time increased, the probability forecasts tended to improve for monsoonal and frontal precipitation, which are characterized by strong baroclinicity. Conversely, for return-flow and subtropical-high-edge precipitation, which have weaker baroclinicity, the probability forecasts were more stable: the ROCA values showed no significant change (or even a slight decrease) with increasing lead time. In summary, the model's probability forecasts were best for monsoonal and frontal precipitation, followed by return-flow precipitation, and least favorable for subtropical-high-edge precipitation.

5. Analysis of Cluster Forecast Performance

To assess the suitability of cluster forecasting for precipitation in Guangdong, this study statistically compared cluster forecasts with the EM deterministic forecast. The comparison covered individual cases, the large cluster categories (sub-clusters), and the EM deterministic forecasts of the CMA-TRAMS ensemble model.
For all flood-season precipitation in 2020 and 2021, with the MAE as the evaluation metric, forecasts based on sub-primary clusters (Figure 11b) improved on the EM. In contrast, forecasts based on primary clusters (Figure 11a) performed relatively poorly compared with the EM; that is, their MAE was larger than that of the EM. Similar to the results for the different precipitation categories, clustering with the “single” strategy yielded smaller MAE values and higher forecast accuracy. For sub-primary clusters, however, the improvement from the “single” strategy was less pronounced, whereas the “Ward” strategy produced more significant improvements in forecast accuracy.
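To make this comparison concrete, the sketch below builds cluster forecasts from member labels (for example, labels produced by any of the strategies in Section 2.3.2) and compares their MAE with that of the EM. It assumes, hypothetically, that the primary and sub-primary clusters are the most and second-most populous clusters and that a cluster forecast is the mean of its members; the study's exact definitions may differ. A positive gain means that the cluster forecast has a smaller MAE than the EM.

```python
import numpy as np


def primary_and_subprimary(members, labels):
    """Mean forecasts of the largest and second-largest clusters.

    members: (n_members, n_samples); labels: cluster label of each member.
    """
    ids, sizes = np.unique(labels, return_counts=True)
    ranked = ids[np.argsort(-sizes, kind="stable")]       # most populous cluster first
    primary = members[labels == ranked[0]].mean(axis=0)
    sub_primary = members[labels == ranked[1]].mean(axis=0) if len(ranked) > 1 else None
    return primary, sub_primary


def mae_gain_over_em(members, cluster_forecast, observed):
    """MAE(EM) - MAE(cluster forecast); positive values mean the cluster forecast is better."""
    em = members.mean(axis=0)
    mae_em = np.mean(np.abs(em - observed))
    mae_cluster = np.mean(np.abs(cluster_forecast - observed))
    return float(mae_em - mae_cluster)


members = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [8.0, 8.0], [9.0, 9.0]])
labels = np.array([0, 0, 0, 1, 1])
observed = np.array([1.0, 1.0])
primary, sub_primary = primary_and_subprimary(members, labels)
print(primary, sub_primary, mae_gain_over_em(members, primary, observed))
```

Applying `mae_gain_over_em` per grid point rather than over all samples gives a map of the kind used to examine where clustering helps.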
For each precipitation type (excluding subtropical-high-edge precipitation), forecasts based on either primary or sub-primary clusters had a smaller MAE than the EM forecasts (Figure 12), and this error reduction became more pronounced as the lead time increased. For primary clusters, the MAE differences among clustering schemes were relatively small, making it difficult to identify an “optimal” scheme. For sub-primary clusters, in contrast, the Ward strategy reduced the MAE relative to the EM more markedly, giving smaller MAE values. In summary, in terms of the MAE, primary clusters, which better reflect the model's overall performance, showed no significant difference in forecast skill among clustering strategies, whereas for sub-primary clusters, which exhibit more pronounced anomalies in the precipitation forecasts, the Ward strategy improved forecast accuracy more clearly.
Based on the analysis of the improvement in the MAE for different precipitation types when using primary clusters for forecasting, we observed that the differences among the various strategies were relatively small. Therefore, to further examine the level of MAE improvement and its distribution, we focused on analyzing the forecasts of each precipitation type using sub-primary clusters. This analysis provided more insights into the effectiveness of the clustering strategies in improving the forecast accuracy for individual precipitation types.
The model exhibited larger deviations in forecasting monsoonal precipitation compared to other precipitation types (Figure 13). Moreover, the areas with significant forecast deviations aligned well with the locations of heavy rainfall regions in Guangdong. On the other hand, the model showed a relatively smaller MAE for return-flow, frontal, and subtropical-high-edge precipitation. For each precipitation type, the regions with a higher MAE in the forecast corresponded well with areas of larger precipitation distribution. Additionally, for frontal precipitation and subtropical-high-edge precipitation, which had more scattered rainfall patterns, the MAE distribution in the forecasts was also more dispersed.
As shown in Figure 14, compared with the EM forecasts, sub-primary clustering improved the MAE most for monsoonal and subtropical-high-edge precipitation, followed by frontal precipitation, while the improvement for return-flow precipitation was relatively small. This may be because the model performs worse for this type of precipitation, making significant improvement difficult to achieve.
Among the clustering strategies, the Ward strategy for sub-primary clustering yielded the largest MAE improvement for all precipitation types, and the complete strategy also produced an overall positive improvement for all types. In contrast, the average and single strategies gave larger MAEs than the EM forecasts (negative improvement) in most areas, consistent with the previous analysis.
The horizontal distribution of the improvement obtained with the Ward clustering strategy showed that sub-primary clustering led to significant improvement in regions where the EM precipitation forecasts had a higher MAE. In other words, sub-primary clustering outperformed the EM method particularly in the areas and for the precipitation types where the EM performed poorly.
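The MAE difference maps can be outlined as a per-station calculation: compute the MAE of the cluster forecast and of the EM over all forecast hours, then subtract, following the sign convention of the figure captions (cluster MAE minus EM MAE, so negative values indicate improvement). This is a minimal sketch assuming station-keyed lists of hourly values; the station IDs and numbers are hypothetical.

```python
def mae(forecasts, observations):
    """Mean absolute error over all matched forecast/observation pairs."""
    return sum(abs(f - o) for f, o in zip(forecasts, observations)) / len(observations)


def mae_difference(cluster_fc, em_fc, obs):
    """Per-station MAE(cluster) - MAE(EM); negative values mean the cluster forecast is better."""
    return {stn: mae(cluster_fc[stn], obs[stn]) - mae(em_fc[stn], obs[stn]) for stn in obs}


# Hypothetical hourly rainfall (mm) at two illustrative stations
obs = {"S1": [0.0, 25.0, 5.0], "S2": [0.0, 0.0, 0.0]}
em = {"S1": [2.0, 10.0, 4.0], "S2": [0.5, 0.5, 0.5]}
cluster = {"S1": [1.0, 22.0, 6.0], "S2": [1.0, 0.0, 2.0]}
diff = mae_difference(cluster, em, obs)
```

With these made-up values the cluster forecast lowers the MAE at S1, where the EM underpredicts a heavy-rain hour, and raises it at S2, where the EM's light drizzle happens to be closer to the dry observations.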
In terms of TS, there was little overall difference among the clustering strategies for both clear-rain and heavy-precipitation forecasts (Figure 15). In other words, selecting an “optimal” clustering strategy for precipitation forecasts in Guangdong using primary clusters is challenging. Specifically, for clear-rain forecasts, primary clusters did not perform as well as the EM: the TS difference between the primary clusters and the EM forecast was negative at all lead times, indicating that clustering brought no benefit for clear-rain forecasts. For short-term heavy-precipitation forecasts (hourly rainfall greater than 20 mm), the TS differences between the clustering strategies and the EM forecast were very small. The Ward strategy improved the TS relative to the EM at some lead times, but the improvement was not significant.
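The TS comparisons rest on the contingency table in Table 1 (A hits, B false alarms, C misses), with TS = A / (A + B + C). A minimal sketch of the TS difference between a cluster forecast and the EM, assuming a 0.1 mm rain/no-rain threshold for clear-rain forecasts (not stated in this section) and treating the 20 mm heavy-rain threshold as inclusive (the text says "greater than 20 mm", so the bound is an assumption); the data are hypothetical.

```python
def threat_score(forecast, observed, threshold):
    """TS = A / (A + B + C) for the event 'hourly rainfall >= threshold' (Table 1 notation)."""
    a = b = c = 0
    for f, o in zip(forecast, observed):
        fc_yes, ob_yes = f >= threshold, o >= threshold
        if fc_yes and ob_yes:
            a += 1  # A: hit
        elif fc_yes:
            b += 1  # B: false alarm
        elif ob_yes:
            c += 1  # C: miss
        # D (correct negatives) does not enter the TS
    events = a + b + c
    return a / events if events else float("nan")


CLEAR_RAIN_MM = 0.1   # assumed rain/no-rain threshold
HEAVY_RAIN_MM = 20.0  # short-term heavy precipitation threshold (inclusive bound assumed)

# Hypothetical hourly rainfall (mm) at six station-hours
obs = [0.0, 0.5, 25.0, 30.0, 0.0, 12.0]
em = [0.2, 0.3, 15.0, 18.0, 0.0, 8.0]
cluster = [0.0, 0.0, 22.0, 26.0, 0.0, 21.0]

for name, thr in (("clear rain", CLEAR_RAIN_MM), ("heavy", HEAVY_RAIN_MM)):
    diff = threat_score(cluster, obs, thr) - threat_score(em, obs, thr)
    print(f"{name}: TS(cluster) - TS(EM) = {diff:+.3f}")
```

With these made-up numbers the cluster forecast loses TS for clear rain (-0.050) but gains for heavy rain (+0.667), the same qualitative pattern described in the text.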
For precipitation during the 2020–2021 flood season (Figure 16), the TS scores of the sub-primary clusters differed noticeably among strategies and lead times, more so than those of the primary clusters. As with the primary clusters, all strategies for sub-primary clusters showed negative skill improvement relative to the EM for clear-rain forecasts, and this degradation became more evident as the lead time increased. For short-term heavy precipitation, all strategies for sub-primary clusters outperformed the EM in terms of TS, and this improvement became more pronounced with increasing lead time. Among these strategies, the average strategy performed well at shorter lead times, while the improvement from the single strategy became more evident at longer lead times.
For flood-season precipitation, both primary and sub-primary clusters performed worse than the EM for clear-rain forecasts. For short-term heavy precipitation, however, both outperformed the EM, because averaging the members smooths out extreme heavy precipitation in the EM. The improvement was larger for sub-primary clusters than for primary clusters. For short-term heavy precipitation, the different strategies for primary clusters performed similarly, and their improvement tended to weaken with increasing lead time; for sub-primary clusters, the single strategy showed the largest forecast-skill improvement.
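The smoothing argument can be illustrated with a purely hypothetical toy example. When members place their heavy-rain cores at different grid points, averaging spreads each core out, so the EM can stay below the 20 mm threshold everywhere even though every member exceeds it. A cluster forecast built from a subset of similar members tends to retain more of that intensity.

```python
def ensemble_mean(members):
    """Grid-point-wise mean of the member forecasts (the EM)."""
    n = len(members)
    return [sum(values) / n for values in zip(*members)]


HEAVY_RAIN_MM = 20.0

# Hypothetical hourly rainfall (mm) at four grid points from three members whose
# heavy-rain cores are displaced from one another
members = [
    [2.0, 30.0, 3.0, 1.0],
    [1.0, 4.0, 28.0, 2.0],
    [0.0, 2.0, 5.0, 26.0],
]
em = ensemble_mean(members)
print("member peaks:", [max(m) for m in members])  # every member exceeds the threshold
print("EM peak:", max(em))                          # the averaged field does not
```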
In summary, for primary clusters, the different strategies showed relatively small differences in forecast accuracy and skill. In terms of the MAE, primary clusters generally reduced the overall forecast error compared to the EM. Based on TS, however, their clear-rain forecasts were inferior to those of the EM, while for heavy-precipitation forecasts they showed some advantages but were overall similar to the EM.
For sub-primary clusters, there were significant differences among the strategies in forecast accuracy and skill. In terms of the MAE, the Ward clustering method led to a marked reduction in overall forecast error relative to the EM, and this reduction became more pronounced with increasing lead time. In terms of TS, sub-primary clusters performed worse than the EM for clear-rain predictions regardless of the strategy used, whereas for short-term heavy precipitation all strategies performed better than the EM overall. Specifically, the complete strategy was favorable at shorter lead times, while the single strategy was optimal at longer lead times. This suggested that each strategy had its strengths under different forecasting conditions.

6. Discussion and Conclusions

This study evaluated deterministic and probabilistic precipitation forecasts from CMA-TRAMS (EPS) during the 2020–2021 flood season using the mean error (ME), the MAE, TS, Talagrand distributions, ROC curves, and the ROC area (ROCA). The comprehensive evaluation yielded the following findings:
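As a minimal sketch of how the ME and MAE of the four ensemble summary forecasts (ensemble maximum, minimum, mean and median) might be computed, assuming a list of member values for each verification sample and matching observations; the function names and data are hypothetical. ME = mean(F - O) measures bias, and MAE = mean(|F - O|) measures error magnitude.

```python
from statistics import mean, median


def ensemble_statistics(members_at_sample):
    """Reduce the ensemble at one point and time to the four summary forecasts."""
    return {
        "Max": max(members_at_sample),
        "Min": min(members_at_sample),
        "EM": mean(members_at_sample),
        "Median": median(members_at_sample),
    }


def me_and_mae(forecasts, observations):
    """Return (ME, MAE): ME = mean(F - O) is the bias, MAE = mean(|F - O|)."""
    diffs = [f - o for f, o in zip(forecasts, observations)]
    return sum(diffs) / len(diffs), sum(abs(d) for d in diffs) / len(diffs)


def score_statistics(ensemble, observations):
    """ensemble[t] is the list of member values for verification sample t."""
    stats = [ensemble_statistics(m) for m in ensemble]
    return {
        name: me_and_mae([s[name] for s in stats], observations)
        for name in ("Max", "Min", "EM", "Median")
    }
```

Because the maximum is never below any member and the minimum never above, the Max forecast is biased high and the Min forecast biased low whenever the observation falls inside the ensemble range, which is consistent with finding (2).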
(1)
During the 2020–2021 flood season, frontal and subtropical-high-edge precipitation exhibited a more scattered distribution, with subtropical-high-edge precipitation mainly occurring in the evening and frontal precipitation occurring from evening to nighttime. In contrast, monsoonal and return-flow precipitation were more concentrated and longer-lasting, spanning from early morning to evening, and therefore had greater impacts.
(2)
The ensemble maximum (Max) and minimum (Min) forecasts tended to have positive and negative biases, respectively, whereas the EM and Median forecasts had smaller mean errors, closer to zero. However, as the lead time increased, the errors of both the EM and Median forecasts grew, more markedly for the EM than for the Median.
(3)
TS scores decreased as the precipitation threshold increased. The model showed good stability for short-term hourly forecasts within a lead time of 24 h. However, the performance for subtropical-high-edge precipitation was consistently poor. The model performed well in forecasting monsoonal and frontal precipitation, possibly due to their strong synoptic forcing, although the correlation between synoptic forcing and model performance needs further verification.
(4)
The model tended to underestimate short-term precipitation in the South China region; this underestimation decreased as the lead time increased, while the dispersion of the ensemble forecasts increased.
(5)
The ROC curves for precipitation forecasts at all lead times bowed toward the upper-left corner of the ROC diagram, indicating skillful hourly precipitation forecasts and a high level of forecast stability.
(6)
CMA-TRAMS (EPS) performed well in forecasting monsoon and frontal precipitation, and as lead times increased, the probability forecast showed some improvement. However, for return-flow and subtropical-high-edge precipitation with weaker synoptic forcing, the model’s performance was inferior. The ROCA for these types of precipitation did not show significant changes, or even decreased slightly with increasing lead times.
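The underestimation and dispersion behavior in finding (4) is the kind of information read from a Talagrand (rank) histogram: for each sample, the observation's rank among the N members is counted in one of N + 1 bins. A minimal sketch, assuming the common convention that ties between the observation and members (frequent for zero rainfall) are broken at random; the function name and data are hypothetical.

```python
import random


def talagrand_histogram(ensemble, observations, rng=None):
    """Counts of the observation's rank among the members (N members -> N + 1 bins)."""
    rng = rng or random.Random(0)
    n_members = len(ensemble[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(ensemble, observations):
        below = sum(1 for m in members if m < obs)
        ties = sum(1 for m in members if m == obs)
        # An observation tied with k members may occupy any of k + 1 rank positions.
        counts[below + rng.randint(0, ties)] += 1
    return counts
```

A histogram weighted toward the highest bin means observations often exceed every member (underestimation), while a U shape indicates too little spread.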
Based on the evaluation of precipitation forecasts from the CMA-TRAMS (EPS), we proceeded to cluster ensemble forecast information using four different strategies. The analysis yielded the following results:
(1)
For primary clusters, the differences in performance among the clustering strategies were relatively small. For sub-primary clusters, the Ward strategy performed better in terms of precipitation magnitude (MAE), and its MAE improvement was most pronounced for the regions and precipitation types where the EM forecast performed poorly. At each step, the Ward method merges the pair of clusters that produces the smallest increase in the within-cluster sum of squared deviations, a criterion that weights the distance between cluster centroids by the cluster sizes; this error-minimizing criterion may explain its advantage under the MAE evaluation metric.
(2)
In terms of TS, neither primary nor sub-primary clusters performed as well as the EM for clear-rain forecasts. For short-term heavy rainfall, however, both improved on the EM forecast, with the sub-primary clusters showing the larger improvement. Among the clustering strategies, the complete and single strategies yielded the highest forecast skill scores. Both strategies define the distance between two clusters by a single pair of members, the farthest pair for complete linkage and the nearest pair for single linkage, so their computations are closely related, and it is not surprising that their TS results were also similar.
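For reference, the standard definitions of the four linkage criteria are given below, assuming Euclidean distances between member forecast vectors x_a (the distance measure used in the study is not stated in this section). They make explicit that single and complete linkage each depend on one extreme pair of members, whereas the Ward criterion weights the squared centroid distance by the cluster sizes.

```latex
\begin{aligned}
d_{\mathrm{single}}(A,B) &= \min_{a \in A,\; b \in B} \lVert \mathbf{x}_a - \mathbf{x}_b \rVert, \\
d_{\mathrm{complete}}(A,B) &= \max_{a \in A,\; b \in B} \lVert \mathbf{x}_a - \mathbf{x}_b \rVert, \\
d_{\mathrm{average}}(A,B) &= \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} \lVert \mathbf{x}_a - \mathbf{x}_b \rVert, \\
\Delta_{\mathrm{Ward}}(A,B) &= \frac{|A|\,|B|}{|A| + |B|} \, \lVert \bar{\mathbf{x}}_A - \bar{\mathbf{x}}_B \rVert^{2}.
\end{aligned}
```

Here the barred x_A denotes the mean forecast of the members in cluster A, and at each step the pair of clusters with the smallest value of the chosen criterion is merged; the Ward expression equals the increase in the total within-cluster sum of squares caused by the merge.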
Regarding the handling and extraction of ensemble information in ensemble forecasting, previous research has explored various methods, such as optimal percentile fusion and extreme forecast indices. However, these methods have their own limitations. For example, optimal percentile fusion can improve TS scores in precipitation forecasts, but it tends to cause excessive over-forecasting. Extreme forecast indices, on the other hand, mainly characterize how extreme the model forecasts are and do not provide specific quantitative precipitation forecasts. Clustering methods that consider precipitation alone have similar limitations: if the emphasis is on overall forecasting performance, one might choose the clustering strategy with the lowest MAE, whereas if the focus is on the TS scores commonly used in operational forecasting, the strategy with the highest TS would be preferred. The choice of clustering strategy therefore depends on the specific emphasis. Furthermore, this study addressed clustering of precipitation only; combining clustering across multiple meteorological elements in subsequent research is expected to yield further improvements in precipitation-forecasting performance.

Author Contributions

Conceptualization, P.R.; Methodology, J.Z., P.R. and H.L.; Software, H.C.; Formal analysis, J.Z.; Investigation, B.C.; Resources, X.Z.; Data curation, J.Z. and P.R.; Writing—original draft, P.R., B.C., X.Z. and H.C.; Writing—review & editing, H.L.; Funding acquisition, B.C. All authors have read and agreed to the published version of the manuscript.

Funding

This study was jointly supported by the Guangzhou Municipal Science and Technology Planning Project of China (202103000030), China Meteorological Administration Review and Summary Special Project (FPZJ2023-091), the Science and Technology Research Project of Guangdong Meteorological Observatory (202203), the National Key Research and Development Program of China (2021YFC3000902), the National Natural Science Foundation of China (42075087, U20A2097), the Guangzhou Basic and Applied Basic Research Project (202201011093), and the Guangzhou Meteorological Society Science and Technology Research Project (M202105).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data sets for this research are publicly available. Archives of ERA5 data are available at: https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5 (accessed on 18 September 2023). The precipitation data used in this paper include national and regional station data covering Guangdong Province, China, which can be downloaded from the MUSIC interface at http://172.22.1.175/di/index (accessed on 18 September 2023). The CMA-TRAMS (EPS) ensemble model hourly precipitation forecast data can be obtained from the Guangzhou Institute of Tropical and Marine Meteorology, China Meteorological Administration (https://www.itmm.org.cn/ (accessed on 18 September 2023)).

Acknowledgments

We would like to express our sincere gratitude to the reviewers for their valuable comments and suggestions, which significantly contributed to the improvement of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. The dates of various types of precipitation events.
Return-Flow Precipitation | Monsoonal Precipitation | Frontal Precipitation | Subtropical-High-Edge Precipitation
20210502 | 20200520–20200521 | 20200327 | 20200525
20210503 | 20200527 | 20200404 | 20200510
20200405 | 20200529–20200602 | 20200422 | 20200511
20200406 | 20200608–20200609 | 20200512 | 20200514
2020062520200526
20200803–20200805
20200812–20200813
20200826
20200907–20200908
20200915
20200919
20210531–20210602
20210613
20210623–20210626
20210628–20210629
20210716–20210717
20210730–20210731
20210809–20210811
20210814–20210815

References

  1. Liang, P.; Ding, Y.H. The long-term variation of extreme heavy precipitation and its link to urbanization effects in shanghai during 1916–2014. Adv. Atmos. Sci. 2017, 34, 321–334. [Google Scholar] [CrossRef]
  2. Liu, B.; Chen, S.; Tan, X.; Chen, X. Response of precipitation to extensive urbanization over the Pearl River Delta metropolitan region. Environ. Earth Sci. 2021, 80, 9. [Google Scholar] [CrossRef]
  3. Wu, M.; Luo, Y.; Chen, F.; Wong, W.K. Observed link of extreme hourly precipitation changes to urbanization over coastal South China. J. Appl. Meteor. Climatol. 2019, 58, 1799–1819. [Google Scholar] [CrossRef]
  4. Huang, X.; Wang, D.; Ziegler, A.D.; Liu, X.; Zeng, H.; Xu, Z.; Zeng, Z. Influence of urbanization on hourly extreme precipitation over China. Environ. Res. Lett. 2023, 17, 044010. [Google Scholar] [CrossRef]
  5. Epstein, E.S. Stochastic dynamic prediction. Tellus 1969, 21, 739–759. [Google Scholar] [CrossRef]
  6. Leith, C.E. Theoretical skill of Monte Carlo forecasts. Mon. Weather Rev. 1974, 102, 409–418. [Google Scholar] [CrossRef]
  7. Palmer, T.N.; Brankovic, C.; Richardson, D.S. A probability and decision-model analysis of PROVOST seasonal multi-model ensemble integrations. Q. J. R. Meteorol. Soc. 2000, 126, 2013–2033. [Google Scholar] [CrossRef]
  8. Richardson, D.S. Skill and relative economic value of the EC-MWF ensemble prediction system. Q. J. R. Meteorol. Soc. 2000, 126, 649–668. [Google Scholar] [CrossRef]
  9. Chen, Z.; Zhang, C.; Huang, Y.; Feng, Y.; Zhong, S.; Dai, G.; Xu, D.; Yang, Z. Track of super Typhoon Haiyan predicted by a typhoon model for the South China Sea. J. Meteor. Res. 2014, 28, 510–523. [Google Scholar] [CrossRef]
  10. Zhang, X.B. A GRAPES-based mesoscale ensemble prediction system for tropical cyclone forecasting: Configuration and performance. Quart. J. Roy. Meteor. Soc. 2018, 44, 478–498. [Google Scholar] [CrossRef]
  11. Li, J.H.; Gao, Y.D.; Wan, Q.L.; Zhang, X.-b. Sample optimization of ensemble forecast to simulate tropical storms (merbok, mawar, and guchol) using the observed track. J. Trop. Meteorol. 2020, 26, 14–26. [Google Scholar] [CrossRef]
  12. Ward, J.H. Hierarchical grouping to optimize an objective function. J. Amer. Stat. Assoc. 1963, 58, 236–244. [Google Scholar] [CrossRef]
  13. Atger, F. Turbing: An alternative to clustering for ensemble prediction classification. Weather. Forecast. 1999, 114, 741–757. [Google Scholar] [CrossRef]
  14. Yang, X.S.; Wang, J.; Chen, Y. The Application of Tidal Signal Exclusion Scheme from Initialization in a General Circulation Model. J. Trop. Meteorol. 2004, 10, 210–215. [Google Scholar]
  15. Vautard, R. Multiple Weather Regimes over the North Atlantic: Analysis of Precursors and Successors. Mon. Weather Rev. 1990, 118, 2056–2081. [Google Scholar] [CrossRef]
  16. Johnson, S.C. Hierarchical clustering schemes. Psychometrika 1967, 32, 241–254. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Terrain height distribution of Guangdong.
Figure 2. A simple schematic diagram of the hierarchical clustering method.
Figure 3. Average daily rainfall distribution of return-flow precipitation (a), monsoonal precipitation (b), frontal precipitation (c), and subtropical-high-edge precipitation (d) during the 2020–2021 flood season.
Figure 4. Average daily rainfall variations of subtropical-high-edge precipitation, frontal precipitation, monsoonal precipitation, and return-flow precipitation during the 2020–2021 flood season.
Figure 5. Variation of MAE (a) and ME (b) of precipitation forecasts during the 2020–2021 flood season with lead time, using various ensemble methods.
Figure 6. Variation of MAE of precipitation forecasts for return-flow precipitation (a), monsoonal precipitation (b), frontal precipitation (c), and subtropical-high-edge precipitation (d) during the 2020–2021 flood season with lead time, using various ensemble methods.
Figure 7. Threat score (TS) of EM forecasts for flood season precipitation (a), frontal precipitation (b), monsoonal precipitation (c), and subtropical-high-edge precipitation (d) during the 2020–2021 flood season.
Figure 8. Talagrand diagrams for model forecasts at lead times of 1 h (a), 7 h (b), 13 h (c), and 19 h (d).
Figure 9. ROC distribution of model forecasts for South China precipitation during the 2020–2021 flood season at various lead times.
Figure 10. Distribution of ROCA for different types of precipitation at various lead times.
Figure 11. Difference between MAE of primary clusters (a) and sub-primary clusters (b) for precipitation forecasts during the 2020–2021 flood season and MAE of EM forecast.
Figure 12. Difference between MAE of primary clusters and sub-primary clusters for return-flow precipitation (a,e), monsoonal precipitation (b,f), frontal precipitation (c,g), and subtropical-high-edge precipitation (d,h) in precipitation forecasts and MAE of EM forecast.
Figure 13. Horizontal distribution of MAE for EM forecast of each precipitation type.
Figure 14. Horizontal distribution of the difference between MAE of each precipitation type forecast using different sub-primary clustering strategies and MAE of EM forecast.
Figure 15. Difference between the TS scores of primary clusters and the EM forecast for clear-rain (a) and heavy-precipitation (b) forecasts during the 2020–2021 flood season.
Figure 16. Difference between the TS scores of sub-primary clusters and the EM forecast for clear-rain (a) and heavy-precipitation (b) forecasts during the 2020–2021 flood season.
Table 1. Contingency table for binary classification.
Category | Forecast positive | Forecast negative
Observation positive | A | C
Observation negative | B | D
Table 2. ROC contingency table for probability forecast.
Bin | Member distribution | Observed occurrences | Observed non-occurrences
1 | F = 0, NF = N | O1 | NO1
2 | F = 1, NF = N − 1 | O2 | NO2
3 | F = 2, NF = N − 2 | O3 | NO3
N + 1 | F = N, NF = 0 | O(N+1) | NO(N+1)
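Table 2 bins the verification samples by the number of members F that forecast the event. The sketch below shows how ROC points and the ROC area (ROCA) can be derived from such bins: lowering the warning threshold from all N members down to zero accumulates observed occurrences (hits) and non-occurrences (false alarms) bin by bin. The area is integrated here with the trapezoidal rule, which is an assumption; function names and counts are hypothetical.

```python
def roc_points(occurrences, non_occurrences):
    """ROC points from member-count bins.

    occurrences[j] / non_occurrences[j] count the cases in which exactly j of the
    N members forecast the event and it was / was not observed (j = 0..N).
    """
    total_o, total_no = sum(occurrences), sum(non_occurrences)
    points = [(0.0, 0.0)]  # threshold above N members: the event is never forecast
    hits = false_alarms = 0
    # Lower the decision threshold from "all N members" down to "0 members".
    for j in range(len(occurrences) - 1, -1, -1):
        hits += occurrences[j]
        false_alarms += non_occurrences[j]
        points.append((false_alarms / total_no, hits / total_o))  # (POFD, POD)
    return points


def roc_area(points):
    """Trapezoidal area under the ROC curve: 0.5 = no skill, 1.0 = perfect."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For example, with two members and hypothetical counts `roc_points([1, 3, 6], [8, 3, 1])`, the curve passes through (1/12, 0.6) and (1/3, 0.9), well above the diagonal, giving an area of about 0.846; a curve bowed toward the upper-left corner in this way is what indicates skill.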
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zheng, J.; Ren, P.; Chen, B.; Zhang, X.; Cai, H.; Li, H. Research on a Clustering Forecasting Method for Short-Term Precipitation in Guangdong Based on the CMA-TRAMS Ensemble Model. Atmosphere 2023, 14, 1488. https://doi.org/10.3390/atmos14101488

AMA Style

Zheng J, Ren P, Chen B, Zhang X, Cai H, Li H. Research on a Clustering Forecasting Method for Short-Term Precipitation in Guangdong Based on the CMA-TRAMS Ensemble Model. Atmosphere. 2023; 14(10):1488. https://doi.org/10.3390/atmos14101488

Chicago/Turabian Style

Zheng, Jiawen, Pengfei Ren, Binghong Chen, Xubin Zhang, Hongke Cai, and Haowen Li. 2023. "Research on a Clustering Forecasting Method for Short-Term Precipitation in Guangdong Based on the CMA-TRAMS Ensemble Model" Atmosphere 14, no. 10: 1488. https://doi.org/10.3390/atmos14101488

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
