1. Introduction
Atmospheric circulation patterns and their variations have significant impacts on regional weather and climate. Therefore, accurately classifying and characterizing these patterns is crucial in weather analysis and climate research [1, 2, 3, 4]. Circulation Type Classification (CTC) divides circulation patterns into several Circulation Types (CTs) with significant differences, helping researchers identify key features of circulation patterns associated with weather events and climate change. CTC methods are typically grouped into three categories: subjective classification, objective classification, and hybrid classification [5]. Subjective classification relies on meteorologists’ knowledge and experience; it tends to be highly case-specific, and it is inefficient and unsystematic when handling large amounts of data. Hybrid classification attempts to automate the classification process based on predefined standards, offering some general applicability but often leading to redundant CTs [6, 7]. In contrast, objective classification methods use algorithms to automatically analyze and classify atmospheric circulation patterns, enabling the systematic and adaptive capture of the main features and variations in circulation patterns and reducing human subjectivity. These methods typically offer better general applicability [8, 9, 10]. With the growth of meteorological data and advancements in computer technology, various objective methods have emerged. Currently, mainstream objective classification methods include T-mode rotated principal component analysis (PCT), hierarchical clustering, K-means clustering, and self-organizing maps (SOM) [11, 12, 13, 14].
Different objective CTC methods attempt to extract the main features of circulation patterns through various metrics and strategies, aiming to obtain accurate and reasonable types that reflect circulation differences. There is a wide variety of objective CTC methods, each with its own advantages and disadvantages, making the selection of the most suitable method a challenge. Huth [15] conducted an early and relatively comprehensive comparative study of objective CTC methods, comparing the effectiveness of five classification methods (correlation method, sums-of-squares method, average linkage, K-means, and PCT) in classifying 700 hPa geopotential heights in the European and adjacent Atlantic regions. The results showed that the CTC based on K-means clustering had stronger separability compared to other classification methods, but as the data dimensionality increased, distance metrics gradually became ineffective, making it difficult to find valid clusters, a phenomenon known as the “curse of dimensionality” [16], which led to insufficient stability and reliability of the classification results [14, 17]. Additionally, PCT has been shown to have advantages in reproducing predefined CTs, but it is more sensitive to outliers and can be influenced by extreme cases, causing shifts in the direction of the principal components [18, 19]. These studies indicate that no single classification method performs best in all aspects (the no free lunch theorem [20]). Furthermore, the European COST733 project (European Cooperation in Science and Technology) provided a more systematic evaluation of different CTC methods and, by establishing a consistent classification catalog, offered important references for weather and climate research in Europe [5, 21].
China is located in the eastern part of Asia, with vast territory and complex geographical conditions, resulting in significant spatiotemporal variability in its circulation patterns. In the study of regional circulation patterns in China, some research based on objective classification methods has been conducted. For example, Liu et al. [22] used SOM to classify the 500 hPa geopotential height over the Tibetan Plateau and correlated it with regional precipitation, revealing the impact of circulation changes on regional precipitation patterns from 1961 to 2010. Sun et al. [23] applied hierarchical clustering to classify sea-level pressure, 850 hPa relative humidity, and wind fields during the formation and dissipation of pollution weather in the Sichuan Basin, revealing the influence of different meteorological patterns (such as high-pressure systems and weak high-pressure systems) on pollution events. Yi et al. [6] used the K-means method to classify dust storm weather in northern China and found that certain cyclone patterns were closely related to the occurrence of dust storms. Liu et al. [24] used SOM and K-means to classify sea-level pressure during ozone pollution events in the Guangzhou region from 2015 to 2022, finding strong consistency between the two methods in classification results. However, there is relatively little research on the applicability of CTC methods in China, and a systematic evaluation and comparison of the advantages and disadvantages of different methods has not been conducted. Choosing the appropriate CTC method to achieve more accurate classification has become a key issue in current research.
This study employs several mainstream objective CTC methods (PCT, Ward linkage, K-means, and SOM) to classify sea-level pressure fields in the China region, evaluating the applicability of each method from multiple perspectives. Additionally, an integrated classification approach combining PCT and K-means is proposed. The aim is to provide valuable guidance for selecting suitable CTC methods in future weather and climate research, both in China and globally.
3. Results
3.1. Determination of the Number of Types K
Before performing classification using PCTV, PCTO, Ward linkage, K-means, and SOM on the daily average sea-level pressure data of China from 1 January 1993 to 31 December 2023, it is necessary to manually set the number of CTs, K, i.e., the number of clusters. The elbow method based on the sum of squared errors is a common auxiliary approach for determining the number of clusters, but its effectiveness depends on the data distribution. When the data points are evenly distributed and the differences between clusters are clear, the elbow is easily identifiable. In contrast, when the data distribution is complex or contains outliers, the elbow may be unclear or even absent. In our study, due to the large spatial range, long time span (i.e., high feature dimensionality and many data points), and complex data distribution with some outliers, the elbow method fails, which makes it difficult to determine the optimal K value. However, since the focus of our study is to compare the effectiveness of different classification methods, the optimal K value is not necessary for our research. It is sufficient to determine a consistent K value, and we can then compare how different methods recognize the data structure and handle noise and outliers. Therefore, we chose K = 12, which strikes a balance between ensuring rich clustering and minimizing computational resource requirements, while also explaining over 70% of the variance in PCT. Additionally, the sea-level pressure data underwent anomaly pattern adjustment during preprocessing. This adjustment does not affect the classification results of Ward linkage, K-means, or SOM, which are based on Euclidean distance. However, for PCT, which uses a linear relationship metric, there is a significant difference between the results when using the anomaly pattern as opposed to the original pattern [11]. The primary issue with using the original pattern in PCT is that it is influenced by the climatic mean state, causing most of the loadings to concentrate on the same principal component, leading to very few instances within certain CTs. This makes PCT behave more like a detection method than a good classification method. In contrast, when using the anomaly pattern, the influence of the climatic mean state is removed, resulting in a more even distribution of loadings and more balanced results. This ensures consistency in the number of types across all methods, which is the main reason for using the anomaly pattern in PCT in this study.
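The different effect of the anomaly adjustment on the two metric families can be checked on toy data. The following is a minimal sketch, assuming that the anomaly adjustment subtracts one long-term mean field from every daily field (the actual preprocessing may differ); the pressure values and the helper names `euclidean` and `pearson` are illustrative only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two flattened fields."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    """Pearson correlation between two flattened fields."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Toy "daily fields": 3 days x 4 grid points (hPa); the values are made up.
days = [
    [1012.0, 1015.0, 1020.0, 1025.0],
    [1010.0, 1016.0, 1019.0, 1027.0],
    [1014.0, 1013.0, 1022.0, 1023.0],
]

# Assumed anomaly adjustment: subtract the time-mean value at each grid point.
mean_field = [sum(col) / len(days) for col in zip(*days)]
anoms = [[v - m for v, m in zip(day, mean_field)] for day in days]

d_raw, d_anom = euclidean(days[0], days[1]), euclidean(anoms[0], anoms[1])
r_raw, r_anom = pearson(days[0], days[1]), pearson(anoms[0], anoms[1])

print(f"distance:    raw={d_raw:.4f}  anomaly={d_anom:.4f}")
print(f"correlation: raw={r_raw:.4f}  anomaly={r_anom:.4f}")
```

Under this assumption, the distance between two days is identical before and after the adjustment because the same field is subtracted from both, whereas the correlation changes once the shared mean state no longer dominates both fields.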
3.2. Internal Metrics
Internal evaluation metrics do not rely on external labels or prior knowledge. They judge whether the clustering algorithm can reasonably partition the data into clusters with high similarity and distinguishable differences between clusters, based solely on the clustering results and the data itself. The effectiveness of internal evaluation metrics is crucial for CTC as a method within the clustering analysis framework. The two internal evaluation metrics (EDR and PCR) for the classification results of different methods are shown in Figure 2. It can be observed that the PCT-based classification method consistently achieves higher PCR than the other methods, while the opposite is true for EDR. This indicates that PCT is stronger in capturing the “shape” characteristics but weaker in capturing the “value” characteristics, whereas Ward linkage, K-means, and SOM perform the opposite. This is not surprising, as we know from the method explanation that PCT is based on linear relationship metrics, while Ward linkage, K-means, and SOM are based on Euclidean distance metrics. This suggests that, in specific studies, the appropriate classification method should be chosen based on the focus of the research (whether on the correlation or intensity of circulation).
Between the two rotation methods in PCT, the orthogonal rotation method Varimax outperforms the oblique rotation method Oblimin in both PCR and EDR, which seems to contradict existing studies (which suggest that oblique rotations capture more realistic spatial modes). However, we still attempt to provide a reasonable explanation for this result: oblique rotations do not constrain the orthogonality of the principal components, meaning that after rotation, the principal component scores (PC scores) are correlated with each other. Additionally, oblique rotations tend to bring as many sample points as possible closer to the principal components, which results in more balanced variance of the scores, as shown in Figure 3. This is why oblique rotations can capture more realistic spatial modes (which are correlated and more balanced in reality). However, capturing more realistic modes does not necessarily lead to better clustering performance. We know that loadings represent the correlation between the spatial modes of the original sample points and the scores of the principal components. The more similar the scores of the principal components are, the more likely it is that the original sample points become indistinguishable (at least in the strategy of using the maximum absolute loading for classification). Classification requires not only more realistic but also more separable principal component projections. The figure demonstrates the correlation between the principal component scores for each rotation method, which to some extent supports this argument.
Ward linkage, K-means, and SOM are all methods based on Euclidean distance metrics for CTC. Among them, Ward linkage shows poorer performance in both EDR and PCR, but because it does not involve iterative optimization, it has a result stability that the other two methods do not have. In other words, given a fixed dataset, the classification results are also fixed. The EDR and PCR values for K-means and SOM in Figure 2 represent the average of 100 runs. K-means selects random samples as the initial centroids and sets a convergence threshold of 0.001 or a maximum of 300 iterations. SOM selects random samples as the initial weights for the neurons, with the initial learning rate randomly set within the range of [0.01, 0.99], and the initial neighborhood radius randomly set within the range of [1, 3], training for 10 epochs. It can be seen that both methods have similar average EDR and PCR values, but SOM has greater parameter tuning difficulty and a more complex training process. Meanwhile, K-means requires the least computational resources among the three methods.
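For reference, the K-means configuration described above (random samples as initial centroids, a convergence threshold of 0.001, at most 300 iterations) can be sketched in plain Python. This is an illustrative implementation of Lloyd's algorithm, not the code used in the study; applying the threshold to the largest centroid shift is an assumption, and the function name `kmeans` and the toy points are hypothetical.

```python
import math
import random

def kmeans(points, k, tol=1e-3, max_iter=300, seed=None):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Random samples serve as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid in Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        # Assumed stopping rule: largest centroid displacement below tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return labels, centroids

# Two well-separated toy groups standing in for flattened pressure fields.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
          [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centroids = kmeans(points, k=2, seed=0)
print(labels)
```

Because the initial centroids are drawn at random, repeated runs with different seeds can yield different partitions on real data, which is why the reported metrics are averaged over 100 runs.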
3.3. Continuity
Large-scale weather patterns tend to exhibit both continuous and stable evolution processes, as well as rapid transitional changes. This pattern aligns with the geostrophic adjustment theory and practical experience. Therefore, good classification results should demonstrate more continuity and fewer isolated patterns.
Table 1 shows the proportion of different duration patterns in the classification process, including isolated events lasting only 1 day, short events lasting 2–3 days, medium events lasting 4–7 days, long events lasting 8–15 days, and super-long events lasting 16 days or more. From the table, it can be observed that short events account for the highest proportion across all methods, with the PCTO method exhibiting a particularly large proportion of short events in its classification results. At the same time, PCTO shows the most isolated events and the fewest medium, long, and super-long events, indicating weaker classification continuity. In contrast, the K-means method exhibits the fewest isolated events and performs relatively well in terms of medium and long events. However, since the true classification result cannot be determined, the possibility of false super-long events cannot be ruled out. Nonetheless, from the perspective of minimizing isolated events, K-means demonstrates the most reasonable continuity in its classification structure.
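The duration statistics can be derived from a daily sequence of CT labels by run-length encoding. The sketch below assumes that each run of identical consecutive labels counts as one event and that the proportions are taken over the number of events rather than the number of days (the text does not specify which); the label sequence is invented.

```python
from itertools import groupby

# Duration classes from the text; None means no upper limit.
BINS = [("isolated", 1, 1), ("short", 2, 3), ("medium", 4, 7),
        ("long", 8, 15), ("super-long", 16, None)]

def duration_proportions(labels):
    """Share of events (runs of identical consecutive CT labels) per duration class."""
    runs = [len(list(group)) for _, group in groupby(labels)]
    counts = {name: 0 for name, _, _ in BINS}
    for length in runs:
        for name, low, high in BINS:
            if length >= low and (high is None or length <= high):
                counts[name] += 1
                break
    return {name: counts[name] / len(runs) for name in counts}

# Invented daily CT sequence: run lengths 3, 1, 5, 2, and 16.
daily_cts = [3, 3, 3, 7, 1, 1, 1, 1, 1, 3, 3] + [5] * 16
props = duration_proportions(daily_cts)
print(props)
```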
3.4. Seasonal Variability
The dominant circulation patterns typically vary across different seasons, and seasonal variability becomes an important indicator for evaluating the effectiveness of these methods. By analyzing the variability of CTs across different seasons, we can identify which methods reflect seasonal circulation pattern changes and which methods are unaffected by seasonal factors, leading to suboptimal classification results.
Figure 4 shows the seasonal variability of the classification results for the five methods. The variability can be roughly categorized into three levels. The methods with the highest variability are Ward linkage, K-means, and SOM, where the CTs in summer and winter do not overlap, and certain CTs appear only in specific seasons, showing significant seasonal variation. Next is PCTV, where there is some overlap of CTs between winter and summer, but some seasonal variation is still evident. The method with the lowest variability is PCTO, where the differences between CTs across seasons are minimal and most CTs appear in all seasons, with the results generally being more uniform. It can be observed that classification methods based on Euclidean distance measurements are better at capturing seasonal variability than those based on linear relationship measurements.
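A simple way to quantify this kind of seasonal variability is to tabulate the relative frequency of each CT per season and to check which CTs occur in both summer and winter. The sketch below assumes the conventional meteorological seasons (DJF, MAM, JJA, SON), which the text does not define explicitly; the function names and the toy labels are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

SEASON_OF_MONTH = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
                   6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}

def seasonal_frequencies(dates, labels):
    """Return {season: {ct: relative frequency of ct within that season}}."""
    counts = {}
    for d, ct in zip(dates, labels):
        counts.setdefault(SEASON_OF_MONTH[d.month], Counter())[ct] += 1
    return {s: {ct: n / sum(c.values()) for ct, n in c.items()}
            for s, c in counts.items()}

def shared_types(freqs, season_a="JJA", season_b="DJF"):
    """CTs that occur in both seasons; an empty set means no overlap."""
    return set(freqs.get(season_a, {})) & set(freqs.get(season_b, {}))

# Toy year in which CT 1 occurs only in winter and CT 2 only in summer.
dates = [date(1993, 1, 1) + timedelta(days=i) for i in range(365)]
labels = [1 if SEASON_OF_MONTH[d.month] == "DJF"
          else 2 if SEASON_OF_MONTH[d.month] == "JJA"
          else 3 for d in dates]

freqs = seasonal_frequencies(dates, labels)
print(shared_types(freqs))
```

An empty overlap between summer and winter corresponds to the behaviour described for Ward linkage, K-means, and SOM, while a large overlap corresponds to the behaviour described for PCTO.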
3.5. Separability of Meteorological Elements
The effectiveness of CTC methods is not only reflected in their ability to accurately capture the differences between circulation patterns themselves but also in whether the classification results (CTs) implicitly capture the differences in certain related meteorological variables. To evaluate the quality of CTC methods, it is necessary to assess whether there are significant differences in the relevant meteorological variables between the obtained CTs, especially the separability of key meteorological variables such as temperature and precipitation. If a CTC method produces CTs that clearly show changes in certain meteorological variables, the classification performance of the method is considered superior. On the other hand, if a method fails to effectively distinguish these meteorological elements, resulting in overlap or confusion between different CTs, its classification accuracy and practical value will be significantly reduced.
3.5.1. Temperature
Temperature, as a sensitive variable responding to circulation patterns, often exhibits significant differences under different circulation patterns. For example, when a region is under the control of the westerlies, the temperature may be relatively mild, while in high-pressure ridge areas or surface high-pressure systems, the temperature is typically lower. By examining whether there are significant temperature differences in the CTs, we can assess the usability of the CTs.
Figure 5 displays the distribution characteristics of the 2 m temperature mean in the central region of circulation fields (36° N–39° N, 103° E–107° E) belonging to different CTs from different CTC methods. Ideal CTs should have the following characteristics: the temperature distribution under the same CT should be highly concentrated, while the distributions of different CTs should be clearly distinguished, with minimal overlap. Therefore, the box plot should show relatively short boxes (indicating stronger concentration), and the overlap between the boxes of different CTs should be as small as possible, reflecting the superiority of CTs in capturing temperature differences. It can be observed that methods such as Ward linkage, K-means, and SOM generally have shorter boxes and more dispersed box distributions than methods based on PCT.
3.5.2. Precipitation
In addition to examining the distribution characteristics of temperature, the separability of precipitation is also an important evaluation metric. As a key manifestation of weather systems, precipitation’s spatiotemporal distribution is significantly influenced by circulation patterns. Similarly to temperature fields, in an ideal CTC method, different CTs should effectively separate distinct precipitation characteristics. Specifically, precipitation distributions under the same CT should exhibit high similarity, while precipitation patterns under different CTs should show significant differences.
Figure 6 shows the distribution characteristics of the precipitation mean in the central region of circulation fields (36° N–39° N, 103° E–107° E) belonging to different CTs from different CTC methods. Ideal CTs should have the same characteristics as those in the temperature field. However, in practice, there is generally overlap between the boxes of all methods in the low-value region. This is because, unlike temperature, precipitation is a variable with a practical lower bound, and this bound is easily and commonly reached. Therefore, it is difficult to judge the effectiveness of classification methods simply based on box overlap. However, we can still assess whether the precipitation distribution under the same CT is concentrated by examining the length of the boxes. It can be observed that methods such as Ward linkage, K-means, and SOM generally have more extremely short boxes than methods based on PCT, indicating that these methods have better precipitation separability (at least to some extent, reflecting whether precipitation occurs).
3.6. Sensitivity of Temporal and Spatial Resolution
In order to assess the stability of different CTC methods, this study conducts sensitivity experiments on the spatial and temporal resolutions of the data. The aim is to investigate whether these changes significantly affect the stability and consistency of the classification results. If the results of a classification method show minimal variation under different resolutions, it indicates that the method has strong stability with respect to spatiotemporal resolution. Conversely, significant changes in results may suggest that the method is sensitive to variations in spatiotemporal resolution and has poor stability.
3.6.1. Spatial Resolution
The accuracy and reliability of classification results are often significantly influenced by the spatial resolution of the data. Spatial resolution largely determines whether the classification method can accurately capture the details of circulation features. Higher spatial resolution typically reveals more refined circulation structures, whereas lower resolution may result in the loss of local variations, thereby affecting the accuracy and precision of the classification results. Therefore, this study reduces the spatial resolution of the data from 0.25° × 0.25° to 1° × 1°, evaluating the performance of different CTC methods at lower spatial resolutions to assess their stability and reliability under various resolution conditions.
Figure 7 shows the normalized confusion matrix, where the classification results at the original resolution (Control group, Ctrl) are considered the true values, and the results at the lower resolution (Experimental group, Exp) are considered the predicted values. The percentage in the upper-right corner represents the macro-average recall, which is the mean of the main diagonal elements. The results indicate that the macro-average recall is higher for the PCT methods, especially for PCTV. Among the three methods based on Euclidean distance metrics, Ward linkage relies excessively on the Euclidean distance between the data and lacks an adjustment or optimization process, leading to a lower macro-average recall. In contrast, the iterative strategy of K-means and SOM introduces some fluctuation in the classification outcomes, but the fluctuation is relatively small, and both methods still exhibit better spatial stability.
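The macro-average recall can be reproduced from two label sequences as follows. This is a minimal sketch that assumes the type indices of the Ctrl and Exp classifications have already been matched to each other (how the types are matched is not described here); the function names and label sequences are illustrative.

```python
def normalized_confusion(ctrl, exp, k):
    """Row-normalized confusion matrix: rows are Ctrl types, columns are Exp types."""
    matrix = [[0] * k for _ in range(k)]
    for true_type, predicted_type in zip(ctrl, exp):
        matrix[true_type][predicted_type] += 1
    return [[n / sum(row) if sum(row) else 0.0 for n in row] for row in matrix]

def macro_recall(norm_matrix):
    """Mean of the main diagonal of a row-normalized confusion matrix."""
    k = len(norm_matrix)
    return sum(norm_matrix[i][i] for i in range(k)) / k

# Invented labels for 10 days and 3 types.
ctrl = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
exp = [0, 0, 0, 1, 1, 1, 2, 2, 0, 1]

cm = normalized_confusion(ctrl, exp, k=3)
score = macro_recall(cm)
print(score)  # (0.75 + 1.0 + 0.5) / 3 = 0.75
```

Because each row is normalized by the number of Ctrl days in that type, every type contributes equally to the score regardless of how frequent it is.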
3.6.2. Temporal Resolution
Since CTC essentially involves clustering at time points, time resolution has a more direct impact on the classification results. Within a fixed period, the size of the time resolution determines the number of circulation patterns. Fewer circulation patterns may lead to incomplete identification or increased errors in recognizing circulation patterns due to insufficient temporal information. On the other hand, more circulation patterns help improve classification accuracy by capturing richer details of circulation variations. Based on this, this study reduces the time resolution from 1 day to 4 days to investigate the stability of circulation classification under different numbers of circulation patterns.
Figure 8 is similar to Figure 7 but focuses on time resolution sensitivity. It can be observed that, compared to spatial resolution, these 9 classification methods are more sensitive to time resolution. Even PCTV, which performs excellently in terms of spatial resolution, experiences a significant reduction here but still achieves the highest recall rate. Among the three methods based on Euclidean distance metrics, Ward linkage still exhibits the worst time stability. Hierarchical clustering, which gradually merges the data, is highly dependent on the data; this makes its results fully reproducible on a fixed dataset but seemingly also increases its sensitivity to changes in the data.
3.7. Method Optimization
As mentioned in Section 2, PCT classifies based solely on the maximum absolute loading at each time point, ignoring the loadings of other principal components. While this method may filter out some noise to a certain extent, it also loses a significant amount of information. To address this, the study proposes using K-means to perform more refined classification of principal component loadings, treating PCT as a feature extractor.
Table 2 shows the changes in the internal evaluation metrics for the improved PCTV and PCTO methods, where both EDR and PCR show considerable increases.
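The information loss of the maximum-absolute-loading rule can be illustrated with a few loading vectors. The sketch below is not the study's implementation; in particular, splitting each principal component into a positive and a negative type by the sign of the loading is an assumption, and the loading values are invented.

```python
def assign_by_max_abs_loading(loadings):
    """loadings: one row per day, one column per retained principal component.
    Returns a (pc_index, sign) pair per day. Using the sign to split each
    component into two types is an assumption made for illustration."""
    types = []
    for row in loadings:
        j = max(range(len(row)), key=lambda i: abs(row[i]))
        types.append((j, 1 if row[j] >= 0 else -1))
    return types

# Invented loadings for 4 days on 3 principal components.
loadings = [
    [0.80, 0.10, -0.05],
    [0.55, 0.50, -0.10],   # nearly tied between component 0 and component 1
    [-0.70, 0.20, 0.15],
    [0.10, -0.20, 0.90],
]
types = assign_by_max_abs_loading(loadings)
print(types)
```

The first two days receive the same type even though the second day loads almost as strongly on component 1. Clustering the full loading vectors, for example with K-means, retains this secondary information, which is the motivation for treating PCT as a feature extractor.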
In iterative optimization algorithms such as K-means and SOM, the initialization of centroids and neurons is crucial. This study proposes the use of principal component scores extracted by PCT as the initial centroids and neuron weights for these algorithms. We use normalized principal component scores and their negative modes superimposed with the climate mean state as the initialization centroids and neurons for K-means and SOM. For K-means, initialization using PCTV and PCTO outperformed the average results (after 100 runs) in terms of both EDR and PCR, with the final results using PCTV slightly higher than PCTO. For SOM, the results from initializing with PCTV and PCTO showed no difference, indicating that SOM is less sensitive to initialization than K-means. However, this does not mean that SOM is more stable; rather, SOM shifts its instability to being more sensitive to other hyperparameters (such as learning rate and neighborhood radius) and the training process (such as sample order and decay functions). Although these initialization methods did not lead to significant performance improvements, they helped stabilize the algorithm’s results and ensure reproducibility in situations where other hyperparameters remain unchanged.
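One possible reading of this initialization is sketched below: each principal component pattern and its negative are added to the climate mean field, giving two initial centroids per component. The unit-length normalization and the `amplitude` scaling factor are assumptions introduced for illustration, since the exact normalization is not specified in this section.

```python
import math

def pct_initial_centroids(pc_patterns, mean_field, amplitude=1.0):
    """Build 2 * len(pc_patterns) centroids: mean field plus and minus each pattern.
    `amplitude` is a hypothetical scaling factor, not a parameter from the study."""
    centroids = []
    for pattern in pc_patterns:
        norm = math.sqrt(sum(v * v for v in pattern))
        unit = [amplitude * v / norm for v in pattern]  # assumed unit-length normalization
        centroids.append([m + u for m, u in zip(mean_field, unit)])  # positive mode
        centroids.append([m - u for m, u in zip(mean_field, unit)])  # negative mode
    return centroids

# Invented example: 2 patterns on a 2-point grid and a mean field in hPa.
pc_patterns = [[3.0, 4.0], [0.0, 2.0]]
mean_field = [1000.0, 1010.0]
init = pct_initial_centroids(pc_patterns, mean_field)
print(init)
```

Because these centroids are fully determined by the PCT output, an algorithm started from them gives the same result on every run as long as the remaining settings are unchanged.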
4. Discussion
This paper applies several commonly used objective CTC methods to classify sea-level pressure fields in the China region, evaluating the performance of different methods in terms of internal metrics, persistence, seasonal variability, separability of related meteorological elements, and spatiotemporal stability. Although our study is based on the China region, the study area primarily serves as the weather and climate context for evaluation, rather than the main focus of the research itself. Therefore, our conclusions should also have a certain degree of general applicability.
PCT, Ward linkage, K-means, and SOM can be categorized into two groups based on their metric criteria: linear relationship metrics and Euclidean distance metrics. Methods based on linear relationships (PCT) tend to group more correlated circulation patterns together, which better captures the “shape similarity” of the circulation patterns. This is more important when conducting studies on larger-scale circulation features. In contrast, methods based on Euclidean distance (Ward linkage, K-means, and SOM) focus more on the overall intensity of the circulation patterns and thus better capture their “value proximity”, which is more advantageous for studies of medium- and small-scale circulation features. Further research results indicate that classifications based on Euclidean distance better align with atmospheric patterns, such as having longer persistence, larger seasonal variability, and better differentiation of certain meteorological elements. This suggests that when considering factors like seasonal variability, methods such as K-means would be a good choice. Conversely, PCT would be a better choice for studies that do not require such considerations. In the PCT method, although oblique rotation can capture more realistic circulation patterns, it does not seem to offer an advantage in subsequent classifications. In fact, the correlation of principal components may lead to suboptimal classification results, and the spatiotemporal stability of the classification decreases.
Among Ward linkage, K-means, and SOM, all of which are based on the Euclidean distance metric, Ward linkage provides stable and reproducible results with fixed datasets, while the latter two methods, due to their iterative optimization process, produce slightly different results each time. Nevertheless, Ward linkage is more sensitive to spatial and temporal resolution, and varying data resolutions can significantly affect its results. In contrast, the latter two methods maintain relatively high spatiotemporal stability as long as the convergence conditions are consistent. Among these, K-means, as a simple and effective classification method, has advantages in both classification results and computational efficiency. While SOM produces classification results similar to K-means, it involves a more complex parameter tuning process.
Additionally, the study considers the combined application of multiple methods to leverage their respective advantages and address the limitations of individual methods. In this research, PCT and K-means are combined. One strategy is to use K-means to help PCT cluster the loadings, thereby capturing detailed information. From the internal metrics of the classification results, this strategy improves the intra-cluster tightness and inter-cluster separation. Another strategy is to use the circulation patterns captured by PCT to replace the initial centroids in K-means, which makes the classification results reproducible while maintaining the original performance. A similar approach is applied to SOM, where the circulation patterns captured by PCT replace the initial neurons in SOM, making the results more stable. However, consistency cannot be fully achieved as it is in K-means, because SOM is also more influenced by the learning rate, neighborhood radius, and the order of training samples.
It is hoped that the comparison of these methods will provide theoretical and technical support for future climate research and extreme weather forecasting in China and similar regions globally. Future research can further explore the combined application of different classification methods to better serve climate prediction and the development of climate change adaptation strategies for the China region.