5.2.1. Cluster Grading
Some hyperparameters need to be preset before clustering.
Fuzzy index m: James found that m is related to the convergence of the algorithm, and its value is linked to the quantity of sample data (
n). m should be greater than
, and its empirical range is
. Nikhil et al. believed that the optimal value for
m falls within [1.5, 2.5], and it is recommended to take a value of 2 [
23], based on experimental verification of the clustering algorithm’s effectiveness. The geometric characteristics of the clusters vary with
m. Clustering exhibits Boolean characteristics when
m approaches 1, with most fuzzy membership degrees being either close to 0 or to 1. The fuzzy membership degree of the cluster resembles a Gaussian function when
m = 2. The function of fuzzy membership tends to peak as
m increases [
24]. Considering these insights,
m = 2 is adopted in this work.
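The influence of m on the memberships can be illustrated with a minimal sketch. It assumes the standard FCM membership formula (the paper's own Equations (14) and (15) are not reproduced in this section), and the helper `memberships` and the two distances are hypothetical values chosen only for illustration.

```python
import numpy as np

def memberships(distances, m):
    """Standard FCM membership of one sample, given its (positive) distances to each center."""
    d = np.asarray(distances, dtype=float)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); ratio[i, k] holds (d_i / d_k)^(2 / (m - 1))
    ratio = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)

# Hypothetical sample lying at distance 1 from one center and distance 2 from another.
distances = [1.0, 2.0]
for m in (1.1, 2.0, 5.0):
    print(m, np.round(memberships(distances, m), 3))
```

With m close to 1 the nearer center receives almost all of the membership (the Boolean behaviour described above), m = 2 gives a 0.8/0.2 split, and a larger m shares the membership more evenly between the two centers.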
The maximum number of iterations is set to 100 to ensure that the algorithm has enough opportunities to identify an appropriate clustering center.
Iteration precision is set to $10^{-5}$ to ensure the accuracy of clustering results.
Status values undergo fuzzy processing using FCM based on data calculated by the entropy weight-TOPSIS method to yield a fuzzy membership matrix.
The relationship matrix is first initialized using the interruption number and the traffic interruption status value, and the distances between the samples are calculated.
Then, the fuzzy membership matrix is obtained after 100 iterations by updating the membership matrix
and the cluster center matrix
V, using Equations (14) and (15). Because the following discussion compares the clustering effects of different numbers of clusters, only the portion of this matrix that concerns traffic interruption severity is presented. This matrix shows the membership degree of each traffic interruption to levels 1–4. The membership degrees of each traffic interruption sum to 1 and indicate how likely the event is to belong to each level (Table 5).
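The iteration described above can be sketched as follows. This is a minimal illustration that assumes the standard FCM update rules for the cluster centers and memberships, since Equations (14) and (15) are not reproduced here; the function name `fcm`, the random initialization, and the six sample values are hypothetical and are not the paper's data.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means. X has shape (n, d); returns U (c, n) and V (c, d)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)          # memberships of each sample sum to 1

    def centers(U):
        Um = U ** m
        return (Um @ X) / Um.sum(axis=1, keepdims=True)

    for _ in range(max_iter):
        V = centers(U)
        # Euclidean distance from every center (rows) to every sample (columns)
        dist = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        dist = np.fmax(dist, 1e-12)            # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers(U)

# Hypothetical one-dimensional status values, for illustration only.
X = np.array([[0.05], [0.10], [0.50], [0.55], [0.90], [0.95]])
U, V = fcm(X, c=3)
print(np.round(U.sum(axis=0), 6))              # each column sums to 1
```

The defaults mirror the settings stated above (m = 2, at most 100 iterations, precision 10^-5). Each column of `U` holds the membership degrees of one sample across the clusters.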
FCM converts quantitative attribute
into attribute set
represented by fuzzy attributes. Then, the clustering center of each attribute set (
) is determined by minimizing the value function of the dissimilarity index. Finally, grading results for four different numbers of clusters are obtained (
Table 6).
The above table shows the grading results for different numbers of clusters. The categorization of status values becomes more refined as the number of clusters increases, which facilitates a more precise evaluation of the severity of traffic interruptions.
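One way to turn a fuzzy membership matrix into discrete grades is sketched below. It is an assumption rather than the paper's stated procedure: each event takes the cluster with the largest membership, and clusters are numbered by ascending center so that a higher level corresponds to a larger status value. The helper `assign_levels` and the small matrices are hypothetical.

```python
import numpy as np

def assign_levels(U, centers):
    """Map each sample to a level 1..c by maximum membership.

    U has shape (c, n); centers holds one scalar center per cluster.
    Assumption: level 1 is the cluster with the smallest center.
    """
    order = np.argsort(np.ravel(centers))        # cluster indices, lowest center first
    rank = np.empty_like(order)
    rank[order] = np.arange(1, len(order) + 1)   # rank[i] is the level of cluster i
    return rank[np.argmax(U, axis=0)]

# Hypothetical memberships of three events in two clusters.
U_demo = np.array([[0.9, 0.2, 0.1],
                   [0.1, 0.8, 0.9]])
centers_demo = np.array([0.8, 0.2])              # cluster 0 has the larger center
print(assign_levels(U_demo, centers_demo))       # [2 1 1]
```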
5.2.2. Validity Analysis
The number of clusters must be preset in FCM clustering analysis, and this choice relies on personal experience. However, a preset number does not guarantee an optimal clustering outcome. Therefore, the optimal number of clusters is determined with a validity index to ensure the ideal clustering effect.
PE, a validity index specifically designed for evaluating the performance of fuzzy clustering analysis, quantifies the uncertainty or information entropy within the membership matrix [
25]. A lower PE value indicates less uncertainty in the clustering outcome and a more distinct assignment of each data point to a particular cluster. Calculating PE for different numbers of clusters identifies the optimal choice and thus helps to achieve the ideal clustering effect (Equation (16)).
where
is the number of data points;
is the number of clusters;
is the membership degree of the
jth sample (
) belonging to the
ith category.
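As a worked illustration of the index, the sketch below assumes that PE denotes Bezdek's partition entropy in its usual form, the negative mean over samples of the sum of u_ij ln(u_ij) across clusters. The natural logarithm and the helper name `partition_entropy` are assumptions, because Equation (16) is not reproduced in this section.

```python
import numpy as np

def partition_entropy(U, eps=1e-12):
    """Partition entropy of a membership matrix U with shape (c, n).

    Assumes the natural logarithm; memberships are clipped at eps so that
    zero memberships contribute 0 instead of producing log(0).
    """
    U = np.asarray(U, dtype=float)
    n = U.shape[1]
    return float(-(U * np.log(np.clip(U, eps, 1.0))).sum() / n)

crisp = np.array([[1.0, 0.0],
                  [0.0, 1.0]])        # every sample belongs fully to one cluster
uniform = np.full((2, 4), 0.5)        # every sample is split evenly between two clusters
print(partition_entropy(crisp))
print(partition_entropy(uniform))
```

Under this definition a crisp partition gives 0 and an evenly split two-cluster partition gives ln 2, so values near 0 correspond to the more distinct classifications described above.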
Table 7 shows the PE values for different numbers of clusters.
Changing the number of clusters affects the results, and this effect can be discussed from three perspectives:
- (1) Smaller number of clusters: Interruptions are categorized only as “major interruptions” and “minor interruptions”. While this categorization simplifies the understanding of the data, it may hide important nuances. For example, some highways were interrupted for the same reasons, but the duration and mileage of the interruptions differed significantly. Take the interruptions numbered 160 to 200 on the Sichuan–Tibet and Yunnan–Tibet highways as an example: although most of these events were caused by collapse and debris flow, they were grouped into the same level when n_cluster = 2 and into different levels when n_cluster = 4 and 5. This suggests that a smaller number of clusters may overlook such differences, leading to an incomplete understanding of the actual situation.
Advantages: This simple categorization offers a quick decision-making tool during emergencies, allowing managers to identify rapidly which events need immediate attention.
Disadvantages: This approach overlooks significant differences between events, potentially leading to inefficient resource allocation. Interruptions with the same cause but very different durations and affected distances receive the same level, so the severity of certain interruptions may be underestimated, which affects timely and appropriate responses.
- (2) Moderate number of clusters: This provides a more detailed categorization that separates interruptions with similar causes into different levels. It helps to reveal which highways are more susceptible to specific conditions (e.g., severe weather or geologic hazards) and how they are affected differently.
Advantages: Moderate clustering helps distinguish between various types of interruption events, enabling managers to craft more appropriate emergency response plans. For example, some highways may be more vulnerable to weather conditions, while others may experience more frequent geological hazards.
Disadvantages: Although this level of clustering provides more detailed information, it still has limitations. As the number of clusters increases, the similarities between some events may become diluted, so that very similar events may be placed in different clusters and the analysis becomes slightly more complex.
- (3) Larger number of clusters: The interruption classification can be further subdivided to show more specific features. However, when there are too many levels, the boundaries between categories may become blurred, so that similar samples are assigned to different categories and the validity of the clustering is reduced. For example, some events were categorized into level-1 and level-5 when n_cluster = 5.
Advantages: More clusters can help identify specific traffic interruption patterns and reveal more nuanced characteristics of the interruptions. For example, with a large number of clusters, very rare but highly impactful events can be identified, aiding the formulation of more precise recovery plans.
Disadvantages: With too many clusters, the classification becomes overly detailed and the boundaries between categories blur. Very similar events may then be grouped inconsistently into different categories, which could confuse decision-makers and make management more complex.
Comparing the different numbers of clusters shows that, apart from the two-level classification, the PE values decrease gradually, which indicates that the validity of the clustering results improves. Specifically, the PE value is 0.043 for the two-level classification and 0.057 for the four-level classification. Although the two-level classification is more valid than the four-level classification in terms of PE value, we recommend the four-level scheme from a practical application point of view. The four-level scheme keeps the PE value low while classifying the traffic interruption status values more precisely, which contributes to a more comprehensive understanding of the severity and impact of traffic interruptions on the different highways to Tibet. In the four-level classification, interruptions of the Sichuan–Tibet Highway (events 128–186) and the Xinjiang–Tibet Highway (events 207–642) fall into the serious level-3 and level-4 categories more often than those of the other two highways, whereas the higher-level interruptions of the Qinghai–Tibet Highway (events 1–127) are mostly level-2 ordinary interruptions. The Yunnan–Tibet Highway is not analyzed in detail in this paper because its data are limited and its interruptions are classified primarily as level-1.
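The per-highway comparison can be reproduced from a vector of level labels with a short sketch such as the following. The event ranges are those quoted in the text; the helper `level_shares` is hypothetical, and the labels in `demo_levels` are random placeholders rather than the paper's grading results.

```python
import numpy as np

# Event-number ranges quoted in the text (1-based, inclusive).
HIGHWAY_RANGES = {
    "Qinghai-Tibet": (1, 127),
    "Sichuan-Tibet": (128, 186),
    "Xinjiang-Tibet": (207, 642),
}

def level_shares(levels, ranges=HIGHWAY_RANGES, n_levels=4):
    """Return, for each highway, the fraction of its events at each level 1..n_levels."""
    levels = np.asarray(levels)
    shares = {}
    for name, (start, end) in ranges.items():
        part = levels[start - 1:end]                      # convert to 0-based slicing
        counts = np.bincount(part, minlength=n_levels + 1)[1:n_levels + 1]
        shares[name] = counts / counts.sum()
    return shares

# Placeholder labels only: random levels 1-4 for 642 events, NOT the paper's results.
rng = np.random.default_rng(0)
demo_levels = rng.integers(1, 5, size=642)
for name, share in level_shares(demo_levels).items():
    print(name, np.round(share, 2))
```

Replacing `demo_levels` with the actual four-level grading would give the share of level-3 and level-4 interruptions for each highway.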
The Sichuan–Tibet Highway was constructed in the 1950s, when lower construction standards and complex geological and climatic conditions made the route more prone to serious traffic disruptions. The Xinjiang–Tibet Highway passes through the world-famous Kunlun Mountains and the Himalayas, with an average altitude of more than 4500 m, making it the world’s highest and most dangerous plateau highway. It is also prone to serious traffic interruptions at level-3 and level-4. Interruptions on the Qinghai–Tibet Corridor, on the other hand, are less severe overall. These findings provide important guidance for road management departments in developing targeted maintenance and emergency response measures. In particular, the Sichuan–Tibet Highway, with its lower construction standard and complex geo-climatic conditions, requires more attention and resource investment to improve its resilience and reduce the impact of traffic interruptions.