Importance Classification Method for Signalized Intersections Based on the SOM-K-GMM Clustering Algorithm
Abstract
1. Introduction
- To develop a multi-dimensional evaluation system that incorporates network topology, traffic flow characteristics, and socio-economic factors to comprehensively assess intersection importance.
- To apply an optimized clustering algorithm that effectively classifies intersections into different importance levels while considering temporal traffic variations.
- To validate the proposed framework using real-world traffic data from 40 intersections in Zibo City and compare its effectiveness against existing methods.
2. Methods
2.1. Framework for Intersection Importance Classification
2.2. Construction of TNS Index System
2.2.1. Traffic Flow Dimension
- Intersection saturation
- Traffic balance degree
2.2.2. Road Network Topology Dimension
- Node mileage coverage
- Node energy intensity
- Node efficiency betweenness
2.2.3. Socio-Economic Dimension
2.3. Construction of the Clustering Algorithm Model
2.3.1. SOMs Neural Network Coarse Clustering
2.3.2. K-Means Fine Clustering
2.3.3. GMM Probability Verification
- Formula composition
- Parameter estimation
2.4. Intersection Classification Based on Mahalanobis Clustering
3. Example Verification and Discussion Analysis
3.1. Database Construction
3.2. Indicator Calculation
3.2.1. Calculation Process of Intersection Saturation Degree
3.2.2. Spatial Characteristics of Intersection Index
- Traffic flow dimension
- ➀ As shown in Figure 5a and Figure 6, the intersection saturation scores under static conditions differed markedly between intersections yet were relatively evenly distributed. Approximately 50% of intersections fell within the range of 0.31–0.79, showing clustering and spatial continuity across the road network. This indicates that 90% of intersections within the study area operated under smooth or slightly congested traffic conditions, while a few intersections experienced congestion. Notably, these congestion-prone nodes significantly affected the normal operation of the traffic network and should be prioritized for further analysis and management.
- ➁ According to Figure 5b and Figure 6, the maximum value of the normalized traffic balance degree was 0.49, the median was 0.31, and 50% of the intersections fell in the range of 0.18–0.38 (apart from a few individual intersections). Compared against node saturation, nodes with higher saturation had a relatively low balance degree, which indicates that when a node bears higher traffic pressure, its flow is unevenly distributed across directions. This supports including the traffic balance degree when constructing the traffic flow dimension.
- Road network topology dimension
- ➀ As shown in Figure 5c,d and Figure 6, the degree and energy intensity of the nodes in the study area were similar: both had a maximum value of 1, with minimum values of 0.70 and 0.71, respectively. This suggests that the intersections were evenly distributed in space, so the traffic conditions in the study area can be regarded as representative. It also reflects good connectivity between the nodes of the traffic network, meaning that vehicles could flow smoothly between intersections. However, the two distributions differed. The median node degree was 0.81 and the values were scattered, whereas the median node energy intensity was as high as 0.86 and the values were clearly clustered, indicating that the network contained key nodes with high connection ability that played an important role in the overall performance and stability of the road network. The concentration of node energy intensity may imply that some nodes played a core role in information dissemination and resource allocation, while the dispersion of node degree may indicate that connections in the network were relatively uniform, with no obvious centralization trend.
- ➁ As shown in Figure 5e and Figure 6, 80% of the values lay between 0.25 and 0.63, a moderate level, apart from high node efficiency betweenness at a few individual points. This shows that most nodes were of moderate importance in the traffic network: they were neither absolute traffic bottlenecks nor completely irrelevant nodes. This moderate level of node efficiency betweenness may mean that traffic flow in the network was relatively balanced, without excessive concentration or dispersion.
- Socio-economic dimension
3.3. Importance Classification of Intersections
3.3.1. Comparison Under the Same Conditions
- SOMs neural network rough clustering
- K-Means method for fine clustering
- GMM Gaussian mixture model validation
- Importance classification results of the intersections
- The results of clustering analysis
- ➀ As shown in Figure 11a,b, within the traffic flow dimension, the results for the intersection saturation indicator reveal that critical intersections have the highest values, followed by important intersections. This finding indicates that higher intersection saturation corresponds to greater intersection importance. For the traffic balance degree, larger values represent more evenly distributed flows across different directions. Both critical and important intersections exhibit relatively low flow balance, implying that unbalanced traffic may lead to congestion, which underscores the significance of these intersections. Overall, the clustering outcomes for these two indicators align well with actual conditions.
- ➁ As illustrated in Figure 11b–d, within the road network topology dimension, the three indicators (node mileage coverage, node energy intensity, and node efficiency betweenness) exhibit a similar pattern. Critical intersections and important intersections display relatively high values, whereas secondary and normal intersections present lower values. These findings indicate that larger values in these metrics correspond to greater capacity in the road network topology dimension, thus reflecting a higher level of intersection importance.
- ➂ As depicted in Figure 11f, within the socio-economic dimension, the normalized results of the node vitality indicator reveal that intersections situated in areas with higher economic vitality exhibit higher vitality values. Consequently, such intersections draw greater traffic flows, which in turn elevates their importance level. These findings underscore that an intersection’s significance is closely tied to the surrounding socio-economic context.
- ➃ As illustrated in Figure 11, there is no clear boundary among intersections of different importance levels in certain indicators, particularly when comparing secondary intersections with normal intersections. Therefore, relying solely on a single indicator or subjective evaluation methods for intersection importance classification is generally insufficient. A comprehensive consideration of multiple intersection attributes is necessary to achieve an objective importance ranking.
3.3.2. Comparison Under Different Conditions
- Temporal Variation Characteristics of the Traffic Node Distribution
- Dynamic Evolution of Node Importance Over Time
3.3.3. Comparison of the Same Intersection Across Different Conditions
- Intersections No. 17 and No. 40 maintained consistent classification across all time periods, including the static state, morning peak, off-peak, and evening peak periods. This indicates that these intersections remain stable within the road network. Intersection No. 17 has consistently been classified as a key intersection, demonstrating its crucial role in the road network. Traffic management authorities should prioritize its monitoring and continuously optimize its operation to ensure smooth traffic flow. Intersection No. 40 has consistently been classified as a secondary intersection, indicating relatively low importance. Traffic management departments only need to allocate minimal resources to ensure its normal operation. This allows more resources and attention to be focused on key intersections, thereby enhancing overall road network efficiency and preventing unnecessary resource wastage.
- Intersections No. 29 and No. 34 showed classification fluctuations throughout the day. In the static classification, Intersection No. 29 was categorized as an ordinary intersection, while Intersection No. 34 was classified as an important intersection. However, during different time periods throughout the day, such as the morning and evening peak hours, their classifications changed. These fluctuations are primarily caused by variations in traffic flow within the road network. The changes in importance classification during peak hours suggest that these intersections require increased attention. Traffic management authorities should allocate resources dynamically based on these variations to ensure smooth and safe traffic flow within the network.
3.4. Clustering Algorithm Effect Evaluation
4. Conclusions
- The TNS evaluation index system assesses signalized intersections from three key perspectives. Analysis of six normalized indicators highlights the significant influence of traffic organization, lane division, road structure, and surrounding economic conditions on intersection classification, providing a basis for the importance classification of signalized intersections.
- Compared with the traditional K-Means method and other optimization methods, the method proposed in this study applied the SOMs method for rough clustering to determine the k value, thus avoiding the iterative process of exploring the k value; in addition, the GMM method was used to calculate the cluster attribution rate of each data point, which overcomes the limitation of the K-Means spherical clustering region.
- The proposed method achieved a silhouette coefficient of 0.737, representing a 78.1% improvement over the standalone SOMs method, a 65.2% improvement over the K-Means method, an 11.5% improvement over the SOM-K-Means method, and a 4.6% improvement over the modified fuzzy C-means method. This demonstrates its superior robustness and accuracy.
- This study establishes an importance-based classification scheme for signalized urban intersections, offering substantial practical implications for urban traffic management and sustainable development. The specific details are as follows:
- ➀ Provide data-driven support for transportation authorities to allocate resources more efficiently, thereby reducing traffic resource waste.
- ➁ Offer a foundation and direction for upgrading and renovating urban intersection infrastructure.
- ➂ Adapt to temporal fluctuations, enabling more refined and dynamic decision-making in urban traffic management.
- Insufficient sample coverage: The sample range of this study was relatively limited as it was mainly concentrated at intersections in specific areas. Future research can be extended to more urban and regional intersections to enhance the universality and practicality of the method.
- Computational complexity: Although the SOM-K-GMM method improves the clustering efficiency to a certain extent, it may still face the problem of high computational complexity when dealing with particularly large-scale datasets. Future research can further optimize the algorithm, improve its computational efficiency, and reduce resource consumption.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Collection Entries | Concrete Form | Explanation |
---|---|---|
Intersection number | XX | Identifier of the intersection. |
Identification number | *C51*** | The license plate number identified by the electronic police (traffic enforcement camera); a plate that cannot be identified is logged as ‘–’ or ‘no license plate’. |
Type of license plate | 02 | License plate category: 01 for large vehicles, 02 for small vehicles, etc. |
Approach number | 1 | The intersection approach (entrance) number: 1 is the east approach, 2 is the west approach, 3 is the north approach, and 4 is the south approach. |
Lane number | 1 | The number of the entrance lane occupied by the vehicle; the lanes of each approach are numbered from left to right, starting at 1. |
Passing time | 13 December 2023 07:00:04 | The time at which the vehicle crosses the intersection stop line. |
Collection places | XX road–XX road intersection, electronic police, north to south | The acquisition location is fixed, so speed, flow, and other indicators can be derived from it in later calculations. |
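As an illustration of how records with the fields listed in the table above could be turned into per-lane vehicle counts, the following is a minimal sketch. The class name `PassageRecord`, the helper functions, the tuple layout, and the second and third sample rows are assumptions made for this example; they are not taken from the study's data pipeline.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

# Approach codes as described in the field table.
APPROACHES = {1: "east", 2: "west", 3: "north", 4: "south"}


@dataclass(frozen=True)
class PassageRecord:
    """One vehicle passing the stop line (hypothetical container)."""
    intersection: str
    plate: str
    plate_type: str
    approach: int
    lane: int
    passed_at: datetime


def parse_record(fields):
    """Convert a tuple of raw strings into a typed record."""
    intersection, plate, plate_type, approach, lane, timestamp = fields
    return PassageRecord(
        intersection=intersection,
        plate=plate,
        plate_type=plate_type,
        approach=int(approach),
        lane=int(lane),
        # Timestamp format follows the example value in the table.
        passed_at=datetime.strptime(timestamp, "%d %B %Y %H:%M:%S"),
    )


def lane_counts(records):
    """Count vehicles per (approach name, lane number)."""
    return Counter((APPROACHES[r.approach], r.lane) for r in records)


rows = [
    ("XX", "*C51***", "02", "1", "1", "13 December 2023 07:00:04"),
    # The two rows below are invented purely for illustration.
    ("XX", "-", "02", "1", "1", "13 December 2023 07:00:09"),
    ("XX", "*A12***", "01", "3", "2", "13 December 2023 07:00:11"),
]
records = [parse_record(row) for row in rows]
counts = lane_counts(records)
```

Counts of this kind, aggregated over a time window, are the sort of raw input from which flow-based indicators could then be derived.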
KMO and Bartlett Tests | Statistic | Value |
---|---|---|
KMO measure of sampling adequacy | - | 0.867 |
Bartlett's test of sphericity | Approximate chi-square | 2884.139 |
Bartlett's test of sphericity | Degrees of freedom | 276 |
Bartlett's test of sphericity | Significance | 0 |
Component | Initial Eigenvalues: Total | Initial Eigenvalues: Variance (%) | Initial Eigenvalues: Cumulative (%) | Extraction Sums of Squared Loadings: Total | Extraction Sums of Squared Loadings: Variance (%) | Extraction Sums of Squared Loadings: Cumulative (%) |
---|---|---|---|---|---|---|
1 | 22.119 | 92.163 | 92.163 | 22.119 | 92.163 | 92.163 |
2 | 0.659 | 2.746 | 94.91 | | | |
3 | 0.446 | 1.856 | 96.766 | | | |
4 | 0.233 | 0.971 | 97.738 | | | |
5 | 0.130 | 0.543 | 98.28 | | | |
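The variance proportions in the component table can be cross-checked against the eigenvalues. The sketch below assumes a standard principal component analysis on a correlation matrix, where each proportion equals the eigenvalue divided by the number of variables p, and it infers p from the Bartlett degrees of freedom through df = p(p − 1)/2. Both the assumption and the function names are illustrative; the source does not state p directly.

```python
import math

# Eigenvalues of the first five components, copied from the table.
EIGENVALUES = [22.119, 0.659, 0.446, 0.233, 0.130]
# Degrees of freedom reported for Bartlett's test.
BARTLETT_DF = 276


def variables_from_bartlett_df(df):
    """Solve p * (p - 1) / 2 = df for the integer number of variables p."""
    root = math.isqrt(1 + 8 * df)
    p = (1 + root) // 2
    if p * (p - 1) // 2 != df:
        raise ValueError("df does not correspond to an integer p")
    return p


def variance_shares(eigenvalues, p):
    """Return per-component and cumulative variance percentages."""
    shares = [ev / p * 100.0 for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for share in shares:
        running += share
        cumulative.append(running)
    return shares, cumulative


p = variables_from_bartlett_df(BARTLETT_DF)
shares, cumulative = variance_shares(EIGENVALUES, p)
```

Because the table lists eigenvalues rounded to three decimals, the recomputed percentages can differ from the tabulated ones in the last digit.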
Clustering Center | Traffic Flow Index | Structural Topological Index | Node Activity | Optimal Number of Neurons |
---|---|---|---|---|
O1 | 0.42740958 | 0.74183229 | 0.29761905 | 4 |
O2 | 0.43941927 | 0.66642738 | 0.06885271 | |
O3 | 0.49875811 | 0.76441887 | 0.91428571 | |
O4 | 0.39724298 | 0.65961679 | 0.56190476 | |
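The conclusions describe the GMM step as computing a cluster attribution rate for each data point. A minimal sketch of that idea, using the four cluster centers from the table above as component means, is given here. The equal mixing weights, the shared isotropic variance `SIGMA2`, and the sample point are assumptions chosen only for illustration; a fitted GMM would estimate weights and full covariances with the EM algorithm.

```python
import math

# Cluster centers from the table: (traffic flow, topology, node activity).
CENTERS = {
    "O1": (0.42740958, 0.74183229, 0.29761905),
    "O2": (0.43941927, 0.66642738, 0.06885271),
    "O3": (0.49875811, 0.76441887, 0.91428571),
    "O4": (0.39724298, 0.65961679, 0.56190476),
}

# Hypothetical mixture parameters (not reported in the source).
WEIGHTS = {name: 0.25 for name in CENTERS}
SIGMA2 = 0.02


def isotropic_gaussian_pdf(x, mean, sigma2):
    """Density of a Gaussian with covariance sigma2 * I at point x."""
    dim = len(mean)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, mean))
    norm = (2.0 * math.pi * sigma2) ** (dim / 2.0)
    return math.exp(-sq_dist / (2.0 * sigma2)) / norm


def membership_probabilities(x, centers, weights, sigma2):
    """Posterior probability of each component given x (the GMM E-step)."""
    weighted = {
        name: weights[name] * isotropic_gaussian_pdf(x, mean, sigma2)
        for name, mean in centers.items()
    }
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}


# Hypothetical intersection lying between the O1 and O4 centers.
sample = (0.45, 0.72, 0.45)
probs = membership_probabilities(sample, CENTERS, WEIGHTS, SIGMA2)
best = max(probs, key=probs.get)
```

Unlike a hard K-Means assignment, the result is a probability for every cluster, so a point near a boundary keeps a substantial share for its second-closest cluster.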
Clustering Center | Mahalanobis Distance | Rank | Importance Degree | Quantity |
---|---|---|---|---|
O1 | 7.49294403 | 3 | Secondary intersection | 17 |
O2 | 6.79285047 | 4 | Ordinary intersection | 9 |
O3 | 8.08415443 | 1 | Key intersection | 6 |
O4 | 7.69146252 | 2 | Important intersection | 8 |
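The ordering in the table above is consistent with ranking the cluster centers by descending Mahalanobis distance and assigning importance levels in that order. The sketch below reproduces that mapping from the tabulated distances; the rule "larger distance means higher importance" is inferred from the table rather than quoted from the method description, and the function name is illustrative.

```python
# Mahalanobis distances of the four cluster centers, copied from the table.
DISTANCES = {
    "O1": 7.49294403,
    "O2": 6.79285047,
    "O3": 8.08415443,
    "O4": 7.69146252,
}

# Importance levels, from highest to lowest.
LEVELS = [
    "Key intersection",
    "Important intersection",
    "Secondary intersection",
    "Ordinary intersection",
]


def rank_clusters(distances, levels):
    """Rank clusters by descending distance and attach a level to each."""
    ordered = sorted(distances, key=distances.get, reverse=True)
    return {
        name: (rank, levels[rank - 1])
        for rank, name in enumerate(ordered, start=1)
    }


ranking = rank_clusters(DISTANCES, LEVELS)
```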
Levels | Static Periods | Off-Peak Period | Morning Peak | Evening Peak |
---|---|---|---|---|
Key intersection | 6 | 7 | 8 | 8 |
Important intersection | 8 | 9 | 8 | 9 |
Secondary intersection | 17 | 6 | 13 | 11 |
Ordinary intersection | 9 | 18 | 11 | 12 |
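A quick consistency check on the table above: every period should account for all 40 intersections, and the differences from the static classification show how many intersections change level. The sketch below is only an illustrative tally of the tabulated counts; the variable names are assumptions.

```python
# Intersection counts per period and importance level, from the table.
COUNTS = {
    "Static": {"Key": 6, "Important": 8, "Secondary": 17, "Ordinary": 9},
    "Off-peak": {"Key": 7, "Important": 9, "Secondary": 6, "Ordinary": 18},
    "Morning peak": {"Key": 8, "Important": 8, "Secondary": 13, "Ordinary": 11},
    "Evening peak": {"Key": 8, "Important": 9, "Secondary": 11, "Ordinary": 12},
}

# Total number of classified intersections in each period.
totals = {period: sum(levels.values()) for period, levels in COUNTS.items()}

# Net change in each level relative to the static classification.
shift_vs_static = {
    period: {level: n - COUNTS["Static"][level] for level, n in levels.items()}
    for period, levels in COUNTS.items()
    if period != "Static"
}
```

The net changes only bound the movement between levels from below, since intersections moving in opposite directions cancel out in the totals.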
Algorithm | SOMs | K-Means | SOM-K-Means | FCM | SOM-K-GMM |
---|---|---|---|---|---|
Silhouette coefficient | 0.414 | 0.446 | 0.661 | 0.705 | 0.737 |
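The percentage improvements quoted in the conclusions are relative gains of the SOM-K-GMM silhouette coefficient over each baseline, that is, (proposed − baseline) / baseline × 100. A minimal sketch recomputing them from the table values follows; the function and variable names are illustrative.

```python
# Silhouette coefficients copied from the table.
SILHOUETTE = {
    "SOMs": 0.414,
    "K-Means": 0.446,
    "SOM-K-Means": 0.661,
    "FCM": 0.705,
    "SOM-K-GMM": 0.737,
}


def relative_improvement(baseline, proposed):
    """Relative gain of proposed over baseline, in percent."""
    return (proposed - baseline) / baseline * 100.0


proposed = SILHOUETTE["SOM-K-GMM"]
improvements = {
    name: round(relative_improvement(value, proposed), 1)
    for name, value in SILHOUETTE.items()
    if name != "SOM-K-GMM"
}
```

Since the table reports coefficients to three decimals, values recomputed this way can differ from the percentages stated in the text by about 0.1 percentage points, presumably because the reported figures were derived from unrounded coefficients.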
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yang, Z.; Chen, Y.; Guo, D.; Jiao, F.; Zhou, B.; Sun, F. Importance Classification Method for Signalized Intersections Based on the SOM-K-GMM Clustering Algorithm. Sustainability 2025, 17, 2827. https://doi.org/10.3390/su17072827