A Trajectory Regression Clustering Technique Combining a Novel Fuzzy C-Means Clustering Algorithm with the Least Squares Method
Abstract
1. Introduction
- (1) A novel line-segment generation technique is proposed using the angle-based partitioning method (AngPart).
- (2) A novel fuzzy C-means (NFCM) clustering algorithm is put forth, combining the Lagrange operator with AngPart and K-means++.
- (3) A trajectory regression technique based on least squares regression (LSR) is presented, which can be used to explain the state of population migration around trajectories and can serve as a reference for city road planning.
- (4) FCML is shown to work on real-world taxi GPS data from Beijing, China.
2. Description of Real-World Taxi GPS Data
3. Preliminary
4. Methodology
4.1. Angle-Based Partitioning and Cosine-Based Constraint
Algorithm 1. Angle-based partitioning and cosine-based constraint algorithm

Input: a given GPS dataset including location information and angles, the number of iterations, the angle threshold T
Output: regression trajectories
Procedure:
Divide into Taxi GPS data: location information and angles, which are set to numbers for each location and angle
Define list Lx, which is used as temporary storage for three taxi GPS data points
Define list // which is used to store line segments
/* Steering angle-based partitioning */
WHILE DO
// select a data point from D, and record the angle of the ; mark as ;
// The “count” is used to count the number of selected data points;
// Calculate distances between the selected and ;
// select second point from D;
// Call the built-in function pdist2 of Matlab to calculate the ;
// Sort the distances in descending order according to the Euclidean;
DO
// Indicate the angle is effective in D,
// Calculate angle difference of the selected data point ;
IF
Use Equation (2) to normalize the angles;
ELSE
Use Equation (3) to normalize the angles;
END IF
WHILE // the “10” is a given condition, which is used to handle taxi GPS selection
// we select another 10 GPS data points around first points
IF // Indicates the two taxi GPS data points have been chosen
// Denotes the shortest distance between and
ELSE
END IF
// when the DO … WHILE does not satisfy any given values (e.g., 10), then continue to loop
// is shown in Figure 6
END WHILE
IF // whether two data points are selected from D;
Select third GPS data point as above operation steps // above operation steps stand for method of the selected first and second point;
END IF
// three GPS data points are selected from D
IF
// Calculate intersection angle between and , where is a vertex
IF
// Denotes line segment () is separated from D, and is stored in
ELSE
// Put in D, which are used to recalculate line segments
END IF
// Put in D
END IF
END WHILE
release Lx
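The geometric test at the heart of Algorithm 1 (the intersection angle at a vertex formed by three GPS points, compared against the threshold T) can be illustrated with a minimal Python sketch. The function names `vertex_angle_deg` and `forms_segment` are hypothetical, and the acceptance rule (keep the three points as a segment when the turn away from a straight line is at most the threshold) is an assumption, since the exact condition is not recoverable from the listing.

```python
import math


def vertex_angle_deg(p_prev, p_vertex, p_next):
    """Angle in degrees at p_vertex between the rays to p_prev and p_next.

    Uses the cosine of the angle between the two vectors that start at
    the vertex, which mirrors the cosine-based constraint in spirit.
    """
    ax, ay = p_prev[0] - p_vertex[0], p_prev[1] - p_vertex[1]
    bx, by = p_next[0] - p_vertex[0], p_next[1] - p_vertex[1]
    norm_a = math.hypot(ax, ay)
    norm_b = math.hypot(bx, by)
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("coincident points have no defined angle")
    cos_theta = (ax * bx + ay * by) / (norm_a * norm_b)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


def forms_segment(p_prev, p_vertex, p_next, threshold_deg):
    """Assumed rule: accept when the turn (180 - angle) is within the threshold."""
    turn = 180.0 - vertex_angle_deg(p_prev, p_vertex, p_next)
    return turn <= threshold_deg


if __name__ == "__main__":
    # Nearly collinear points pass; a right-angle turn does not.
    print(forms_segment((0, 0), (1, 0), (2, 0.1), 10.0))
    print(forms_segment((0, 0), (1, 0), (1, 1), 10.0))
```

Points that fail the test would, following the listing, be returned to D and reconsidered when later segments are computed.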
4.2. Fuzzy C-Means Measure Based on the Lagrange Equation
- (1) ||*|| is any norm expressing the similarity between any measured data and the center.
- (2) L is the number of line segments, which is defined in Section 3.
- (3) K is the number of clusters of line segments.
- (4) m is the fuzzy partition matrix exponent used to control the degree of fuzzy overlap; in this paper it is set to m = 2 according to Reference [41].
- (5) and are defined in Section 3.
- (6) is the degree of membership of in the kth cluster, as shown in Equation (6).
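For orientation, the classical fuzzy C-means formulation that these quantities belong to can be written as follows. This is the standard textbook form, not necessarily the exact form of Equations (5), (6), and (8) in this paper; the symbols $x_i$ (the $i$th line segment), $c_k$ (the $k$th cluster center), $u_{ik}$ (membership), and $\lambda_i$ (Lagrange multiplier) are assumed names introduced here for illustration.

```latex
% Objective, minimized subject to \sum_{k=1}^{K} u_{ik} = 1 for every i
J_m = \sum_{i=1}^{L} \sum_{k=1}^{K} u_{ik}^{m} \, \lVert x_i - c_k \rVert^{2}

% Lagrangian that enforces the membership constraint
\mathcal{L} = J_m + \sum_{i=1}^{L} \lambda_i \Bigl( 1 - \sum_{k=1}^{K} u_{ik} \Bigr)

% Stationarity in u_{ik} and c_k gives the alternating updates
u_{ik} = \left[ \sum_{j=1}^{K}
  \left( \frac{\lVert x_i - c_k \rVert}{\lVert x_i - c_j \rVert} \right)^{\frac{2}{m-1}}
  \right]^{-1},
\qquad
c_k = \frac{\sum_{i=1}^{L} u_{ik}^{m} \, x_i}{\sum_{i=1}^{L} u_{ik}^{m}}
```

With m = 2, the exponent 2/(m - 1) equals 2, so each membership is proportional to the inverse squared distance to the corresponding center.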
Algorithm 2. The novel fuzzy C-means algorithm (FCML)

Input: line segments (see Algorithm 1), K
Output: clustering results of line segments
Procedure:
(1) Randomly initialize the cluster membership values in terms of the line segment results;
(2) Use Equation (8) to produce cluster centers;
(3) Use Equation (6) to update membership values;
(4) Use Equation (5) to calculate objective function values;
(5) Use Equation (9) to repair the error rate of FCM, as well as to improve the global optimization and balance iterative hill-climbing;
(6) Repeat steps 2–5 until the objective function improves by less than the specified threshold or the maximum number of iterations is reached.
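Steps (1) to (4) and (6) follow the usual alternating scheme of fuzzy C-means, which a short pure-Python sketch can make concrete. It implements only the classical updates; step (5), the Equation (9) correction, is specific to this paper and is deliberately omitted. The function name `fcm`, its parameters (`tol`, `max_iter`, `seed`), and the representation of each line segment as a numeric tuple are assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm(points, k, m=2.0, tol=1e-6, max_iter=100, seed=0):
    """Classical fuzzy C-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 1: random memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(k)]
        total = sum(row)
        u.append([v / total for v in row])
    prev_obj = float("inf")
    centers = []
    power = 1.0 / (m - 1.0)
    for _ in range(max_iter):
        # Step 2: centers are membership-weighted means of the points.
        centers = []
        for j in range(k):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )
        # Step 3: memberships from inverse squared distances to the centers.
        for i in range(n):
            d2 = [sq_dist(points[i], c) for c in centers]
            if any(v == 0.0 for v in d2):
                # A point sitting exactly on a center belongs fully to it.
                hits = [v == 0.0 for v in d2]
                u[i] = [1.0 / sum(hits) if h else 0.0 for h in hits]
            else:
                inv = [(1.0 / v) ** power for v in d2]
                s = sum(inv)
                u[i] = [v / s for v in inv]
        # Step 4: objective value for the current centers and memberships.
        obj = sum(
            u[i][j] ** m * sq_dist(points[i], centers[j])
            for i in range(n) for j in range(k)
        )
        # Step 6: stop once the objective no longer improves noticeably.
        if abs(prev_obj - obj) < tol:
            break
        prev_obj = obj
    return centers, u


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, memberships = fcm(data, k=2)
    print(centers)
```

A hard assignment, when one is needed, can be read off by taking the cluster with the largest membership in each row.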
4.3. Trajectory Regression Clustering Based on the Least Squares Model
Algorithm 3. LSR-based regression for clustering results from Algorithm 2

Input: clustering results (CR) of the line segments in Algorithm 2, number-order regression based on the least squares method
Output: regression trajectories
Procedure:
FOR 1 to K // K is the number of clusters
FOR 1 to n // n is the number of taxi GPS data points
// is the regression function based on LSR
// is the output function, and x and y denote the x axis and y axis, respectively
END FOR
END FOR
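The per-cluster regression in Algorithm 3 can be sketched as an ordinary polynomial least squares fit applied to each cluster in turn. The sketch below solves the normal equations directly in pure Python; the function names `polyfit_least_squares` and `fit_clusters`, the dictionary layout of the clusters, and the choice of polynomial degree are assumptions for illustration, not the paper's implementation.

```python
def polyfit_least_squares(xs, ys, degree):
    """Return [a0, a1, ..., a_degree] minimising sum((y - p(x))**2)."""
    n_coef = degree + 1
    # Normal equations (A^T A) a = A^T y for the Vandermonde matrix A.
    ata = [[sum(x ** (r + c) for x in xs) for c in range(n_coef)]
           for r in range(n_coef)]
    aty = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n_coef)]
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]
    # Gaussian elimination with partial pivoting.
    for col in range(n_coef):
        pivot = max(range(col, n_coef), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more distinct x values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n_coef):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n_coef + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coefs = [0.0] * n_coef
    for r in range(n_coef - 1, -1, -1):
        tail = sum(aug[r][c] * coefs[c] for c in range(r + 1, n_coef))
        coefs[r] = (aug[r][n_coef] - tail) / aug[r][r]
    return coefs


def fit_clusters(clusters, degree=1):
    """Fit one polynomial per cluster; clusters maps id -> list of (x, y)."""
    return {
        cid: polyfit_least_squares(
            [p[0] for p in pts], [p[1] for p in pts], degree
        )
        for cid, pts in clusters.items()
    }


if __name__ == "__main__":
    demo = {0: [(0, 1), (1, 3), (2, 5)], 1: [(0, 0), (1, 1), (2, 4), (3, 9)]}
    print(fit_clusters(demo, degree=1))
```

Solving the normal equations is adequate for low degrees and small clusters; for higher degrees or poorly scaled coordinates, a QR-based solver would be the more numerically stable choice.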
5. Experiment Results
6. Conclusions
Author Contributions
Acknowledgments
Conflicts of Interest
References
- Zheng, Y.; Liu, Y.; Yuan, J.; Xie, X. Urban computing with taxicabs. In Proceedings of the 13th International Conference on Ubiquitous Computing, Beijing, China, 17–21 September 2011; pp. 89–98.
- D’Andrea, E.; Marcelloni, F. Detection of traffic congestion and incidents from GPS trace analysis. Expert Syst. Appl. 2017, 73, 43–56.
- An, S.; Yang, H.; Wang, J.; Cui, N.; Cui, J. Mining urban recurrent congestion evolution patterns from GPS-equipped vehicle mobility data. Inf. Sci. 2016, 373, 515–526.
- Yang, Y.; Xu, Y.; Han, J.; Wang, E.; Chen, W.; Yue, L. Efficient traffic congestion estimation using multiple spatio-temporal properties. Neurocomputing 2017, 267, 344–353.
- Cui, J.; Liu, F.; Hu, J.; Janssens, D.; Wets, G.; Cools, M. Identifying mismatch between urban travel demand and transport network services using GPS data: A case study in the fast growing Chinese city of Harbin. Neurocomputing 2016, 181, 4–18.
- Qu, M.; Zhu, H.; Liu, J.; Liu, G.; Xiong, H. A cost-effective recommender system for taxi drivers. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2014; pp. 45–54.
- Cui, J.; Liu, F.; Janssens, D.; An, S.; Wets, G.; Cools, M. Detecting urban road network accessibility problems using taxi GPS data. J. Transp. Geogr. 2016, 51, 147–157.
- Ferreira, N.; Poco, J.; Vo, H.T.; Freire, J.; Silva, C.T. Visual exploration of big spatio-temporal urban data: A study of New York City taxi trips. IEEE Trans. Vis. Comput. Graph. 2013, 19, 2149–2158.
- Kharrat, A.; Popa, I.S.; Zeitouni, K.; Faiz, S. Clustering algorithm for network constraint trajectories. In Headway in Spatial Data Handling; Springer: Berlin/Heidelberg, Germany, 2008; pp. 631–647.
- Lee, J.-G.; Han, J.; Whang, K.-Y. Trajectory clustering: A partition-and-group framework. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, Beijing, China, 12–14 June 2007; pp. 593–604.
- Deng, Z.; Hu, Y.; Zhu, M.; Huang, X.; Du, B. A scalable and fast OPTICS for clustering trajectory big data. Clust. Comput. 2015, 18, 549–562.
- Han, B.; Liu, L.; Omiecinski, E. Road-network aware trajectory clustering: Integrating locality, flow, and density. IEEE Trans. Mob. Comput. 2015, 14, 416–429.
- Lou, Y.; Zhang, C.; Zheng, Y.; Xie, X.; Wang, W.; Huang, Y. Map-matching for low-sampling-rate GPS trajectories. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 4–6 November 2009; pp. 352–361.
- Yuan, J.; Zheng, Y.; Zhang, C.; Xie, X.; Sun, G.-Z. An interactive-voting based map matching algorithm. In Proceedings of the 2010 Eleventh International Conference on Mobile Data Management (MDM), Kansas City, MO, USA, 23–26 May 2010; pp. 43–52.
- Ciscal-Terry, W.; Dell’Amico, M.; Hadjidimitriou, N.S.; Iori, M. An analysis of drivers route choice behaviour using GPS data and optimal alternatives. J. Transp. Geogr. 2016, 51, 119–129.
- Luo, T.; Zheng, X.; Xu, G.; Fu, K.; Ren, W. An improved DBSCAN algorithm to detect stops in individual trajectories. ISPRS Int. J. Geo-Inf. 2017, 6, 63.
- Mai, G.; Janowicz, K.; Hu, Y.; Gao, S. ADCN: An anisotropic density-based clustering algorithm for discovering spatial point patterns with noise. Trans. GIS 2018, 22, 348–349.
- Han, J.; Pei, J.; Kamber, M. Data Mining: Concepts and Techniques; Elsevier: Amsterdam, The Netherlands, 2011.
- Jain, A.K. Data clustering: 50 years beyond K-means. Pattern Recognit. Lett. 2010, 31, 651–666.
- Lv, M.; Chen, L.; Xu, Z.; Li, Y.; Chen, G. The discovery of personally semantic places based on trajectory data mining. Neurocomputing 2016, 173, 1142–1153.
- Lecue, F.; Mehandjiev, N. Seeking quality of web service composition in a semantic dimension. IEEE Trans. Knowl. Data Eng. 2011, 23, 942–959.
- Arthur, D.; Vassilvitskii, S. K-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–9 January 2007; pp. 1027–1035.
- Bahmani, B.; Moseley, B.; Vattani, A.; Kumar, R.; Vassilvitskii, S. Scalable K-means++. Proc. VLDB Endow. 2012, 5, 622–633.
- Pal, N.R.; Bezdek, J.C. On cluster validity for the fuzzy C-means model. IEEE Trans. Fuzzy Syst. 1995, 3, 370–379.
- Henrikson, J. Completeness and total boundedness of the Hausdorff metric. MIT Undergrad. J. Math. 1999, 1, 69–80.
- Bandyopadhyay, S.; Maulik, U. An evolutionary technique based on K-means algorithm for optimal clustering in R^N. Inf. Sci. 2002, 146, 221–237.
- Pakhira, M.K.; Bandyopadhyay, S.; Maulik, U. Validity index for crisp and fuzzy clusters. Pattern Recognit. 2004, 37, 487–501.
- Maulik, U.; Bandyopadhyay, S. Performance evaluation of some clustering algorithms and validity indices. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 1650–1654.
- Real-World Taxi-GPS Data Sets. Available online: https://github.com/bigdata002/Location-data-sets (accessed on 10 October 2017).
- Zhou, X.; Gu, J.; Shen, S.; Ma, H.; Miao, F.; Zhang, H.; Gong, H. An automatic K-means clustering algorithm of GPS data combining a novel niche genetic algorithm with noise and density. ISPRS Int. J. Geo-Inf. 2017, 6, 392.
- Lu, M.; Liang, J.; Wang, Z.; Yuan, X. Exploring OD patterns of interested region based on taxi trajectories. J. Vis. 2016, 19, 811–821.
- Spaccapietra, S.; Parent, C.; Damiani, M.L.; de Macedo, J.A.; Porto, F.; Vangenot, C. A conceptual view on trajectories. Data Knowl. Eng. 2008, 65, 126–146.
- Luo, C.; Junlin, L.; Li, G.; Wei, W.; Li, Y.; Li, J. Efficient reverse spatial and textual k nearest neighbor queries on road networks. Knowl.-Based Syst. 2016, 93, 121–134.
- Chang, C.; Zhou, B. Multi-granularity visualization of trajectory clusters using sub-trajectory clustering. In Proceedings of the IEEE International Conference on Data Mining Workshops, Miami, FL, USA, 6–9 December 2009; pp. 577–582.
- Li, Y.; Bandar, Z.A.; McLean, D. An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans. Knowl. Data Eng. 2003, 15, 871–882.
- Selim, S.Z.; Ismail, M.A. K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Trans. Pattern Anal. Mach. Intell. 1984, 6, 81–87.
- Cox, E. Fuzzy Modeling and Genetic Algorithms for Data Mining and Exploration; Elsevier: Amsterdam, The Netherlands, 2005; pp. 421–481.
- Saha, A.; Das, S. Axiomatic generalization of the membership degree weighting function for fuzzy C-means clustering: Theoretical development and convergence analysis. Inf. Sci. 2017, 408, 129–145.
- Bezdek, J.C. Pattern Recognition with Fuzzy Objective Function Algorithms; Plenum Press: New York, NY, USA, 1981; pp. 203–239.
- Ding, Y.; Fu, X. Kernel-based fuzzy C-means clustering algorithm based on genetic algorithm. Neurocomputing 2016, 188, 233–238.
- Mukhopadhyay, A.; Maulik, U. Towards improving fuzzy clustering using support vector machine: Application to gene expression data. Pattern Recognit. 2009, 42, 2744–2763.
- Yuan, H.; Zheng, J.; Lai, L.L.; Tang, Y.Y. A constrained least squares regression model. Inf. Sci. 2018, 429, 247–259.
- Liu, Y.; Zhang, S. Fast quantum algorithms for least squares regression and statistic leverage scores. Theor. Comput. Sci. 2017, 657, 38–47.
- Chen, K.; Lv, Q.; Lu, Y.; Dou, Y. Robust regularized extreme learning machine for regression using iteratively reweighted least squares. Neurocomputing 2017, 230, 345–358.
- Gui, J.; Sun, Z.; Ji, S.; Tao, D.; Tan, T. Feature selection based on structured sparsity: A comprehensive study. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 1490–1507.
- Davies, D.L.; Bouldin, D.W. A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, PAMI-1, 224–227.
- Dunn, J.C. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J. Cybern. 1973, 3, 32–57.
- Chang, D.-X.; Zhang, X.-D.; Zheng, C.-W. A genetic algorithm with gene rearrangement for K-means clustering. Pattern Recognit. 2009, 42, 1210–1222.
- Bandyopadhyay, S.; Maulik, U.; Mukhopadhyay, A. Multiobjective genetic clustering for pixel classification in remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2007, 45, 1506–1511.
The Number of Clusters | K-Means | K-Median | FCM | FCML |
---|---|---|---|---|
20 | 7.124884 | 6.835498 | 3.731286 | 3.728852 |
40 | 10.473148 | 10.035824 | 3.835722 | 3.824766 |
80 | 17.149494 | 16.434827 | 4.134353 | 4.018201 |
100 | 20.561356 | 19.632886 | 4.241650 | 4.117618 |
Values | K-Means | K-Median | FCM | FCML |
---|---|---|---|---|
K = 20 | ||||
Max | 0.072370 | 0.076390 | 0.080722 | 0.090330 |
Mean | 0.059763 | 0.061500 | 0.080010 | 0.088628 |
Min | 0.045748 | 0.046031 | 0.079090 | 0.086792 |
K = 40 | ||||
Max | 0.053466 | 0.054535 | 0.062047 | 0.066194 |
Mean | 0.047736 | 0.045803 | 0.060685 | 0.067674 |
Min | 0.040186 | 0.027937 | 0.060046 | 0.067124 |
K = 80 | ||||
Max | 0.042144 | 0.038428 | 0.046632 | 0.050012 |
Mean | 0.035606 | 0.035992 | 0.044379 | 0.048858 |
Min | 0.029208 | 0.032240 | 0.043320 | 0.048086 |
K = 100 | ||||
Max | 0.035346 | 0.039044 | 0.041666 | 0.044469 |
Mean | 0.032129 | 0.033527 | 0.041100 | 0.043674 |
Min | 0.027112 | 0.030160 | 0.040440 | 0.043137 |
Cluster | K-Means | K-Median | FCM | FCML |
---|---|---|---|---|
1 | 22 | 7367 | 1674 | 1185 |
2 | 0 | 0 | 1275 | 1262 |
3 | 0 | 0 | 587 | 1541 |
4 | 0 | 0 | 967 | 1629 |
5 | 0 | 0 | 1421 | 874 |
6 | 1145 | 0 | 1086 | 1344 |
7 | 5449 | 89 | 1325 | 680 |
8 | 0 | 0 | 1140 | 1095 |
9 | 0 | 112 | 1295 | 1014 |
10 | 819 | 0 | 1925 | 1420 |
11 | 0 | 0 | 1487 | 908 |
12 | 0 | 0 | 993 | 1372 |
13 | 0 | 0 | 1190 | 946 |
14 | 3735 | 3695 | 1079 | 1189 |
15 | 0 | 12,522 | 684 | 1128 |
16 | 0 | 0 | 1533 | 1260 |
17 | 12,436 | 0 | 819 | 1267 |
18 | 0 | 0 | 1541 | 1475 |
19 | 179 | 0 | 976 | 1617 |
20 | 0 | 0 | 783 | 579 |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Zhou, X.; Miao, F.; Ma, H.; Zhang, H.; Gong, H. A Trajectory Regression Clustering Technique Combining a Novel Fuzzy C-Means Clustering Algorithm with the Least Squares Method. ISPRS Int. J. Geo-Inf. 2018, 7, 164. https://doi.org/10.3390/ijgi7050164