HGST: A Hilbert-GeoSOT Spatio-Temporal Meshing and Coding Method for Efficient Spatio-Temporal Range Query on Massive Trajectory Data
Abstract
1. Introduction
- We propose a spatio-temporal meshing and coding method called HGST. It uses the Hilbert curve instead of the default Z-order curve for spatial grid coding, and constructs a unified time division standard that yields the time identifier of any instant at a custom time resolution. This method provides a novel spatio-temporal index for trajectory data;
- Based on HGST, we design an adaptive spatio-temporal scaling and coding method that determines the optimal subdivision level from the query range, and propose a query code merging strategy that further reduces the complexity of spatio-temporal range queries;
- We implement a prototype system on top of HBase and Spark, and develop an efficient Spark-based algorithm to accelerate the parallel execution of spatio-temporal range queries.
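To make the first contribution concrete, the following is a minimal sketch of Hilbert curve coding for a square grid, using the standard iterative coordinate-to-index conversion. It is not the authors' implementation: the function name `hilbert_index` and the assumption that a cell is addressed by integer column and row indices `(x, y)` on a `2^level × 2^level` grid are illustrative only, and the mapping from GeoSOT cells to such indices is not shown.

```python
def hilbert_index(level: int, x: int, y: int) -> int:
    """Return the Hilbert curve position of cell (x, y) on a 2^level x 2^level grid.

    Hypothetical helper for illustration; cell indices are assumed to be
    non-negative integers smaller than 2^level.
    """
    n = 1 << level  # grid side length
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("cell index outside the grid")
    d = 0
    s = n >> 1
    while s > 0:
        # Quadrant bits of the current cell at this scale.
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant contributes s*s cells; (3*rx) ^ ry orders the quadrants.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or reflect so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


if __name__ == "__main__":
    # Print the curve positions of a 4 x 4 grid, top row first.
    for row in range(3, -1, -1):
        print([hilbert_index(2, col, row) for col in range(4)])
```

Unlike Z-order codes, consecutive Hilbert positions always belong to edge-adjacent cells, which is the locality property that motivates replacing the Z curve for range queries.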
2. Related Work
2.1. Data-Driven Approaches
2.2. Space-Driven Approaches
3. Preliminary
3.1. Problem Formulation
3.2. GeoSOT
4. Method
4.1. Overview
4.2. HGST Subdivision Model
4.2.1. Spatial Encoding
4.2.2. Temporal Subdivision and Encoding
4.2.3. Calculate the HGST Spatio-Temporal Code
4.2.4. Characteristics of the HGSTCode
4.3. Index Construction
4.4. Spatio-Temporal Range Query
4.4.1. Filter Stage
Algorithm 1: Adaptive spatio-temporal scaling and coding method based on HGST
Algorithm 2: The query on the same temporal and spatial scale
Algorithm 3: The query for a long time range and a small spatial range
Algorithm 4: The query for a short time range and a large spatial range
4.4.2. Refinement Stage
Algorithm 5: Spark-based method for the refinement stage of trajectory spatio-temporal range query
5. Experiments and Results
5.1. Experimental Setup and Methodology
5.2. Evaluation of the Efficiency of the Index Construction
5.3. Performance of Spatio-Temporal Query
5.4. Effects of Method Optimization
5.4.1. Effect of Hilbert Filling Curve
5.4.2. Effect of Merging Query Codes
5.4.3. Effect of Using Spark
6. Discussion
- The current work lacks scan optimization for the filter stage. If too many HGST query codes are generated, the database must be scanned many times, which reduces efficiency. In future work, we will consider using multiple threads to issue operations against the underlying key-value store in parallel, or executing the coded scan filtering remotely in RegionServers through an HBase endpoint coprocessor, to achieve parallel queries at the Region level of a table. Either approach may provide researchers with more efficient data discovery capabilities;
- HGST currently cannot choose automatically between standalone and Spark execution according to the size of the data. To overcome this limitation, in future work we will consider using data sampling to estimate the amount of data requested and then run small requests in standalone mode, which saves part of the overhead of cluster resource scheduling and makes the method more suitable for practical applications.
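The multi-threaded scan idea in the first point can be sketched as follows. This is only an illustration under stated assumptions: `scan_range` is a hypothetical stand-in that filters an in-memory list of `(row_key, value)` pairs, not an HBase client call, and `parallel_scan` simply issues one such scan per query code range from a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Row = Tuple[int, str]    # (row key, value); hypothetical record layout
Range = Tuple[int, int]  # inclusive [start, end] row-key range


def scan_range(store: Sequence[Row], start: int, end: int) -> List[str]:
    """Hypothetical stand-in for one range scan over a key-value store."""
    return [value for key, value in store if start <= key <= end]


def parallel_scan(store: Sequence[Row], ranges: Sequence[Range],
                  max_workers: int = 4) -> List[str]:
    """Run one scan per range concurrently and concatenate the results.

    Results are returned in the order of `ranges`, because
    ThreadPoolExecutor.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = pool.map(lambda r: scan_range(store, r[0], r[1]), ranges)
        return [value for part in parts for value in part]


if __name__ == "__main__":
    store = [(1, "a"), (2, "b"), (5, "c"), (9, "d")]
    print(parallel_scan(store, [(1, 2), (9, 9)]))
```

With a real key-value store the scans are I/O-bound, which is the situation where a thread pool is likely to help; the in-memory stand-in above only demonstrates the control flow.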
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Level | Scale | Level | Scale | Level | Scale | Level | Scale |
---|---|---|---|---|---|---|---|
0 | 32 year | 7 | 4 month | 14 | 1 day | 21 | 16 min |
1 | 16 year | 8 | 2 month | 15 | 16 h | 22 | 8 min |
2 | 8 year | 9 | 1 month | 16 | 8 h | 23 | 4 min |
3 | 4 year | 10 | 16 day | 17 | 4 h | 24 | 2 min |
4 | 2 year | 11 | 8 day | 18 | 2 h | 25 | 1 min |
5 | 1 year | 12 | 4 day | 19 | 1 h | | |
6 | 8 month | 13 | 2 day | 20 | 32 min | | |
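The level table is consistent with a 25-bit binary layout: 5 bits for the year within a 32-year span, 4 bits for the month (12 padded to 16), 5 bits for the day (padded to 32), 5 bits for the hour (24 padded to 32) and 6 bits for the minute (60 padded to 64). The sketch below derives a time identifier at any level by truncating that bit string. It is an illustration, not the authors' encoding: the function name `time_code`, the base year `2000` and the zero-based month and day fields are all assumptions.

```python
from datetime import datetime

# Bit widths per field, chosen so that the cumulative sums (5, 9, 14, 19, 25)
# match the levels at which the table reaches 1 year, 1 month, 1 day, 1 h, 1 min.
FIELD_BITS = (5, 4, 5, 5, 6)  # year offset, month, day, hour, minute
MAX_LEVEL = sum(FIELD_BITS)   # 25
BASE_YEAR = 2000              # assumed start of the 32-year span


def time_code(t: datetime, level: int, base_year: int = BASE_YEAR) -> int:
    """Return the identifier of the level-`level` time cell that contains `t`."""
    if not 0 <= level <= MAX_LEVEL:
        raise ValueError("level must be between 0 and 25")
    year_offset = t.year - base_year
    if not 0 <= year_offset < 32:
        raise ValueError("time outside the assumed 32-year span")
    # Zero-based month and day are an assumption of this sketch.
    fields = (year_offset, t.month - 1, t.day - 1, t.hour, t.minute)
    full = 0
    for value, bits in zip(fields, FIELD_BITS):
        full = (full << bits) | value
    # Keep only the `level` most significant bits.
    return full >> (MAX_LEVEL - level)


if __name__ == "__main__":
    t = datetime(2008, 2, 6, 18, 18, 50)
    for level in (5, 9, 14, 19, 25):
        print(level, time_code(t, level))
```

Because a coarser code is a bit prefix of every finer code inside it, two instants share a level-`k` identifier exactly when they fall in the same level-`k` time cell.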
Parameters | Setting |
---|---|
Data Size (millions) | 1, 3, 5, 10, 15 |
Time Window | 1 h, 4 h, 12 h, 1 day, 3 day |
Spatial Window () | 3 × 3, 5 × 5, 10 × 10, 20 × 20, 30 × 30 |
Time window | (116.41961, 39.95879, 6 February 2008 18:18:50) | | (116.31314, 39.95514, 6 February 2008 07:06:19) | | (116.44057, 39.91701, 5 February 2008 05:45:46) | | (116.40911, 39.95973, 2 February 2008 09:33:17) | |
---|---|---|---|---|---|---|---|---|
1 h | 18/18 | 7.409/7.212 | 12/12 | 5.663/5.631 | 12/12 | 4.425/4.364 | 12/12 | 0.264/0.261 |
4 h | 12/5 | 7.057/6.409 | 12/5 | 6.184/5.345 | 12/5 | 4.816/4.216 | 6/6 | 0.24/0.236 |
12 h | 28/7 | 9.754/7.715 | 28/7 | 8.515/6.223 | 28/7 | 7.394/5.718 | 14/14 | 0.68/0.598 |
1 day | 52/10 | 12.943/8.606 | 52/10 | 12.492/8.153 | 52/10 | 12.868/9.34 | 26/26 | 1.918/1.862 |
3 days | 148/22 | 31.381/17.772 | 148/22 | 28.438/15.605 | 148/22 | 30.33/19.214 | 74/74 | 6.904/6.75 |
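One simple way to picture query code merging is to collapse runs of consecutive codes into inclusive ranges, so that each range needs a single scan instead of one scan per code. The sketch below shows only that generic idea; the text here does not describe the authors' actual merging rule, and `merge_codes` with its integer-code input is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def merge_codes(codes: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse integer codes into sorted, inclusive (start, end) ranges.

    Duplicates are ignored; codes that differ by exactly 1 are treated
    as contiguous (an assumption of this sketch).
    """
    ranges: List[List[int]] = []
    for code in sorted(set(codes)):
        if ranges and code == ranges[-1][1] + 1:
            ranges[-1][1] = code          # extend the current run
        else:
            ranges.append([code, code])   # start a new run
    return [(start, end) for start, end in ranges]


if __name__ == "__main__":
    print(merge_codes([12, 5, 6, 7, 10, 13, 6]))
```

The benefit depends on how often neighbouring cells receive consecutive codes, which is where the choice of space-filling curve matters.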
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, H.; Yan, J.; Wang, J.; Chen, B.; Chen, M.; Huang, X. HGST: A Hilbert-GeoSOT Spatio-Temporal Meshing and Coding Method for Efficient Spatio-Temporal Range Query on Massive Trajectory Data. ISPRS Int. J. Geo-Inf. 2023, 12, 113. https://doi.org/10.3390/ijgi12030113