A Deep-Learning-Based Multimodal Data Fusion Framework for Urban Region Function Recognition
Abstract
1. Introduction
- (1) We designed a unified deep-learning framework that integrates remote-sensing images (RSIs) and points of interest (POIs) to recognize urban functional zones (UFZs). The method extracts visual features, social features, and spatial-relationship features from the different data sources for more accurate UFZ recognition, whereas existing studies rarely consider all three feature types simultaneously.
- (2) Through a series of experiments, we investigated which POI categories have the greatest impact on UFZ recognition accuracy, as well as the advantages of POI data over RSI data alone, contributing to a deeper understanding of the role of multimodal data in the UFZ recognition task.
- (3) The synergy mechanism between RSI and POI data in UFZ recognition has rarely been studied. We used a feature fusion module to adaptively fuse the visual and social features and further analyzed the specific effects of this synergy mechanism for different urban functional areas.
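The adaptive fusion of visual and social features mentioned in contribution (3) can be illustrated with a simple gated-fusion sketch. The paper's actual feature fusion module is not reproduced here; the gate parameterization below (`w_gate`, `b_gate`) is a hypothetical minimal example of adaptive fusion, not the authors' implementation.

```python
import math
import random

def gated_fusion(visual, social, w_gate, b_gate):
    """Adaptively fuse two equal-length feature vectors with a sigmoid gate:
    g = sigma(W [v; s] + b), output = g * v + (1 - g) * s (elementwise)."""
    d = len(visual)
    concat = visual + social                      # [v; s], length 2d
    fused = []
    for j in range(d):
        z = sum(concat[i] * w_gate[i][j] for i in range(2 * d)) + b_gate[j]
        g = 1.0 / (1.0 + math.exp(-z))            # gate value in (0, 1)
        fused.append(g * visual[j] + (1.0 - g) * social[j])
    return fused

# Toy usage with random parameters (illustration only).
random.seed(0)
d = 4
v = [random.gauss(0, 1) for _ in range(d)]        # stand-in visual features
s = [random.gauss(0, 1) for _ in range(d)]        # stand-in social features
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(2 * d)]
b = [0.0] * d
fused = gated_fusion(v, s, W, b)
```

Because each gate value lies in (0, 1), every fused component is a convex combination of the corresponding visual and social components, so neither modality can be ignored entirely.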
2. Data and Methods
2.1. Data Source
2.2. Methods
2.2.1. Complementary Feature Learning and Fusion
Visual Feature Learning
Social Feature Learning
Complementary Feature Fusion
2.2.2. Spatial Relationship Modeling
Local Spatial Relationship Modeling
Global Spatial Relationship Modeling
- (a)
- Dynamic Position Bias
- (b)
- Long- and Short-Distance Attention
Feature Aggregation Module
3. Experimental Analysis
3.1. Implementation Details
3.2. Evaluation Metrics
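The experiments below report per-class F1 scores together with overall accuracy (OA) and Cohen's Kappa. As a reference for how OA and Kappa relate to a confusion matrix, here is a minimal sketch using the standard definitions (not code from the paper):

```python
def oa_and_kappa(cm):
    """Overall accuracy (OA) and Cohen's Kappa from a confusion matrix,
    where cm[i][j] counts samples of true class i predicted as class j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    po = sum(cm[i][i] for i in range(k)) / n                  # observed agreement = OA
    row = [sum(cm[i][j] for j in range(k)) for i in range(k)] # true-class totals
    col = [sum(cm[i][j] for i in range(k)) for j in range(k)] # predicted-class totals
    pe = sum(row[i] * col[i] for i in range(k)) / n ** 2      # chance agreement
    return po, (po - pe) / (1.0 - pe)

oa, kappa = oa_and_kappa([[40, 10], [5, 45]])   # oa = 0.85, kappa = 0.70
```

Kappa corrects OA for agreement expected by chance, which matters here because the class distribution is heavily imbalanced (e.g., Residence dominates the training set).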
3.3. Experiments
3.3.1. Comparative Experiments for the Second Part
3.3.2. Comparative Experiments with Other Methods
3.3.3. Ablation Experiment
4. Discussion
4.1. Specific Impact of POIs on UFZ Recognition
4.2. Synergy Mechanism of POIs and RSIs in UFZ Recognition
4.3. Layer Activation Visualization of Visual Features
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
| Category | Commerce | Industry | Residence | Construction | Institution | Transport | Open Space | Water |
|---|---|---|---|---|---|---|---|---|
| Training set | 6011 | 1396 | 30,407 | 346 | 1764 | 20,744 | 9418 | 10,714 |
| Testing set | 1269 | 698 | 15,127 | 206 | 570 | 8364 | 4811 | 3355 |
| Category | Residence | Retail | Supermarket | Nature | Hotel | Public Service | Institution | Industry | Airport |
|---|---|---|---|---|---|---|---|---|---|
| Number of POIs | 21,045 | 447 | 310 | 13 | 91 | 56 | 62 | 118 | 20 |
Per-class F1 scores, with overall Kappa and OA:

| Model | Transport | Commerce | Residence | Construction | Industry | Institution | Open Space | Water | Kappa | OA |
|---|---|---|---|---|---|---|---|---|---|---|
| Lu-CF | 0.8820 | 0.7865 | 0.9300 | 0.6528 | 0.8661 | 0.7514 | 0.9125 | 0.9596 | 0.8697 | 0.9136 |
| SWINT | 0.8796 | 0.7458 | 0.9280 | 0.6213 | 0.8669 | 0.7349 | 0.9109 | 0.9600 | 0.8594 | 0.8956 |
| PVT | 0.8794 | 0.7338 | 0.9276 | 0.6471 | 0.8690 | 0.7202 | 0.9105 | 0.9571 | 0.8587 | 0.8924 |
| LSTM | 0.7805 | 0.6578 | 0.9146 | 0.4512 | 0.8392 | 0.7041 | 0.8772 | 0.9474 | 0.7993 | 0.8451 |
| GRU | 0.8196 | 0.6707 | 0.9225 | 0.5325 | 0.8139 | 0.6938 | 0.8915 | 0.9476 | 0.8276 | 0.8762 |
Per-class F1 scores, with overall Kappa and OA:

| Model | Transport | Commerce | Residence | Construction | Industry | Institution | Open Space | Water | Kappa | OA |
|---|---|---|---|---|---|---|---|---|---|---|
| Lu-CF | 0.8820 | 0.7865 | 0.9300 | 0.6528 | 0.8661 | 0.7514 | 0.9125 | 0.9596 | 0.8697 | 0.9136 |
| RPFM | 0.8613 | 0.7215 | 0.9063 | 0.6276 | 0.8479 | 0.7153 | 0.8995 | 0.9432 | 0.8413 | 0.8752 |
| DHAO | 0.8143 | 0.6736 | 0.8965 | 0.5928 | 0.8035 | 0.6814 | 0.8719 | 0.9361 | 0.8167 | 0.8567 |
| SO-CNN | 0.8332 | 0.6961 | 0.9325 | 0.6016 | 0.8113 | 0.6976 | 0.8817 | 0.9456 | 0.8325 | 0.8643 |
| UNet | 0.7516 | 0.6198 | 0.8746 | 0.4332 | 0.7652 | 0.5678 | 0.7764 | 0.9175 | 0.7361 | 0.7623 |
| Stage | RSI | POI | LWM | FFM | LOCAL | GLOBAL | FGM | Kappa | OA |
|---|---|---|---|---|---|---|---|---|---|
| 1 | √ | | | | | | | 0.7364 | 0.7639 |
| 2 | √ | √ | | | | | | 0.8157 | 0.8418 |
| 3 | √ | √ | √ | | | | | 0.8246 | 0.8579 |
| 4 | √ | √ | √ | √ | | | | 0.8379 | 0.8713 |
| 5 | √ | √ | √ | √ | √ | | | 0.8548 | 0.8925 |
| 6 | √ | √ | √ | √ | | √ | | 0.8637 | 0.9012 |
| 7 | √ | √ | √ | √ | √ | √ | | 0.8654 | 0.9058 |
| 8 | √ | √ | √ | √ | √ | √ | √ | 0.8697 | 0.9136 |
| Dropped POI Category | Institution | Residence | Industry | Nature | Airport | Public Service | Retail | Hotel | Supermarket |
|---|---|---|---|---|---|---|---|---|---|
| Kappa | 0.8391 | 0.8345 | 0.8456 | 0.8479 | 0.8527 | 0.8443 | 0.8613 | 0.8634 | 0.8625 |
| Decreasing rate of Kappa (%) | 3.52 | 4.05 | 2.77 | 2.51 | 1.95 | 2.96 | 0.97 | 0.72 | 0.83 |

Using all POIs: Kappa = 0.8697.
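The decreasing rates in the table above follow from the relative drop in Kappa when one POI category is removed, relative to the all-POI Kappa of 0.8697. A short check (Kappa values taken from the table; small discrepancies from the published rates can arise from rounding of the reported Kappas):

```python
FULL_KAPPA = 0.8697  # Kappa when all POI categories are used

# Kappa after removing one POI category at a time (from the table above).
dropped_kappa = {
    "Institution": 0.8391, "Residence": 0.8345, "Industry": 0.8456,
    "Nature": 0.8479, "Airport": 0.8527, "Public Service": 0.8443,
    "Retail": 0.8613, "Hotel": 0.8634, "Supermarket": 0.8625,
}

def kappa_drop_rate(kappa_without, full_kappa=FULL_KAPPA):
    """Relative decrease of Kappa (%) when one POI category is removed."""
    return 100.0 * (full_kappa - kappa_without) / full_kappa

rates = {c: round(kappa_drop_rate(k), 2) for c, k in dropped_kappa.items()}
# rates["Residence"] → 4.05, the largest drop: Residence POIs matter most.
```

Larger drop rates indicate POI categories that carry more discriminative social information for UFZ recognition.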
Per-class F1 scores and Kappa:

| RSI | POI | Transport | Commerce | Industry | Residence | Construction | Institution | Water | Open Space | Kappa |
|---|---|---|---|---|---|---|---|---|---|---|
| 27,000 | √ | 0.8339 | 0.6051 | 0.5642 | 0.8655 | 0.3156 | 0.3912 | 0.8965 | 0.8749 | 0.7257 |
| 54,000 | × | 0.8271 | 0.3925 | 0.2313 | 0.8552 | 0.3119 | 0.1756 | 0.9125 | 0.7912 | 0.6913 |
| 80,800 | × | 0.8615 | 0.4349 | 0.3723 | 0.8964 | 0.3586 | 0.3146 | 0.9324 | 0.8153 | 0.7294 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yu, M.; Xu, H.; Zhou, F.; Xu, S.; Yin, H. A Deep-Learning-Based Multimodal Data Fusion Framework for Urban Region Function Recognition. ISPRS Int. J. Geo-Inf. 2023, 12, 468. https://doi.org/10.3390/ijgi12120468