People Flow Trend Estimation Approach and Quantitative Explanation Based on the Scene Level Deep Learning of Street View Images
Abstract
1. Introduction
- We developed an efficient, end-to-end, scene-level deep learning approach that estimates people flow trends from street view images and human subjective scores.
- We developed a subjective feeling score extraction method and applied semantic segmentation to provide ancillary information for the model.
- We implemented Grad-CAM and combined it with the semantic segmentation and subjective score extraction results to build a novel quantitative explanation approach for the discussed models, and verified its rationality with L1-based sparse modeling.
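The Grad-CAM technique named in the last bullet weights each convolutional feature map by the global-average-pooled gradient of the target score, sums the weighted maps, and applies a ReLU (Selvaraju et al.). A minimal NumPy sketch of that weighting, detached from any particular network (the function name and array shapes are illustrative, not the authors' implementation):

```python
import numpy as np

def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from one conv layer's activations and gradients.

    feature_maps: (K, H, W) activations A^k of the chosen layer.
    gradients:    (K, H, W) gradients dy_c/dA^k for the target class c.
    Returns an (H, W) non-negative importance map scaled to [0, 1].
    """
    # Channel weights: global-average-pool the gradients over the spatial axes.
    weights = gradients.mean(axis=(1, 2))              # shape (K,)
    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = np.tensordot(weights, feature_maps, axes=1)  # shape (H, W)
    cam = np.maximum(cam, 0.0)
    # Normalize for visualization (skip if the map is all zeros).
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```

In a real pipeline the activations and gradients would come from a forward/backward pass through the trained classifier; the resulting map is what the paper overlays on street view images and intersects with semantic segmentation categories.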
2. Related Work
3. Data
3.1. People Flow Trend Data
3.2. Street View Image Data
4. Methodology
4.1. People Flow Trend Estimation
4.1.1. Data Preparation
4.1.2. People Flow Estimation with Deep Learning Algorithms
4.2. Ancillary Data Generation and Model Improvement
4.2.1. Subjective Score Extraction Dataset
4.2.2. Human Subjective Score Extraction and People Flow Estimation Model Improvement
4.2.3. Pixel Level Categories Information Extracted through Semantic Segmentation
4.3. Explanation of Deep Learning Processing and Results
4.3.1. Forward Objective and Subjective Explanation
4.3.2. Implementation of Grad-CAM and Proposal of Gradient Impact Method
4.3.3. Backward Objective and Subjective Explanation
5. Experiments and Results
5.1. People Flow Trend Estimation
5.1.1. Image Data Pre-Processing
5.1.2. Results of People Flow Trend Estimation
5.1.3. Comparison of Discrete Classification and Continuous Regression
5.2. Ancillary Data Generation and Model Improvement
5.2.1. Siamese Network Training
5.2.2. Human Subjective Score Extraction and People Flow Estimation Model Improvement
5.2.3. Pixel Level Categories Information Extracted by Semantic Segmentation
5.3. Explanation of Deep Learning Processing and Results
5.3.1. Forward Objective and Subjective Explanation
5.3.2. Grad-CAM Implementation
5.3.3. Backward Objective and Subjective Explanations
5.3.4. Rationality Analysis of Backward Objective and Subjective Explanations
6. Discussion
6.1. People Flow Trend Estimation
6.2. Ancillary Data Generation and Model Improvement
6.3. Explanation of Deep Learning Processing and Results
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
| Networks | Params (M) | FLOPs (G) |
|---|---|---|
| ResNet-101 | 44.55 | 7.85 |
| Swin Transformer small | 49.61 | 8.52 |
| Swin Transformer base | 87.77 | 15.14 |
| ConvNeXt base | 88.59 | 15.36 |
| Approach | Recall | Precision | mF1 | Accuracy |
|---|---|---|---|---|
| ResNet-101 | 0.4974 | 0.4888 | 0.4898 | 0.4894 |
| Swin-S | 0.7272 | 0.7242 | 0.7255 | 0.7174 |
| Swin-B | 0.7676 | 0.7669 | 0.7672 | 0.7584 |
| ConvNeXt-B | 0.7904 | 0.7883 | 0.7892 | 0.7812 |
| Swin-B, total | 0.7802 | 0.7059 | 0.7367 | 0.7106 |
| ConvNeXt-B, total | 0.7981 | 0.7211 | 0.7528 | 0.7271 |
| Object | Recall | Precision | mF1 | Accuracy |
|---|---|---|---|---|
| Day/Stay | 0.6544 | 0.6566 | 0.6552 | 0.6444 |
| Day/Move | 0.7147 | 0.7154 | 0.7146 | 0.7027 |
| Night/Stay | 0.6983 | 0.6988 | 0.6983 | 0.6866 |
| Night/Move | 0.7093 | 0.7113 | 0.7100 | 0.6996 |
| Original + Subjective | Recall | Precision | mF1 | Accuracy |
|---|---|---|---|---|
| Overall | 0.7921 | 0.7909 | 0.7914 | 0.7832 |
| Day/Stay | 0.6559 | 0.6572 | 0.6561 | 0.6462 |
| Day/Move | 0.7155 | 0.7194 | 0.7169 | 0.7040 |
| Night/Stay | 0.6995 | 0.7015 | 0.6999 | 0.6880 |
| Night/Move | 0.7109 | 0.7132 | 0.7117 | 0.7012 |
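The tables above report recall, precision, mF1, and accuracy. Assuming mF1 denotes the unweighted (macro) mean of per-class F1 scores, and that recall and precision are likewise macro-averaged, a minimal NumPy sketch of the computation from a confusion matrix (this reading of "mF1" is our assumption; the paper may average differently):

```python
import numpy as np

def macro_metrics(conf: np.ndarray):
    """Macro-averaged recall, precision, F1, and overall accuracy from a
    (C, C) confusion matrix with rows = true class, columns = predicted class.
    Assumes every class has at least one true sample and one prediction,
    so no division by zero occurs."""
    tp = np.diag(conf).astype(float)
    recall = tp / conf.sum(axis=1)     # per-class recall
    precision = tp / conf.sum(axis=0)  # per-class precision
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / conf.sum()
    return recall.mean(), precision.mean(), f1.mean(), accuracy
```

For example, the confusion matrix `[[5, 1], [2, 4]]` yields an overall accuracy of 9/12 = 0.75.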
Share and Cite
Zhao, C.; Ogawa, Y.; Chen, S.; Oki, T.; Sekimoto, Y. People Flow Trend Estimation Approach and Quantitative Explanation Based on the Scene Level Deep Learning of Street View Images. Remote Sens. 2023, 15, 1362. https://doi.org/10.3390/rs15051362