A Context-Aware, Computer-Vision-Based Approach for the Detection of Taxi Street-Hailing Scenes from Video Streams
Abstract
1. Introduction
2. Related Work
2.1. Human–Autonomous Vehicle Interaction (HAVI) and Body Gesture Recognition
2.2. Predicting Intentions of Pedestrians from 2D Scenes
2.3. Identification of Taxi Street-Hailing Behavior
3. Visual and Contextual Information
4. Proposed Method
4.1. First Stage: Detection of Objects of Interest
4.2. Second Stage: Extraction of the Visual Information
4.2.1. Person’s Position Estimation
4.2.2. Person’s Head Direction
4.2.3. Person Tracking and Hailing Gesture Detection
- Let A, B, C, and D be the key points of the shoulder, elbow, wrist, and hip, respectively; these correspond to key points 2, 3, 4, and 8 in Figure 4b for the right side and to key points 5, 6, 7, and 11 for the left side.
- We define $\alpha$ as the angle between hip, shoulder, and elbow, so $\alpha = \widehat{DAB}$.
- We define $\beta$ as the angle between shoulder, elbow, and wrist, so $\beta = \widehat{ABC}$, where the angle at the middle point $P_2$ of three points $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$, and $P_3 = (x_3, y_3)$ in two-dimensional space is defined as follows:
  - $\widehat{P_1 P_2 P_3} = \operatorname{arctan2}(y_3 - y_2,\, x_3 - x_2) - \operatorname{arctan2}(y_1 - y_2,\, x_1 - x_2)$,
  - where $\operatorname{arctan2}$ is the element-wise arc tangent of $y/x$ that chooses the quadrant correctly.
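To make the angle computation concrete, the following is a minimal Python sketch (our own illustration, not the authors' code); the function name `angle_at` and the sample coordinates are assumptions for the example.

```python
import numpy as np

def angle_at(p_prev, p_vertex, p_next):
    """Angle (degrees, folded into [0, 180]) at p_vertex, formed by the
    segments toward p_prev and p_next, computed with arctan2 so the
    quadrant is resolved correctly."""
    (x1, y1), (x2, y2), (x3, y3) = p_prev, p_vertex, p_next
    raw = np.arctan2(y3 - y2, x3 - x2) - np.arctan2(y1 - y2, x1 - x2)
    deg = np.degrees(raw) % 360.0
    return min(deg, 360.0 - deg)

# alpha: angle at the shoulder (A) between the hip (D) and the elbow (B)
# beta:  angle at the elbow (B) between the shoulder (A) and the wrist (C)
hip, shoulder, elbow, wrist = (0.50, 1.00), (0.50, 0.50), (0.80, 0.40), (1.00, 0.10)
alpha = angle_at(hip, shoulder, elbow)
beta = angle_at(shoulder, elbow, wrist)
print(f"alpha = {alpha:.1f} deg, beta = {beta:.1f} deg")
```

A raised-arm hailing gesture would then be declared when $\alpha$ and $\beta$ fall within empirically chosen ranges; the specific thresholds are not reproduced here.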
4.3. Final Stage: Scoring and Hailing Detection
- (1) Elements of explicit street hailing are evaluated first; if they are all detected, a street-hailing case is recognized.
- (2) If one or more elements of the visual information are not detected, contextual information is used to evaluate the existence of implicit street hailing.
- The hailing gesture is scored up to 50 points if detected and zero otherwise.
- The standing position is scored up to 30 points: the full score if the person is on the side road, a reduced score if the person is close to the side road (e.g., on the road), and zero otherwise.
- The head direction is scored up to 20 points if the person is looking toward the road or the taxi and zero otherwise.
- Spatiotemporal information: can go as high as 60 points in the case of a high-demand area and time and as low as zero in the opposite case.
- Meteorological information: can go as high as 30 points in the case of very bad weather and as low as zero in the opposite case.
- Event information: scored up to 10 points if there is an event at the corresponding place and date and zero otherwise.
- The maximum score is 100 points.
- If the visual information alone reaches the threshold of 90 points, the person is classified as a street hailer, and no contextual information is used.
- Otherwise, the contextual information is used as a percentage of the difference between 100 and the visual information points; e.g., if the visual information is 60 points and the contextual information is 50 points, the score is 60 + 0.5 × (100 − 60) = 80 points.
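The two-stage fusion can be summarized in a short Python sketch; the function name, interface, and threshold constant below are our own illustrative choices under the rule stated above, not the authors' implementation.

```python
VISUAL_THRESHOLD = 90  # points; a visual score at/above this is conclusive

def fuse_scores(visual_points: float, contextual_points: float) -> float:
    """Combine the visual score (0-100) with the contextual score (0-100).

    If the visual score alone reaches the threshold, it is used directly;
    otherwise the contextual score is applied as a percentage of the gap
    between the visual score and 100.
    """
    if visual_points >= VISUAL_THRESHOLD:
        return visual_points
    return visual_points + (contextual_points / 100.0) * (100.0 - visual_points)

# Worked example from the text: visual = 60, contextual = 50
print(fuse_scores(60, 50))  # 60 + 0.5 * (100 - 60) = 80.0
```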
5. Dataset and Experimental Settings
5.1. Datasets
5.2. Experiment Settings
6. Experimental Results and Discussion
- Accuracy: It measures the overall correctness of the model's predictions and is defined by the following: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
- Precision: It measures the proportion of positive predictions that are correct and is defined by $\text{Precision} = \frac{TP}{TP + FP}$
- Recall (also known as sensitivity): It measures the proportion of actual positive cases that were correctly identified by the model. It is defined by $\text{Recall} = \frac{TP}{TP + FN}$
- Specificity: It measures the proportion of actual negative cases that were correctly identified by the model. It is defined by the ratio $\text{Specificity} = \frac{TN}{TN + FP}$

Here, $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
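As a quick reference, a self-contained Python helper implementing these four definitions (our own sketch, not part of the paper's pipeline):

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Binary-classification metrics from confusion-matrix counts.
    Assumes each denominator is nonzero."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # also known as sensitivity
        "specificity": tn / (tn + fp),
    }

# Example with arbitrary counts (not results from the paper):
print(classification_metrics(tp=42, tn=30, fp=8, fn=8))
```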
6.1. Detection of Objects of Interest
6.2. Detection of Street Hailing
6.2.1. Example 1: Explicit Street Hailing
6.2.2. Example 2: Implicit Street Hailing
6.2.3. Example 3: No Street Hailing
6.2.4. Example 4: Undetected Street Hailing
6.3. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Faisal, A.; Kamruzzaman, M.; Yigitcanlar, T.; Currie, G. Understanding autonomous vehicles. J. Transp. Land Use 2019, 12, 45–72.
- McFarland, M. Waymo to Expand Robotaxi Service to Los Angeles. Available online: https://edition.cnn.com/2022/10/19/business/waymo-los-angeles-rides/index.html (accessed on 4 March 2023).
- CBS NEWS. Robotaxis Are Taking over China’s Roads. Here’s How They Stack Up to the Old-Fashioned Version. Available online: https://www.cbsnews.com/news/china-robotaxis-self-driving-cabs-taking-over-cbs-test-ride (accessed on 11 December 2022).
- Hope, G. Hyundai Launches Robotaxi Trial with Its Own AV Tech. Available online: https://www.iotworldtoday.com/2022/06/13/hyundai-launches-robotaxi-trial-with-its-own-av-tech/ (accessed on 11 December 2022).
- Yonhap News. S. Korea to Complete Preparations for Level 4 Autonomous Car by 2024: Minister. Available online: https://en.yna.co.kr/view/AEN20230108002100320 (accessed on 4 March 2023).
- Bellan, R. Uber and Motional Launch Robotaxi Service in Las Vegas. Available online: https://techcrunch.com/2022/12/07/uber-and-motional-launch-robotaxi-service-in-las-vegas/ (accessed on 4 March 2023).
- NPR. Driverless Taxis Are Coming to the Streets of San Francisco. Available online: https://www.npr.org/2022/06/03/1102922330/driverless-self-driving-taxis-san-francisco-gm-cruise (accessed on 11 December 2022).
- Bloomberg. Uber Launches Robotaxis But Driverless Fleet Is ‘Long Time’ Away. Available online: https://europe.autonews.com/automakers/uber-launches-robotaxis-us (accessed on 13 December 2022).
- Cozzens, T. DeepRoute.ai Unveils Autonomous ‘Robotaxi’ Fleet. Available online: https://www.gpsworld.com/deeproute-ai-unveils-autonomous-robotaxi-fleet/ (accessed on 11 December 2022).
- Kim, S.; Chang, J.J.E.; Park, H.H.; Song, S.U.; Cha, C.B.; Kim, J.W.; Kang, N. Autonomous taxi service design and user experience. Int. J. Hum.-Comput. Interact. 2020, 36, 429–448.
- Lee, S.; Yoo, S.; Kim, S.; Kim, E.; Kang, N. Effect of robo-taxi user experience on user acceptance: Field test data analysis. Transp. Res. Rec. 2022, 2676, 350–366.
- Hallewell, M.; Large, D.; Harvey, C.; Briars, L.; Evans, J.; Coffey, M.; Burnett, G. Deriving UX Dimensions for Future Autonomous Taxi Interface Design. J. Usability Stud. 2022, 17, 140–163.
- Anderson, D.N. The taxicab-hailing encounter: The politics of gesture in the interaction order. Semiotica 2014, 2014, 609–629.
- Smith, T.; Vardhan, H.; Cherniavsky, L. Humanising Autonomy: Where Are We Going; USTWO: London, UK, 2017.
- Wang, Z.; Lian, J.; Li, L.; Zhou, Y. Understanding Pedestrians' Car-Hailing Intention in Traffic Scenes. Int. J. Automot. Technol. 2022, 23, 1023–1034.
- Krueger, M.W. Artificial Reality II; Addison-Wesley: Boston, MA, USA, 1991.
- Ohn-Bar, E.; Trivedi, M.M. Hand gesture recognition in real time for automotive interfaces: A multimodal vision-based approach and evaluations. IEEE Trans. Intell. Transp. Syst. 2014, 15, 2368–2377.
- Rasouli, A.; Tsotsos, J.K. Autonomous vehicles that interact with pedestrians: A survey of theory and practice. IEEE Trans. Intell. Transp. Syst. 2019, 21, 900–918.
- Holzbock, A.; Tsaregorodtsev, A.; Dawoud, Y.; Dietmayer, K.; Belagiannis, V. A Spatio-Temporal Multilayer Perceptron for Gesture Recognition. arXiv 2022, arXiv:2204.11511.
- Martin, M.; Roitberg, A.; Haurilet, M.; Horne, M.; Reiß, S.; Voit, M.; Stiefelhagen, R. Drive&Act: A multi-modal dataset for fine-grained driver behavior recognition in autonomous vehicles. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2801–2810.
- Meyer, R.; Graf von Spee, R.; Altendorf, E.; Flemisch, F.O. Gesture-based vehicle control in partially and highly automated driving for impaired and non-impaired vehicle operators: A pilot study. In Proceedings of the International Conference on Universal Access in Human-Computer Interaction, Las Vegas, NV, USA, 15–20 July 2018; pp. 216–227.
- Rasouli, A.; Kotseruba, I.; Tsotsos, J.K. Towards social autonomous vehicles: Understanding pedestrian-driver interactions. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 729–734.
- Shaotran, E.; Cruz, J.J.; Reddi, V.J. Gesture Learning for Self-Driving Cars. In Proceedings of the 2021 IEEE International Conference on Autonomous Systems (ICAS), Montréal, QC, Canada, 11–13 August 2021; pp. 1–5.
- Hou, M.; Mahadevan, K.; Somanath, S.; Sharlin, E.; Oehlberg, L. Autonomous vehicle-cyclist interaction: Peril and promise. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 25–30 April 2020; pp. 1–12.
- Mishra, A.; Kim, J.; Cha, J.; Kim, D.; Kim, S. Authorized traffic controller hand gesture recognition for situation-aware autonomous driving. Sensors 2021, 21, 7914.
- Li, J.; Li, B.; Gao, M. Skeleton-based Approaches based on Machine Vision: A Survey. arXiv 2020, arXiv:2012.12447.
- De Smedt, Q.; Wannous, H.; Vandeborre, J.P. Skeleton-based dynamic hand gesture recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA, 27–30 June 2016; pp. 1–9.
- Brás, A.; Simão, M.; Neto, P. Gesture Recognition from Skeleton Data for Intuitive Human-Machine Interaction. arXiv 2020, arXiv:2008.11497.
- Chen, L.; Li, Y.; Liu, Y. Human body gesture recognition method based on deep learning. In Proceedings of the 2020 Chinese Control and Decision Conference (CCDC), Hefei, China, 22–24 August 2020; pp. 587–591.
- Nguyen, D.H.; Ly, T.N.; Truong, T.H.; Nguyen, D.D. Multi-column CNNs for skeleton based human gesture recognition. In Proceedings of the 2017 9th International Conference on Knowledge and Systems Engineering (KSE), Hue, Vietnam, 19–21 October 2017; pp. 179–184.
- Yuanyuan, S.; Yunan, L.; Xiaolong, F.; Kaibin, M.; Qiguang, M. Review of dynamic gesture recognition. Virtual Real. Intell. Hardw. 2021, 3, 183–206.
- Oudah, M.; Al-Naji, A.; Chahl, J. Hand gesture recognition based on computer vision: A review of techniques. J. Imaging 2020, 6, 73.
- Karpathy, A.; Toderici, G.; Shetty, S.; Leung, T.; Sukthankar, R.; Fei-Fei, L. Large-scale Video Classification with Convolutional Neural Networks. In Proceedings of the CVPR, Columbus, OH, USA, 23–28 June 2014.
- Rasouli, A.; Kotseruba, I.; Tsotsos, J.K. Understanding pedestrian behavior in complex traffic scenes. IEEE Trans. Intell. Veh. 2017, 3, 61–70.
- Saleh, K.; Hossny, M.; Nahavandi, S. Real-time intent prediction of pedestrians for autonomous ground vehicles via spatio-temporal densenet. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 9704–9710.
- Gujjar, P.; Vaughan, R. Classifying Pedestrian Actions in Advance Using Predicted Video of Urban Driving Scenes. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 2097–2103.
- Shahroudy, A.; Liu, J.; Ng, T.T.; Wang, G. NTU RGB+D: A large scale dataset for 3D human activity analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1010–1019.
- Neogi, S.; Hoy, M.; Chaoqun, W.; Dauwels, J. Context based pedestrian intention prediction using factored latent dynamic conditional random fields. In Proceedings of the 2017 IEEE Symposium Series on Computational Intelligence (SSCI), Honolulu, HI, USA, 27 November–1 December 2017; pp. 1–8.
- Ross, J.I. Taxi driving and street culture. In Routledge Handbook of Street Culture; Routledge: Abingdon, UK, 2020.
- Matsubara, Y.; Li, L.; Papalexakis, E.; Lo, D.; Sakurai, Y.; Faloutsos, C. F-Trail: Finding patterns in taxi trajectories. In Proceedings of the Advances in Knowledge Discovery and Data Mining: 17th Pacific-Asia Conference, PAKDD 2013, Gold Coast, Australia, 14–17 April 2013; pp. 86–98.
- Hu, X.; An, S.; Wang, J. Taxi driver's operation behavior and passengers' demand analysis based on GPS data. J. Adv. Transp. 2018, 2018, 6197549.
- Li, B.; Zhang, D.; Sun, L.; Chen, C.; Li, S.; Qi, G.; Yang, Q. Hunting or waiting? Discovering passenger-finding strategies from a large-scale real-world taxi dataset. In Proceedings of the 2011 IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops), Kyoto, Japan, 11–15 March 2011; pp. 63–68.
- Yuan, J.; Zheng, Y.; Zhang, L.; Xie, X.; Sun, G. Where to find my next passenger. In Proceedings of the 13th International Conference on Ubiquitous Computing, Beijing, China, 17–21 September 2011; pp. 109–118.
- Zhang, D.; Sun, L.; Li, B.; Chen, C.; Pan, G.; Li, S.; Wu, Z. Understanding taxi service strategies from taxi GPS traces. IEEE Trans. Intell. Transp. Syst. 2014, 16, 123–135.
- Kamga, C.; Yazici, M.A.; Singhal, A. Hailing in the rain: Temporal and weather-related variations in taxi ridership and taxi demand-supply equilibrium. In Proceedings of the Transportation Research Board 92nd Annual Meeting, Washington, DC, USA, 13–17 January 2013; Volume 1.
- Tong, Y.; Chen, Y.; Zhou, Z.; Chen, L.; Wang, J.; Yang, Q.; Ye, J.; Lv, W. The simpler the better: A unified approach to predicting original taxi demands based on large-scale online platforms. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 1653–1662.
- Zhang, J.; Zheng, Y.; Qi, D. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
- Boumeddane, S.; Hamdad, L.; Bouregag, A.A.E.F.; Damene, M.; Sadeg, S. A Model Stacking Approach for Ride-Hailing Demand Forecasting: A Case Study of Algiers. In Proceedings of the 2020 2nd International Workshop on Human-Centric Smart Environments for Health and Well-being (IHSH), Boumerdes, Algeria, 9–10 February 2021; pp. 16–21.
- Sun, Z.; Cao, S.; Yang, Y.; Kitani, K.M. Rethinking transformer-based set prediction for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 3611–3620.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229.
- Elharrouss, O.; Akbari, Y.; Almaadeed, N.; Al-Maadeed, S. Backbones-review: Feature extraction networks for deep learning and deep reinforcement learning approaches. arXiv 2022, arXiv:2206.08016.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
- Wu, Y.; Kirillov, A.; Massa, F.; Lo, W.Y.; Girshick, R. Detectron2. Available online: https://github.com/facebookresearch/detectron2 (accessed on 11 April 2023).
- Dai, X. HybridNet: A fast vehicle detection system for autonomous driving. Signal Process. Image Commun. 2019, 70, 79–88.
- Han, C.; Zhao, Q.; Zhang, S.; Chen, Y.; Zhang, Z.; Yuan, J. YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception. arXiv 2022, arXiv:2208.11434.
- Caesar, H.; Uijlings, J.; Ferrari, V. COCO-Stuff: Thing and stuff classes in context. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1209–1218.
| | Accuracy | Precision | Recall | Specificity |
|---|---|---|---|---|
| Hailing | 0.80 | 0.84 | 0.84 | 0.73 |
| Implicit hailing | N/A | N/A | 0.80 | N/A |
| Explicit hailing | N/A | N/A | 0.85 | N/A |
| Hailing | Head Direction | Standing Position |
|---|---|---|
| 50 | 20 | 30 |
| Visual Information | Hailing | Head Direction | Standing Position |
|---|---|---|---|
| Points | 10 | 20 | 30 |

| Contextual Information | Spatiotemporal | Meteorological | Event |
|---|---|---|---|
| Points | 60 | 30 | 0 |
| Visual Information | Hailing | Head Direction | Standing Position |
|---|---|---|---|
| Points | 0 | 10 | 30 |

| Contextual Information | Spatiotemporal | Meteorological | Event |
|---|---|---|---|
| Points | 30 | 10 | 0 |
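Reading these tables with the scoring rule of Section 4.3 (our own arithmetic, shown for clarity): in the first case, the visual score is 10 + 20 + 30 = 60 and the contextual score is 60 + 30 + 0 = 90, giving a final score of 60 + 0.9 × (100 − 60) = 96 points; in the second case, the visual score is 0 + 10 + 30 = 40 and the contextual score is 30 + 10 + 0 = 40, giving 40 + 0.4 × (100 − 40) = 64 points.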