SVS-VPR: A Semantic Visual and Spatial Information-Based Hierarchical Visual Place Recognition for Autonomous Navigation in Challenging Environmental Conditions
Abstract
1. Introduction
- A visual place recognition pipeline that combines global scene semantics with appearance-based local correspondences (the two-stage flow is sketched in code after this list).
- A robust feature extraction method that selects visually distinctive local features across all the feature maps of a higher convolutional layer of a CNN; these features are scale- and viewpoint-invariant.
- A novel semantic visual and spatial information-based place matching method that exploits distinctive local key correspondences between image pairs for robust visual place recognition under extreme seasonal, illumination, and viewpoint variations.
- SVS-VPR achieves higher recall at 100% precision than state-of-the-art methods on challenging benchmark datasets.
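To make the hierarchy concrete, the sketch below outlines the two-stage flow in Python. It is a minimal illustration under stated assumptions, not the authors' implementation: the helper callables (`segment`, `global_descriptor`, `extract_features`, `local_match`) and the top-k candidate count are hypothetical placeholders for the components described in Section 3.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two global scene descriptors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def recognize_place(query_img, map_imgs, segment, global_descriptor,
                    extract_features, local_match, top_k=5):
    """Hierarchical VPR sketch: coarse semantic filtering, then fine matching.

    segment           -- pixel-wise semantic segmentation model
    global_descriptor -- global scene descriptor built from semantic labels
    extract_features  -- distinctive local features from CNN feature maps
    local_match       -- semantic-visual-spatial correspondence score
    """
    # Stage 1: global semantics-based scene matching selects map candidates.
    q_desc = global_descriptor(segment(query_img))
    sims = np.array([cosine_similarity(q_desc, global_descriptor(segment(m)))
                     for m in map_imgs])
    candidates = np.argsort(sims)[::-1][:top_k]

    # Stage 2: local feature-based appearance matching re-ranks candidates.
    q_feats = extract_features(query_img)
    scores = {int(i): local_match(q_feats, extract_features(map_imgs[i]))
              for i in candidates}
    best = max(scores, key=scores.get)
    return best, scores[best]
```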
2. Related Work
2.1. Appearance-Based Methods
2.2. Deep Learning-Based Methods
2.3. Semantics-Based Methods
3. Proposed Method
3.1. Global Semantics-Based Scene Matching
3.1.1. Semantics Extraction
3.1.2. Global Scene Descriptor Computation and Matching
3.2. Local Feature-Based Appearance Matching
3.2.1. Distinctive CNN Point Feature Extraction
3.2.2. Semantic–Visual–Spatial Matching
4. Experimental Results
4.1. Implementation Setup
4.2. Datasets
4.2.1. Mapillary
4.2.2. Oxford RobotCar
4.2.3. Synthia
4.3. Ablation Study
Global Scene Similarity-Based Candidate Selection
4.4. VPR Performance Analysis
4.5. Computational Cost Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Lowry, S.; Sünderhauf, N.; Newman, P.; Leonard, J.J.; Cox, D.; Corke, P.; Milford, M.J. Visual Place Recognition: A Survey. IEEE Trans. Robot. 2016, 32, 1–19. [Google Scholar] [CrossRef]
- Arshad, S.; Kim, G.W. Role of deep learning in loop closure detection for visual and lidar slam: A survey. Sensors 2021, 21, 1243. [Google Scholar] [CrossRef]
- Sünderhauf, N.; Shirazi, S.; Jacobson, A.; Dayoub, F.; Pepperell, E.; Upcroft, B.; Milford, M. Place recognition with ConvNet landmarks: Viewpoint-robust, condition-robust, training-free. In Robotics: Science and Systems XI; Sapienza University of Rome: Rome, Italy, 2015. [Google Scholar]
- Sünderhauf, N.; Shirazi, S.; Dayoub, F.; Upcroft, B.; Milford, M. On the performance of ConvNet features for place recognition. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Hamburg, Germany, 28 September–2 October 2015; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2015; pp. 4297–4304. [Google Scholar] [CrossRef]
- Angeli, A.; Filliat, D.; Doncieux, S.; Meyer, J.A. Fast and incremental method for loop-closure detection using bags of visual words. IEEE Trans. Robot. 2008, 24, 1027–1037. [Google Scholar] [CrossRef]
- Cummins, M.; Newman, P. FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance. Int. J. Robot. Res. 2008, 27, 647–665. [Google Scholar] [CrossRef]
- Arandjelovic, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 5297–5307. [Google Scholar]
- Lowry, S.; Andreasson, H. Lightweight, Viewpoint-Invariant Visual Place Recognition in Changing Environments. IEEE Robot. Autom. Lett. 2018, 3, 957–964. [Google Scholar] [CrossRef]
- Khaliq, A.; Ehsan, S.; Chen, Z.; Milford, M.; McDonald-Maier, K. A Holistic Visual Place Recognition Approach Using Lightweight CNNs for Significant ViewPoint and Appearance Changes. IEEE Trans. Robot. 2020, 36, 561–569. [Google Scholar] [CrossRef]
- Yu, X.; Chaturvedi, S.; Feng, C.; Taguchi, Y.; Lee, T.Y.; Fernandes, C.; Ramalingam, S. VLASE: Vehicle Localization by Aggregating Semantic Edges. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Madrid, Spain, 1–5 October 2018; pp. 3196–3203. [Google Scholar] [CrossRef]
- Benbihi, A.; Arravechia, S.; Geist, M.; Pradalier, C. Image-Based Place Recognition on Bucolic Environment Across Seasons from Semantic Edge Description. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 3032–3038. [Google Scholar] [CrossRef]
- Gawel, A.; Del Don, C.; Siegwart, R.; Nieto, J.; Cadena, C. X-View: Graph-Based Semantic Multiview Localization. IEEE Robot. Autom. Lett. 2018, 3, 1687–1694. [Google Scholar] [CrossRef]
- Guo, X.; Hu, J.; Chen, J.; Deng, F.; Lam, T.L. Semantic Histogram Based Graph Matching for Real-Time Multi-Robot Global Localization in Large Scale Environment. IEEE Robot. Autom. Lett. 2021, 6, 8349–8356. [Google Scholar] [CrossRef]
- Lin, G.; Liu, F.; Milan, A.; Shen, C.; Reid, I. RefineNet: Multi-Path Refinement Networks for Dense Prediction. IEEE Trans. Pattern. Anal. Mach. Intell. 2020, 42, 1228–1242. [Google Scholar] [CrossRef]
- Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded Up Robust Features. In Proceedings of the European Conference on Computer Vision (ECCV), Graz, Austria, 7–13 May 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 404–417. [Google Scholar] [CrossRef]
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 2564–2571. [Google Scholar] [CrossRef]
- Gálvez-López, D.; Tardós, J.D. Bags of binary words for fast place recognition in image sequences. IEEE Trans. Robot. 2012, 28, 1188–1197. [Google Scholar] [CrossRef]
- Garcia-Fidalgo, E.; Ortiz, A. IBoW-LCD: An Appearance-Based Loop-Closure Detection Approach Using Incremental Bags of Binary Words. IEEE Robot. Autom. Lett. 2018, 3, 3051–3057. [Google Scholar] [CrossRef]
- Zaffar, M.; Ehsan, S.; Milford, M.; McDonald-Maier, K. CoHOG: A light-weight, compute-efficient, and training-free visual place recognition technique for changing environments. IEEE Robot. Autom. Lett. 2020, 5, 1835–1842. [Google Scholar] [CrossRef]
- Arshad, S.; Kim, G.W. A Robust Feature Matching Strategy for Fast and Effective Visual Place Recognition in Challenging Environmental Conditions. Int. J. Control. Autom. Syst. 2023, 21, 948–962. [Google Scholar] [CrossRef]
- Chen, Z.; Lam, O.; Jacobson, A.; Milford, M. Convolutional Neural Network-Based Place Recognition. In Proceedings of the 16th Australasian Conference on Robotics and Automation (ACRA), 2014; arXiv 2014, arXiv:1411.1509. [Google Scholar]
- Merrill, N.; Huang, G. Lightweight Unsupervised Deep Loop Closure. In Proceedings of Robotics: Science and Systems XIV, Pittsburgh, PA, USA, June 2018; arXiv 2018, arXiv:1805.07703. [Google Scholar]
- Gao, X.; Zhang, T. Unsupervised learning to detect loops using deep neural networks for visual SLAM system. Auton. Robots 2017, 41, 1–18. [Google Scholar] [CrossRef]
- Hou, Y.; Zhang, H.; Zhou, S. Convolutional neural network-based image representation for visual loop closure detection. In Proceedings of the 2015 IEEE International Conference on Information and Automation (ICIA), Lijiang, China, 8–10 August 2015; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2015; pp. 2238–2245. [Google Scholar] [CrossRef]
- Cai, Y.; Zhao, J.; Cui, J.; Zhang, F.; Feng, T.; Ye, C. Patch-NetVLAD+: Learned patch descriptor and weighted matching strategy for place recognition. In Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Bedford, UK, 20–22 September 2022. [Google Scholar] [CrossRef]
- Hausler, S.; Garg, S.; Xu, M.; Milford, M.; Fischer, T. Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14141–14152. [Google Scholar]
- Keetha, N.; Mishra, A.; Karhade, J.; Jatavallabhula, K.M.; Scherer, S.; Krishna, M.; Garg, S. AnyLoc: Towards Universal Visual Place Recognition. IEEE Robot. Autom. Lett. 2024, 9, 1286–1293. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Chen, Z.; Maffra, F.; Sa, I.; Chli, M. Only look once, mining distinctive landmarks from ConvNet for visual place recognition. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Vancouver, BC, Canada, 24–28 September 2017; pp. 9–16. [Google Scholar] [CrossRef]
- Rocco, I.; Arandjelović, R.; Sivic, J. Convolutional Neural Network Architecture for Geometric Matching. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6148–6157. [Google Scholar]
- Han, K.; Rezende, R.S.; Ham, B.; Wong, K.Y.K.; Cho, M.; Schmid, C.; Ponce, J. SCNet: Learning Semantic Correspondence. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1831–1840. [Google Scholar]
- Hariharan, B.; Arbelaez, P.; Girshick, R.; Malik, J. Hypercolumns for Object Segmentation and Fine-Grained Localization. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 447–456. [Google Scholar]
- Lu, F.; Chen, B.; Zhou, X.D.; Song, D. STA-VPR: Spatio-Temporal Alignment for Visual Place Recognition. IEEE Robot. Autom. Lett. 2021, 6, 4297–4304. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Naseer, T.; Oliveira, G.L.; Brox, T.; Burgard, W. Semantics-aware visual localization under challenging perceptual conditions. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2017; pp. 2614–2620. [Google Scholar] [CrossRef]
- Garg, S.; Jacobson, A.; Kumar, S.; Milford, M. Improving condition- and environment-invariant place recognition with semantic place categorization. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 6863–6870. [Google Scholar] [CrossRef]
- Hou, Y.; Zhang, H.; Zhou, S.; Zou, H. Use of Roadway Scene Semantic Information and Geometry–Preserving Landmark Pairs to Improve Visual Place Recognition in Changing Environments. IEEE Access 2017, 5, 7702–7713. [Google Scholar] [CrossRef]
- Garg, S.; Suenderhauf, N.; Milford, M. LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints Using Visual Semantics. arXiv 2018, arXiv:1804.05526. [Google Scholar]
- Wu, P.; Wang, J.; Wang, C.; Zhang, L.; Wang, Y. A novel fusing semantic- and appearance-based descriptors for visual loop closure detection. Optik 2021, 243, 167230. [Google Scholar] [CrossRef]
- Chen, B.; Song, X.; Shen, H.; Lu, T. Hierarchical Visual Place Recognition Based on Semantic-Aggregation. Appl. Sci. 2021, 11, 9540. [Google Scholar] [CrossRef]
- Singh, G.; Wu, M.; Lam, S.K.; Minh, D.V. Hierarchical Loop Closure Detection for Long-term Visual SLAM with Semantic-Geometric Descriptors. In Proceedings of the IEEE International Conference on Intelligent Transportation Systems (ITSC), Indianapolis, IN, USA, 19–22 September 2021; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2021; pp. 2909–2916. [Google Scholar] [CrossRef]
- Yuan, Z.; Xu, K.; Zhou, X.; Deng, B.; Ma, Y. SVG-Loop: Semantic–Visual–Geometric Information-Based Loop Closure Detection. Remote Sens. 2021, 13, 3520. [Google Scholar] [CrossRef]
- Hou, P.; Chen, J.; Nie, J.; Liu, Y.; Zhao, J. Forest: A Lightweight Semantic Image Descriptor for Robust Visual Place Recognition. IEEE Robot. Autom. Lett. 2022, 7, 12531–12538. [Google Scholar] [CrossRef]
- Cummins, M.; Newman, P. Appearance-only SLAM at large scale with FAB-MAP 2.0. Int. J. Robot. Res. 2011, 30, 1100–1123. [Google Scholar] [CrossRef]
- Tolias, G.; Sicre, R.; Jégou, H. Particular object retrieval with integral max-pooling of CNN activations. arXiv 2015, arXiv:1511.05879. [Google Scholar]
- Kryszkiewicz, M. The cosine similarity in terms of the euclidean distance. In Encyclopedia of Business Analytics and Optimization; IGI Global: Hershey, PA, USA, 2014; pp. 2498–2508. [Google Scholar]
- Maddern, W.; Pascoe, G.; Linegar, C.; Newman, P. 1 year, 1000 km: The Oxford RobotCar dataset. Int. J. Robot. Res. 2017, 36, 3–15. [Google Scholar] [CrossRef]
- Ros, G.; Sellart, L.; Materzynska, J.; Vazquez, D.; Lopez, A.M. The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3234–3243. [Google Scholar]
- Milford, M.J.; Wyeth, G.F. SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights. In Proceedings of the IEEE International Conference on Robotics and Automation, Saint Paul, MN, USA, 14–18 May 2012; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2012; pp. 1643–1649. [Google Scholar] [CrossRef]
- Pepperell, E.; Corke, P.I.; Milford, M.J. All-environment visual place recognition with SMART. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May 2014–7 June 2014; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2014; pp. 1612–1618. [Google Scholar] [CrossRef]
- Zhao, H.; Qi, X.; Shen, X.; Shi, J.; Jia, J. ICNet for real-time semantic segmentation on high-resolution images. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 405–420. [Google Scholar]
| Category | Semantic Classes | Labels |
|---|---|---|
| Static Objects | Road, Building, Vegetation, Pole, Traffic_Light, Traffic_Sign, Terrain, Wall, Fence, and Sidewalk | 0, 1, ..., 9 |
| Dynamic Objects | Sky, Person, Rider, Car, Truck, Bus, Train, Motorcycle, Bicycle, and Void | 10, 11, ..., 18, 255 |
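A minimal sketch of how this class table can be applied in practice is shown below, assuming Cityscapes-style integer label maps with the IDs listed above; the helper name `static_mask` and the example array are illustrative only.

```python
import numpy as np

# Label IDs as listed in the table above: 0-9 static, 10-18 and 255 dynamic/void.
STATIC_LABELS = list(range(0, 10))                  # Road ... Sidewalk
DYNAMIC_LABELS = list(range(10, 19)) + [255]        # Sky ... Bicycle, Void

def static_mask(seg: np.ndarray) -> np.ndarray:
    """Boolean mask of pixels belonging to static classes, i.e., the
    classes kept for long-term place recognition."""
    return np.isin(seg, STATIC_LABELS)

# Example: suppress dynamic pixels in a (H, W) label map.
seg = np.array([[0, 13], [10, 1]])   # road, car, sky, building
print(static_mask(seg))              # [[ True False] [False  True]]
```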
| Methods | Mapillary: Berlin A100 | Mapillary: Berlin Kudamm | Oxford RobotCar: Summer–Autumn | Oxford RobotCar: Summer–Winter | Oxford RobotCar: Autumn–Winter | Oxford RobotCar: Day–Night | Synthia: Spring–Summer | Synthia: Spring–Winter | Synthia: Dawn–Night |
|---|---|---|---|---|---|---|---|---|---|
| NetVLAD | 0.74 | 0.35 | 0.65 | 0.75 | 0.71 | 0.33 | 0.31 | 0.72 | 0.86 |
| LoSTX | 0.78 | 0.27 | 0.73 | 0.78 | 0.73 | 0.34 | 0.29 | 0.79 | 0.87 |
| STA-VPR | 0.53 | 0.16 | 0.69 | 0.81 | 0.72 | 0.37 | 0.36 | 0.88 | 0.89 |
| SVS-VPR (ours) | 0.84 | 0.49 | 0.76 | 0.82 | 0.77 | 0.41 | 0.40 | 0.92 | 0.91 |
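The headline metric named in the contributions, recall at 100% precision, can be computed from raw match outcomes as in the sketch below. This is a generic illustration of the metric, not the authors' evaluation code; the function and variable names are hypothetical.

```python
import numpy as np

def recall_at_100_precision(scores, correct):
    """Recall once the acceptance threshold is raised until every accepted
    match is correct (i.e., precision reaches 100%).

    scores  -- best-match similarity score per query
    correct -- 1 if that best match is a true positive, else 0
    Assumes every query place actually exists in the map.
    """
    scores = np.asarray(scores, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    wrong = scores[~correct]
    # Smallest threshold that rejects all false positives.
    threshold = wrong.max() if wrong.size else -np.inf
    accepted_true = np.sum(correct & (scores > threshold))
    return accepted_true / scores.size

# Toy example: the false match at 0.75 forces the threshold above it,
# leaving 2 of 4 queries accepted -> recall 0.5 at 100% precision.
print(recall_at_100_precision([0.9, 0.8, 0.75, 0.7], [1, 1, 0, 1]))
```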
| Components of the Proposed Method | Mapillary (ms) | Oxford RobotCar (ms) | Synthia (ms) |
|---|---|---|---|
| Pixel-wise Semantic Segmentation | 468.9 | 464.9 | 620.2 |
| Global Semantics-based Scene Matching | 0.027 | 0.029 | 0.041 |
| Local Feature-based Appearance Matching | 0.107 | 0.258 | 0.242 |

Summing the three components gives approximately 469 ms, 465 ms, and 620 ms per query, which matches the SVS-VPR totals (reported in seconds) in the following table; pixel-wise segmentation dominates the overall runtime.
| Methods | Mapillary (s) | Oxford RobotCar (s) | Synthia (s) |
|---|---|---|---|
| NetVLAD | 0.910 | 1.953 | 1.420 |
| LoSTX | 0.685 | 1.052 | 1.491 |
| STA-VPR | 0.557 | 0.702 | 0.888 |
| SVS-VPR | 0.469 | 0.465 | 0.620 |