Deep Neural Networks for Road Sign Detection and Embedded Modeling Using Oblique Aerial Images
Abstract
1. Introduction
- (1)
- Owing to limited image resolution and the accuracy of feature-matching algorithms, the number of reconstructed 3D points is too small to generate complete road sign models with continuous surfaces. Low-quality oblique aerial images also yield blurred road sign textures.
- (2)
- Road signs are thin, slice-shaped objects, so their two sides are difficult to distinguish: the reconstructed points of both sides merge into a single cluster and cannot easily be separated for meshing.
2. Related Work
2.1. Oblique Photogrammetry-Based Modeling and 3D Scene Augmentation
2.2. Road Sign Detection
2.3. Small Object Detection with Balanced Learning and Guided Anchoring
3. Method
- (1)
- Data synthesis and road sign detection. We present an end-to-end balanced-learning framework for small object detection that takes advantage of a region-based CNN and a data synthesis strategy. First, data synthesis and augmentation are applied to moderate the negative effects caused by object imbalance in the training dataset. Second, a region-based CNN that combines balanced learning with guided anchoring strategies is used to enhance the detection results, as demonstrated by comparative experiments.
- (2)
- Road sign 3D localization and orientation. Stereo vision over multiple image sets is used for 3D localization and orientation. First, all oblique aerial images containing the same road signs are grouped according to imaging similarity. Second, under the geometric constraints imposed by the bounding boxes, we use SIFT features to extract corresponding points on the road signs within each image set. Third, we apply triangulation to the corresponding points in each image group to obtain 3D points. The triangulation results from all image groups are merged to generate the approximate location of a single 3D road sign. Additionally, we remove sparse outliers through statistical analysis to refine the 3D point cloud and obtain a more precise 3D location. Finally, a plane is fitted to the refined point cloud via least-squares fitting to predict the orientation. The fitted plane should be perpendicular to the known street surface and indicates the orientation of the road sign.
- (3)
- Model matching and embedding. On the basis of the classification results, we retrieve computer-aided design models of road signs from the database via template and texture matching and then embed them in 3D urban scenes with the predicted location and orientation.
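The triangulation in step (2) above can be illustrated with a minimal two-view linear (DLT) sketch. This is not the paper's actual implementation; the projection matrices `P1`, `P2` and the pixel coordinates below are hypothetical toy values:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Triangulate one 3D point from two views via the linear DLT method.

    P1, P2 : 3x4 camera projection matrices.
    x1, x2 : (u, v) pixel coordinates of the same road sign point.
    Returns the 3D point in inhomogeneous coordinates.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],   # u1 * p3^T - p1^T
        x1[1] * P1[2] - P1[1],   # v1 * p3^T - p2^T
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution of A X = 0 (least squares) is the right singular
    # vector of A associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Toy example: identity intrinsics, second camera translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
x1 = (0.0, 0.0)    # projection of the point (0, 0, 5) in camera 1
x2 = (-0.2, 0.0)   # projection of the same point in camera 2
print(triangulate_dlt(P1, P2, x1, x2))  # ~ [0. 0. 5.]
```

In practice one such point is recovered for each SIFT correspondence, and the points from all image groups are merged before outlier removal.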
3.1. Road Sign Detection
3.1.1. Imbalance Problems in the Training Dataset
- (1)
- A collection of computer-aided design 3D road sign models is created with the 3D modeling software 3ds Max. To account for varied lighting and viewing conditions, we rotate the 3D road sign models in three dimensions to generate road sign masks.
- (2)
- According to statistics on the number of road signs in each category of the original training dataset, we render each road sign mask, scaled to an appropriate size, onto random background images to create synthetic training images.
- (3)
- We count the signs in each category of the synthetic training images so that the numbers of oblique aerial images and of objects per category in the training dataset remain as balanced as possible.
- (4)
- We augment the synthetic data via rotation, flipping, and gamma transformations to avoid overfitting and to improve detection accuracy.
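A minimal sketch of this augmentation step, assuming images are NumPy arrays with intensities in [0, 255]; the exact transformations and parameter ranges used by the authors are not specified here:

```python
import numpy as np

def augment(img, gamma=0.8):
    """Generate simple augmented variants of a synthetic training image.

    Applies a 90-degree rotation, horizontal and vertical flips, and a
    gamma (power-law) intensity transformation. With gamma < 1, dark
    regions are brightened; with gamma > 1, they are darkened.
    """
    return [
        np.rot90(img),                   # rotation
        np.flip(img, axis=1),            # horizontal flip
        np.flip(img, axis=0),            # vertical flip
        255.0 * (img / 255.0) ** gamma,  # gamma transformation
    ]

img = np.arange(16, dtype=float).reshape(4, 4) * 17.0  # toy 4x4 "image"
for v in augment(img):
    print(v.shape)  # each variant keeps the 4x4 spatial size
```

In a real pipeline the geometric variants would be sampled randomly per epoch rather than enumerated, and the bounding-box annotations would be transformed along with the pixels.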
3.1.2. Imbalance Problems in the Training Process
3.2. Road Sign 3D Localization and Orientation
- (1)
- Group all oblique aerial images containing the same road signs according to imaging conditions, such as shooting angle and lighting.
- (2)
- Search for corresponding points within each image group. Under the geometric constraints imposed by the bounding boxes, SIFT features and an optimized brute-force matcher are applied to extract corresponding points.
- (3)
- Apply triangulation to the corresponding points to obtain a coarse location, and refine it via outlier removal. Then, merge the 3D points of the same road sign to obtain the location of a single 3D road sign. Finally, we fit a plane to the refined point cloud using least-squares fitting to estimate the orientation.
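The outlier removal and plane fitting in step (3) can be sketched as follows. This is a simplified stand-in for the paper's statistical analysis, assuming the merged points form an N×3 NumPy array; the distance-to-centroid criterion and the threshold `k` are illustrative choices, not the authors' exact procedure:

```python
import numpy as np

def remove_outliers(points, k=2.0):
    """Drop points whose distance to the centroid exceeds mean + k * std."""
    d = np.linalg.norm(points - points.mean(axis=0), axis=1)
    return points[d <= d.mean() + k * d.std()]

def fit_plane(points):
    """Least-squares plane fit: returns (centroid, unit normal).

    The normal is the right singular vector of the centered point matrix
    associated with the smallest singular value.
    """
    c = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - c)
    n = Vt[-1]
    return c, n / np.linalg.norm(n)

# Toy example: a vertical sign lying in the plane x = 0, plus one outlier.
rng = np.random.default_rng(0)
pts = np.column_stack([
    np.zeros(50),            # x: all sign points lie in the plane x = 0
    rng.uniform(-1, 1, 50),  # y: lateral extent of the sign
    rng.uniform(0, 2, 50),   # z: height above the street
])
pts = np.vstack([pts, [5.0, 0.0, 1.0]])  # one gross triangulation outlier
c, n = fit_plane(remove_outliers(pts))
print(np.abs(n))  # normal ~ [1, 0, 0]: the sign faces the x direction
```

The plane normal, projected onto the horizontal, gives the sign's facing direction; the constraint that the plane be perpendicular to the street surface would be enforced on top of this fit.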
3.2.1. Image Grouping
3.2.2. Corresponding Point Extraction with Geometric Constraints
3.2.3. Triangulation and Unrefined Location
3.2.4. Refined 3D Location and Orientation
3.3. Road Sign Model Matching and Embedding
3.4. Evaluation Metric
4. Experiments
4.1. Aerial Oblique Image Collection and Preprocessing
4.2. Road Sign Detection
4.2.1. CNN Model Training Setup
4.2.2. Detection Result
4.3. Road Sign Embedded Modeling
4.3.1. Extracting Corresponding Points with Geometric Constraints in Image Groups
4.3.2. Localization and Orientation
4.3.3. Model Embedding
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Class | Mandatory Sign | Warning Sign | Prohibitory Sign |
---|---|---|---|
Image number | 358 | 63 | 131 |
Object number | 509 | 71 | 227 |
Method | Mandatory Sign | Warning Sign | Prohibitory Sign | mAP |
---|---|---|---|---|
Faster R-CNN | 0.803 | 0.785 | 0.909 | 0.832 |
Faster R-CNN + GA (guided anchoring) | 0.876 | 0.832 | 0.818 | 0.842 |
Faster R-CNN + SD (synthetic data) | 0.907 | 0.886 | 0.897 | 0.897 |
Faster R-CNN + GA & SD | 0.907 | 0.908 | 0.904 | 0.906 |
Faster R-CNN + BL (balanced learning) | 0.908 | 0.952 | 0.903 | 0.921 |
Faster R-CNN + BL & GA | 0.909 | 0.987 | 0.909 | 0.935 |
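The mAP column is consistent with the unweighted mean of the three per-class APs; for example, the baseline row can be checked as:

```python
# Per-class average precisions from the baseline (Faster R-CNN) row.
aps = {"mandatory": 0.803, "warning": 0.785, "prohibitory": 0.909}
map_score = sum(aps.values()) / len(aps)
print(round(map_score, 3))  # 0.832, matching the table
```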
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Mao, Z.; Zhang, F.; Huang, X.; Jia, X.; Gong, Y.; Zou, Q. Deep Neural Networks for Road Sign Detection and Embedded Modeling Using Oblique Aerial Images. Remote Sens. 2021, 13, 879. https://doi.org/10.3390/rs13050879