IRDC-Net: Lightweight Semantic Segmentation Network Based on Monocular Camera for Mobile Robot Navigation
Abstract
1. Introduction
- Integration of a lightweight FCN decoder (with multi-scale fusion) with a MobileNetV2 encoder for efficient segmentation.
- Further enhancement of performance and computational efficiency through the Adam optimizer and quantization, together with improved data preprocessing using appropriate filters (a minimal preprocessing sketch follows this list).
- Replacement of the Binary Cross-Entropy loss function with the Balanced Cross-Entropy loss function for better handling of unbalanced datasets.
- Comparison of the proposed model with a number of baselines across several datasets, followed by a practical evaluation on a mobile robot.
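As a concrete illustration of the preprocessing step named above, the following is a minimal sketch using torchvision's Gaussian blur transform. The 224 × 224 input size, the 3 × 3 kernel, and the file name `frame.png` are illustrative assumptions, not values taken from the paper.

```python
import torchvision.transforms as T
from PIL import Image

preprocess = T.Compose([
    T.Resize((224, 224)),            # assumed input resolution
    T.GaussianBlur(kernel_size=3),   # smoothing filter to suppress sensor noise
    T.ToTensor(),                    # HWC uint8 -> CHW float in [0, 1]
])

img = Image.open("frame.png").convert("RGB")  # hypothetical camera frame
x = preprocess(img).unsqueeze(0)              # add a batch dimension
```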
2. Related Works
2.1. Fully Convolutional Networks (FCNs)
2.2. SegNet
2.3. ENet
2.4. U-Net
3. Lightweight Semantic Segmentation FCN-MobileNetV2
3.1. Network Architecture
- One Conv2d layer (1280 × 1000 × 7 × 7): the synthesized feature output from the classifier stage of MobileNetV2.
- One Conv2d layer (1280, num_classes, kernel_size = 1): condenses the model's 1280 feature channels into one score map per class.
- One ConvTranspose2d layer (num_classes, num_classes): scales the output of the model.
- One ConvTranspose2d layer: scales the output of the model. Built on the FCN network architecture, this upscales the output to the input size so that each image pixel can be classified into a separate class (a minimal sketch follows this list).
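The following is a minimal PyTorch sketch of the decoder head described above, built on the torchvision MobileNetV2 backbone. It collapses the upsampling into a single transposed convolution; the kernel size, stride, and padding are illustrative assumptions chosen so that a 224 × 224 input maps back to a 224 × 224 score map, since the paper only states that the output is upscaled to the input size.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class FCNMobileNetV2(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # MobileNetV2 feature extractor: 1280 channels at 1/32 resolution.
        self.encoder = mobilenet_v2(weights="DEFAULT").features
        # 1x1 convolution condensing 1280 channels into per-class scores.
        self.classifier = nn.Conv2d(1280, num_classes, kernel_size=1)
        # Transposed convolution upsampling the coarse score map by 32x.
        self.upsample = nn.ConvTranspose2d(num_classes, num_classes,
                                           kernel_size=64, stride=32, padding=16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x)        # (N, 1280, H/32, W/32)
        h = self.classifier(h)     # (N, num_classes, H/32, W/32)
        return self.upsample(h)    # (N, num_classes, H, W)

model = FCNMobileNetV2(num_classes=2)
out = model(torch.randn(1, 3, 224, 224))  # -> torch.Size([1, 2, 224, 224])
```

The 1 × 1 classifier keeps the head lightweight: it adds only 1280 × num_classes weights on top of the encoder.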
3.2. Model Training
- Unbalanced data processing: in binary classification problems, the Balanced Cross-Entropy (BCE) function addresses the issue of unbalanced data. When the sample ratio between the two classes is unequal, the smaller class is given greater weight, which prevents the model from being biased towards the larger class (a minimal loss sketch follows this list).
- Error balancing: the Balanced Cross-Entropy function takes the error levels of both classes into account, so the model strives to minimize the mean error across both classes instead of concentrating excessively on the majority class.
- Increased accuracy: by managing unbalanced data and equalizing errors, the Balanced Cross-Entropy function can improve model accuracy in binary classification problems. It balances class-based decisions and keeps minority-class information from being overlooked.
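Below is a minimal sketch of a balanced binary cross-entropy of the kind described above, for the two-class (floor / non-floor) case: each pixel is weighted by the frequency of the opposite class, so the rarer class contributes more to the loss. The exact weighting scheme used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def balanced_bce(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # beta = fraction of negative pixels; positives get weight beta,
    # negatives get weight (1 - beta), so the rarer class weighs more.
    beta = 1.0 - targets.mean()
    weights = beta * targets + (1.0 - beta) * (1.0 - targets)
    return F.binary_cross_entropy_with_logits(logits, targets, weight=weights)

logits = torch.randn(1, 1, 224, 224)                   # raw model scores
targets = (torch.rand(1, 1, 224, 224) > 0.8).float()   # imbalanced mask
loss = balanced_bce(logits, targets)
```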
3.3. Quantization
- Precision calibration: during training, FP32 (32-bit floating-point) parameters and activations are converted to FP16. This reduces latency and increases inference speed at the cost of a slight reduction in model accuracy; in real-time recognition, accuracy and inference speed must sometimes be traded off (a minimal sketch follows this list).
- Layer and tensor fusion: layers and tensors are merged to optimize GPU memory and bandwidth by fusing nodes vertically, horizontally, or both. Vertical fusion joins successive kernel operations, while horizontal fusion merges layers that share the same layer size and input but have different weights into a single layer.
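As a rough stand-in for the precision calibration described above, the sketch below casts the model from the Section 3.1 sketch to FP16 using plain PyTorch half precision. Layer and tensor fusion are left to the deployment toolchain (TensorRT, for instance, performs both fusions automatically) and are not reproduced here.

```python
import torch

# Reuses the hypothetical FCNMobileNetV2 class sketched in Section 3.1.
# FP16 inference is intended for the GPU; .half() casts all weights.
device = "cuda" if torch.cuda.is_available() else "cpu"
model_fp16 = FCNMobileNetV2(num_classes=2).to(device).half().eval()

x = torch.randn(1, 3, 224, 224, device=device).half()  # match weight precision
with torch.no_grad():
    pred = model_fp16(x).argmax(dim=1)  # per-pixel class labels, (1, 224, 224)
```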
4. Experimental Results and Discussion
4.1. Quantitative Results
4.2. Mobile Robot’s Frontal View
4.3. Practical Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Murat, L.; Ertugrul, C.; Faruk, U.; Ibrahim, U.; Salih, I.A. Initial Results of Testing a Multilayer Laser Scanner in a Collision Avoidance System for Light Rail Vehicles. Appl. Sci. 2018, 8, 475.
2. Abukhalil, T.; Alksasbeh, M.; Alqaralleh, B.; Abukaraki, A. Robot navigation system using laser and monocular camera. J. Theor. Appl. Inf. Technol. 2020, 98, 714–724.
3. Wang, W.C.; Ng, C.Y.; Chen, R. Vision-Aided Path Planning Using Low-Cost Gene Encoding for a Mobile Robot. Intell. Automat. Soft Comput. 2021, 32, 991–1007.
4. Maulana, I.; Rasdina, A.; Priramadhi, R.A. Lidar Applications for Mapping and Robot Navigation on Closed Environment. J. Meas. Electron. Commun. Syst. 2018, 4, 767–782.
5. Damodaran, D.; Mozaffari, S.; Alirezaee, S.; Ahamed, M.J. Experimental Analysis of the Behavior of Mirror-like Objects in LiDAR-Based Robot Navigation. Appl. Sci. 2023, 13, 2908.
6. Al-Mallah, M.; Ali, M.; Al-Khawaldeh, M. Obstacles Avoidance for Mobile Robot Using Type-2 Fuzzy Logic Controller. Robotics 2022, 11, 130.
7. Dang, T.V.; Bui, N.T. Multi-Scale Fully Convolutional Network-Based Semantic Segmentation for Mobile Robot Navigation. Electronics 2023, 12, 533.
8. Zhao, C.Q.; Sun, Q.Y.; Zhang, C.Z.; Tang, Y.; Qian, F. Monocular depth estimation based on deep learning: An overview. Sci. China Technol. Sci. 2020, 63, 1612–1627.
9. Dong, Q. Path Planning Algorithm Based on Visual Image Feature Extraction for Mobile Robots. Mob. Inf. Syst. 2022, 2022, 4094472.
10. Dang, T.V.; Bui, N.T. Obstacle Avoidance Strategy for Mobile Robot Based on Monocular Camera. Electronics 2023, 12, 1932.
11. Pan, X.; Gao, L.; Marinoni, A.; Zhang, B.; Yang, F.; Gamba, P. Semantic Labeling of High Resolution Aerial Imagery and LiDAR Data with Fine Segmentation Network. Remote Sens. 2018, 10, 743.
12. Peng, C.; Li, Y.; Jiao, L.; Chen, Y.; Shang, R. Densely Based Multi-Scale and Multi-Modal Fully Convolutional Networks for High-Resolution Remote-Sensing Image Semantic Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2612–2626.
13. Wang, Y.; Sun, Z.; Zhao, W. Encoder- and Decoder-Based Networks Using Multi-scale Feature Fusion and Nonlocal Block for Remote Sensing Image Semantic Segmentation. IEEE Geosci. Remote Sens. Lett. 2021, 18, 1159–1163.
14. Pastorino, M.; Moser, G.; Serpico, S.B.; Zerubia, J. Semantic Segmentation of Remote-Sensing Images through Fully Convolutional Neural Networks and Hierarchical Probabilistic Graphical Models. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5407116.
15. Lyu, C.; Hu, G.; Wang, D. HRED-Net: High-Resolution Encoder-Decoder Network for Fine-Grained Image Segmentation. IEEE Access 2020, 8, 38210–38220.
16. Rusli, L.; Nurhalim, B.; Rusyadi, R. Vision-based vanishing point detection of autonomous navigation of mobile robot for outdoor applications. J. Mechatron. Elect. Power Veh. Technol. 2021, 12, 117–125.
17. Minaee, S.; Boykov, Y.Y.; Porikli, F.; Plaza, A.J.; Kehtarnavaz, N.; Terzopoulos, D. Image Segmentation Using Deep Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 3523–3542.
18. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 1–12.
19. Wang, C.; Zhao, Z.; Ren, Q.; Xu, Y.; Yu, Y. Dense U-Net based on patch-based learning for retinal vessel segmentation. Entropy 2019, 21, 168.
20. Wang, W.; Yu, K.; Hugonot, J.; Fua, P.; Salzmann, M. Recurrent U-Net for resource-constrained segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2142–2151.
21. Agus, E.M.; Bagas, Y.S.; Yuda, M.; Hanung, A.N.; Zaidah, I. Convolutional Neural Network featuring VGG-16 Model for Glioma Classification. Int. J. Inform. Vis. 2022, 6, 660–666.
22. Alfred Daniel, J.; Chandru Vignesh, C.; Muthu, B.A.; Senthil Kumar, R.; Sivaparthipan, C.B.; Marin, C.E.M. Fully convolutional neural networks for LIDAR-camera fusion for pedestrian detection in autonomous vehicle. Multimed. Tools Appl. 2023, 82, 25107–25130.
23. Cruz, R.; Silva, D.T.; Goncalves, T.; Carneiro, D.; Cardoso, J.S. Two-Stage Framework for Faster Semantic Segmentation. Sensors 2023, 23, 3092.
24. Kong, X.; Xia, S.; Liu, N.; Wei, M. GADA-SegNet: Gated attentive domain adaptation network for semantic segmentation of LiDAR point clouds. Vis. Comput. 2023, 39, 2471–2481.
25. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
26. Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. ENet: A deep neural network architecture for real-time semantic segmentation. arXiv 2016, arXiv:1606.02147.
27. Wang, Y. Remote sensing image semantic segmentation network based on ENet. J. Eng. 2022, 12, 1219–1227.
28. Qin, Y.; Tang, Q.; Xin, J.; Yang, C.; Zhang, Z.; Yang, X. A Rapid Identification Technique of Moving Loads Based on MobileNetV2 and Transfer Learning. Buildings 2023, 13, 572.
29. Wang, P.; Luo, F.; Wang, L.; Li, C.; Niu, Q.; Li, H. S-ResNet: An improved ResNet neural model capable of the identification of small insects. Front. Plant Sci. 2022, 13, 5241.
30. Gao, L.; Huang, Y.; Zhang, X.; Liu, Q.; Chen, Z. Prediction of Prospecting Target Based on ResNet Convolutional Neural Network. Appl. Sci. 2022, 12, 11433.
31. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
32. Abu Alhaija, H.; Mustikovela, S.K.; Mescheder, L.; Geiger, A.; Rother, C. Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes. Int. J. Comput. Vis. 2018, 126, 961–972.
33. Krinkin, K.; Chaika, K.; Filatov, A.; Filatov, A. Autonomous Wheels and Camera Calibration in Duckietown Project. Procedia Comput. Sci. 2021, 186, 169–176.
34. Jodelet, Q.; Liu, X.; Murata, T. Balanced softmax cross-entropy for incremental learning with and without memory. Comput. Vis. Image Underst. 2022, 225, 103582.
35. Liu, M.; Yao, D.; Liu, Z.; Guo, J.; Chen, J. An Improved Adam Optimization Algorithm Combining Adaptive Coefficients and Composite Gradients Based on Randomized Block Coordinate Descent. Comput. Intell. Neurosci. 2023, 5, 4765891.
36. Kostková, J.; Flusser, J.; Lébl, M.; Pedone, M. Handling Gaussian Blur without Deconvolution. Pattern Recognit. 2020, 103, 107264.
37. Aghajarian, M.; McInroy, J.E.; Muknahallipatna, S. Deep learning algorithm for Gaussian noise removal from images. J. Electron. Imag. 2020, 29, 1.
38. Tsubota, K.; Aizawa, K. Comprehensive Comparisons of Uniform Quantization in Deep Image Compression. IEEE Access 2023, 11, 4455–4465.
39. Liang, X.; Zhou, H.; Xing, E. Dynamic-structured semantic propagation network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 755–761.
40. Shaw, A.; Hunter, D.; Iandola, F.; Sidhu, S. SqueezeNAS: Fast neural architecture search for faster semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 2014–2024.
41. Tian, Y.; Xie, L.; Zhang, X.; Fang, J.; Xu, H.; Huang, W.; Jiao, J.; Tian, Q.; Ye, Q. Semantic-Aware Generation for Self-Supervised Visual Representation Learning. arXiv 2021, arXiv:2111.13163.
42. Ochs, M.; Kretz, A.; Mester, R. SDNet: Semantic Guided Depth Estimation Network. In Proceedings of the 41st DAGM German Conference on Pattern Recognition (DAGM GCPR 2019), Dortmund, Germany, 10–13 September 2019; pp. 288–302.
43. Singha, T.; Pham, D.; Krishna, A. A real-time semantic segmentation model using iteratively shared features in multiple sub-encoders. Pattern Recognit. 2023, 140, 109557.
44. Kong, S.; Fowlkes, C. Pixel-wise Attentional Gating for Parsimonious Pixel Labeling. arXiv 2018, arXiv:1805.01556.
45. Marchand, E.; Uchiyama, H.; Spindler, F. Pose Estimation for Augmented Reality: A Hands-On Survey. IEEE Trans. Vis. Comput. Graph. 2016, 22, 2633–2651.
46. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2000.
Model | Validated mIoU |
---|---|
DSSPN [39] | 77.8% |
SqueezeNAS [40] | 72.4% |
SaGe [41] | 76.9% |
IRDC-Net: Lightweight Segmentation | 78.1% |
Model | Validated mIoU |
---|---|
SDNet [42] | 79.62% |
SFRSeg [43] | 77.91% |
APMoE seg ROB [44] | 78.11% |
IRDC-Net: Lightweight Segmentation | 81.11% |
Model | Accuracy | Validated mIoU |
---|---|---|
Binary Segmentation FCN-VGG 16 [7] | 97.1% | 71.8% |
IRDC-Net: Lightweight Segmentation | 98.3% | 74.2% |
| Frontal View of Mobile Robot | Steering Angle Change (rad) | Accuracy (%) |
|---|---|---|
| When rotating around X axis | 0.00 | 100 |
| | 0.01 | 96.5 |
| | 0.02 | 93.2 |
| | 0.03 | 91.4 |
| | 0.04 | 88.6 |
| | 0.05 | 85.3 |
| | 0.07 | 82.5 |
| | 0.09 | 81.1 |
| When rotating around Y axis | 0.00 | 100 |
| | 0.03 | 99 |
| | 0.05 | 94.6 |
| | 0.07 | 92.4 |
| | 0.09 | 90.2 |
| | 0.10 | 87.8 |
| | 0.12 | 83.2 |
| | 0.15 | 80.1 |