Pallet Recognition with Multi-Task Learning for Automated Guided Vehicles
Abstract
1. Introduction
2. The Proposed Method
2.1. Labels
2.2. Multi-Task Learning
3. Experiments
3.1. Datasets
3.2. Evaluation of the Performance of Pallet Recognition
3.3. Soft Parameter Sharing
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Architecture | Number of Layers | Parameters (CNN) | Parameters (Classifier) | Parameters (Total) |
|---|---|---|---|---|
| AlexNet | 8 | 2.470 M | 54.580 M | 57.050 M |
| VGG11 | 11 | 9.226 M | 119.591 M | 128.817 M |
| ResNet-18 | 18 | 11.186 M | 0.005 M | 11.191 M |
| ResNet-50 | 50 | 23.531 M | 0.022 M | 23.553 M |
| ResNet-101 | 101 | 42.500 M | 0.022 M | 42.522 M |
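As a cross-check on the counts above, the following is a minimal sketch using torchvision's stock model definitions. The head attribute names (`classifier` for AlexNet/VGG, `fc` for ResNet) follow torchvision's implementations; the table's classifier sizes come from task-specific output layers, so the stock 1000-class heads counted here will not match the classifier column exactly.

```python
# Minimal sketch: counting backbone vs. head parameters with torchvision.
# Output sizes are torchvision defaults (1000 classes), so head counts
# differ from the table, which uses task-specific output layers.
import torch.nn as nn
from torchvision import models

def millions(module: nn.Module) -> float:
    """Number of trainable parameters, in millions."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad) / 1e6

for name, ctor, head_attr in [
    ("AlexNet", models.alexnet, "classifier"),
    ("VGG11", models.vgg11, "classifier"),
    ("ResNet-18", models.resnet18, "fc"),
    ("ResNet-50", models.resnet50, "fc"),
    ("ResNet-101", models.resnet101, "fc"),
]:
    model = ctor()
    head = millions(getattr(model, head_attr))
    total = millions(model)
    print(f"{name}: CNN {total - head:.3f} M | classifier {head:.3f} M | total {total:.3f} M")
```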
| Architecture | Parameters (Two Single Models) | Parameters (Multi-Task Model, Proposed) | Ratio (%) |
|---|---|---|---|
| AlexNet | 114.08 M | 111.61 M | 97.83 |
| VGG11 | 257.62 M | 248.39 M | 96.41 |
| ResNet-18 | 22.37 M | 11.19 M | 50.02 |
| ResNet-50 | 47.05 M | 23.54 M | 50.03 |
| ResNet-101 | 85.04 M | 42.54 M | 50.02 |
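The near-50% ratios for the ResNets follow from hard parameter sharing: both tasks reuse one backbone, and the ResNet heads are negligibly small next to it (the AlexNet and VGG heads are not, hence their much higher ratios). Below is a sketch of such a shared-backbone, two-head model; the head sizes are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of hard parameter sharing: one shared ResNet backbone feeding
# two lightweight task heads. Head output sizes are assumed for illustration.
import torch
import torch.nn as nn
from torchvision import models

class SharedBackboneMultiTask(nn.Module):
    def __init__(self, n_out_task1: int = 2, n_out_task2: int = 15):
        super().__init__()
        backbone = models.resnet18()
        feat_dim = backbone.fc.in_features          # 512 for ResNet-18
        backbone.fc = nn.Identity()                 # drop the stock classifier
        self.backbone = backbone                    # shared by both tasks
        self.head_task1 = nn.Linear(feat_dim, n_out_task1)
        self.head_task2 = nn.Linear(feat_dim, n_out_task2)

    def forward(self, x: torch.Tensor):
        z = self.backbone(x)                        # shared features
        return self.head_task1(z), self.head_task2(z)

model = SharedBackboneMultiTask()
n = sum(p.numel() for p in model.parameters()) / 1e6
print(f"multi-task parameters: {n:.2f} M")          # ~11.2 M vs. ~22.4 M for two single models
```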
| Multi-Task Model (VGG11) | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|---|---|---|---|
| Exact match ratio | 0.65 (0.08) | 0.68 (0.06) | 0.67 (0.06) | 0.70 (0.09) | 0.70 (0.07) | 0.79 (0.08) | 0.77 (0.07) | 0.72 (0.08) | 0.60 (0.09) |
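The 0.1 to 0.9 sweep above is most naturally read as a convex combination of the two task losses; this formulation and its symbols are an assumption here, not a quotation of the paper's objective:

```latex
% Assumed weighting scheme behind the 0.1--0.9 sweep:
\mathcal{L}_{\text{total}} = w \, \mathcal{L}_{\text{task 1}} + (1 - w) \, \mathcal{L}_{\text{task 2}},
\qquad w \in \{0.1, 0.2, \dots, 0.9\}
```

Under that reading, the exact match ratio peaks at w = 0.6.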
| Training Dataset | Validation Dataset | Test Dataset |
|---|---|---|
| 270 (Default) | 38 | 77 |
| 540 (Default + light on) | 38 | 77 |
| 540 (Default + flipped) | 38 | 77 |
| 1080 (Default + light on + flipped) | 38 | 77 |
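The "flipped" rows double the default set with horizontally mirrored copies, while the validation and test sets stay fixed at 38 and 77 images. A minimal sketch of that augmentation, assuming torchvision's functional transforms:

```python
# Minimal sketch: doubling a training set with horizontally flipped copies.
# Note: mirroring a pallet image also mirrors any left/right-dependent
# label, so such labels would need the corresponding adjustment.
from torchvision.transforms import functional as TF

def with_flips(images, labels):
    """Return the original samples plus horizontally mirrored copies."""
    aug_images = list(images) + [TF.hflip(img) for img in images]
    aug_labels = list(labels) + list(labels)  # adjust labels here if the flip changes them
    return aug_images, aug_labels
```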
| Methods | AlexNet | VGG11 | ResNet-18 | ResNet-50 | ResNet-101 |
|---|---|---|---|---|---|
| Single-task | 0.58 (0.09) | 0.67 (0.08) | 0.75 (0.07) | 0.70 (0.08) | 0.68 (0.05) |
| Multi-task (proposed) | 0.71 (0.08) | 0.79 (0.08) | 0.81 (0.06) | 0.81 (0.07) | 0.77 (0.11) |
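The exact match ratio is the strictest multi-output accuracy: a sample counts as correct only if the predictions for every task match the ground truth. A minimal sketch:

```python
# Minimal sketch of the exact match ratio: a sample is correct only if
# the predictions for *all* tasks match the ground truth.
import numpy as np

def exact_match_ratio(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """y_true, y_pred: (n_samples, n_tasks) arrays of task labels."""
    return float(np.all(y_true == y_pred, axis=1).mean())

# e.g., two tasks, three samples: only the first sample matches on both tasks
y_true = np.array([[1, 3], [0, 2], [1, 1]])
y_pred = np.array([[1, 3], [0, 4], [0, 1]])
print(exact_match_ratio(y_true, y_pred))  # 0.333...
```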
| Methods | AlexNet Task 1 | AlexNet Task 2 | VGG11 Task 1 | VGG11 Task 2 | ResNet-18 Task 1 | ResNet-18 Task 2 | ResNet-50 Task 1 | ResNet-50 Task 2 | ResNet-101 Task 1 | ResNet-101 Task 2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Single-task | 0.95 (0.02) | 0.61 (0.09) | 0.97 (0.02) | 0.69 (0.08) | 0.97 (0.03) | 0.78 (0.06) | 0.97 (0.02) | 0.72 (0.07) | 0.96 (0.02) | 0.71 (0.06) |
| Multi-task (proposed) | 0.93 (0.03) | 0.75 (0.08) | 0.95 (0.03) | 0.83 (0.06) | 0.96 (0.02) | 0.84 (0.06) | 0.96 (0.03) | 0.84 (0.06) | 0.97 (0.02) | 0.79 (0.11) |
| Methods | Training Dataset (Samples) | AlexNet | VGG11 | ResNet-18 | ResNet-50 | ResNet-101 |
|---|---|---|---|---|---|---|
| Multi-task (proposed) | Default (270) | 0.71 (0.08) | 0.79 (0.08) | 0.81 (0.06) | 0.81 (0.07) | 0.77 (0.11) |
| Multi-task (proposed) | Default + light on (540) | 0.93 (0.03) | 0.93 (0.04) | 0.94 (0.04) | 0.93 (0.04) | 0.94 (0.03) |
| Multi-task (proposed) | Default + flipped (540) | 0.72 (0.09) | 0.75 (0.06) | 0.86 (0.06) | 0.81 (0.06) | 0.81 (0.07) |
| Multi-task (proposed) | Default + light on + flipped (1080) | 0.94 (0.03) | 0.93 (0.03) | 0.95 (0.03) | 0.95 (0.03) | 0.96 (0.03) |
| Training Dataset (Samples) | Methods | ResNet-18 Task 1 (degree) | ResNet-18 Task 2 (cm) | ResNet-50 Task 1 (degree) | ResNet-50 Task 2 (cm) | ResNet-101 Task 1 (degree) | ResNet-101 Task 2 (cm) |
|---|---|---|---|---|---|---|---|
| Default (270) | Single-task | 0.13 (0.23) | 1.50 (1.04) | 0.09 (0.12) | 2.61 (1.98) | 0.15 (0.23) | 2.44 (1.45) |
| Default (270) | Multi-task | 0.10 (0.11) | 0.52 (0.24) | 0.12 (0.14) | 0.50 (0.39) | 0.06 (0.07) | 0.66 (0.51) |
| Default + light on (540) | Single-task | 0.04 (0.09) | 1.37 (1.00) | 0.12 (0.16) | 2.28 (1.93) | 0.20 (0.43) | 2.32 (1.87) |
| Default + light on (540) | Multi-task | 0.04 (0.09) | 0.11 (0.14) | 0.08 (0.11) | 0.17 (0.21) | 0.02 (0.05) | 0.10 (0.17) |
| Default + flipped (540) | Single-task | 0.38 (0.38) | 4.64 (3.43) | 0.40 (0.31) | 5.22 (1.57) | 0.51 (0.27) | 6.40 (2.97) |
| Default + flipped (540) | Multi-task | 0.37 (0.21) | 0.92 (0.40) | 0.50 (0.28) | 1.62 (0.63) | 0.45 (0.24) | 1.87 (1.10) |
| Default + light on + flipped (1080) | Single-task | 0.05 (0.13) | 0.59 (0.65) | 0.02 (0.06) | 1.10 (0.81) | 0.03 (0.06) | 1.31 (1.09) |
| Default + light on + flipped (1080) | Multi-task | 0.01 (0.03) | 0.09 (0.09) | 0.02 (0.04) | 0.07 (0.10) | 0.01 (0.03) | 0.09 (0.13) |
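The entries above read as mean (standard deviation) of per-task error, in degrees for task 1 and centimeters for task 2; a sketch of how such a summary would be computed from per-sample predictions (the absolute-error reading and the example values are assumptions):

```python
# Minimal sketch: summarizing per-task regression error as "mean (std)"
# of the absolute error, matching the table's formatting.
import numpy as np

def error_summary(pred: np.ndarray, target: np.ndarray) -> str:
    err = np.abs(pred - target)
    return f"{err.mean():.2f} ({err.std():.2f})"

angle_pred = np.array([0.20, -0.10, 0.05])   # degrees (illustrative values)
angle_true = np.array([0.00,  0.00, 0.00])
print("Task 1 (degree):", error_summary(angle_pred, angle_true))
```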
| Methods | Training Dataset (Samples) | ResNet-18 | ResNet-50 | ResNet-101 |
|---|---|---|---|---|
| Single-task | Default + light on + flipped (1080) | 0.88 (0.05) | 0.80 (0.04) | 0.81 (0.06) |
| Soft parameter sharing | Default + light on + flipped (1080) | 0.88 (0.06) | 0.85 (0.04) | 0.85 (0.04) |
| Hard parameter sharing (proposed) | Default + light on + flipped (1080) | 0.95 (0.03) | 0.95 (0.03) | 0.96 (0.03) |
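Soft parameter sharing, unlike the hard-sharing model above, keeps a separate network per task and couples them through a penalty on the distance between corresponding weights. A minimal sketch of an L2 coupling term; the regularization form and the strength `lam` are assumptions, not necessarily the paper's exact choice:

```python
# Minimal sketch of soft parameter sharing: two task-specific ResNet-18
# towers whose corresponding weights are pulled together by an L2 penalty.
import torch
from torchvision import models

net_task1 = models.resnet18()
net_task2 = models.resnet18()   # identical architecture, so parameters align

def coupling_penalty(net_a: torch.nn.Module, net_b: torch.nn.Module) -> torch.Tensor:
    """Sum of squared differences between corresponding parameters."""
    return sum(((pa - pb) ** 2).sum()
               for pa, pb in zip(net_a.parameters(), net_b.parameters()))

lam = 1e-3  # assumed coupling strength
# During training (illustrative):
#   loss = loss_task1 + loss_task2 + lam * coupling_penalty(net_task1, net_task2)
```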
| Methods | Training Dataset (Samples) | ResNet-18 Task 1 | ResNet-18 Task 2 | ResNet-50 Task 1 | ResNet-50 Task 2 | ResNet-101 Task 1 | ResNet-101 Task 2 |
|---|---|---|---|---|---|---|---|
| Single-task | Default + light on + flipped (1080) | 0.99 (0.02) | 0.88 (0.05) | 0.98 (0.01) | 0.81 (0.04) | 0.99 (0.01) | 0.82 (0.06) |
| Soft parameter sharing | Default + light on + flipped (1080) | 0.99 (0.01) | 0.89 (0.05) | 0.98 (0.02) | 0.87 (0.04) | 0.99 (0.01) | 0.87 (0.04) |
| Hard parameter sharing (proposed) | Default + light on + flipped (1080) | 0.99 (0.01) | 0.95 (0.02) | 0.98 (0.01) | 0.95 (0.03) | 0.99 (0.01) | 0.96 (0.03) |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).