Correlating Edge with Parsing for Human Parsing
Abstract
1. Introduction
- We propose a new network, MCEP, that makes full use of the edge information of the human body, so that the model learns fine-grained features better and recovers a more complete structure of the target.
- Experiments on the Look into Person (LIP) dataset [12], in which we add our SMP module to the mainstream PSPNet network, show that the proposed SMP module can be applied to other segmentation tasks involving small-scale targets.
2. Related Work
2.1. Semantic Segmentation
2.2. Utilizing Pose for Human Parsing
2.3. Utilizing Edge for Human Parsing
3. Proposed Method
3.1. Edge Module
3.2. SMP Module
3.3. Human Parsing Module
4. Experiments
4.1. Subjective Evaluation
4.1.1. Single-Person Quantitative Analysis
4.1.2. Multi-Person Quantitative Analysis
4.2. Objective Evaluation
4.2.1. Metrics
4.2.2. Training Details
4.2.3. Experiments on Single-Person Datasets
4.2.4. Ablation Experiment
4.2.5. Experiments on Multi-Person Datasets
4.3. Discussion
4.3.1. Extended Experiment
4.3.2. Analysis of the Difference between the MCEPNet and CP-SSGNet
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Zeng, D.; Huang, Y.; Bao, Q.; Zhang, J.; Su, C.; Liu, W. Neural Architecture Search for Joint Human Parsing and Pose Estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 11385–11394.
2. Yang, L.; Song, Q.; Wang, Z.; Liu, Z.; Xu, S.; Li, Z. Quality-aware network for human parsing. arXiv 2022, arXiv:2103.05997.
3. Li, T.; Liang, Z.; Zhao, S.; Gong, J.; Shen, J. Self-learning with rectification strategy for human parsing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9263–9272.
4. Sun, H.; Liu, X.; Xu, K.; Miao, J.; Luo, Q. Emergency vehicles audio detection and localization in autonomous driving. arXiv 2021, arXiv:2109.14797.
5. Fan, J.; Xu, W.; Wu, Y.; Gong, Y. Human tracking using convolutional neural networks. IEEE Trans. Neural Netw. 2010, 21, 1610–1623.
6. Cheng, L.; Guan, Y.; Zhu, K.; Li, Y. Recognition of human activities using machine learning methods with wearable sensors. In Proceedings of the 2017 IEEE 7th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 9–11 January 2017; pp. 1–7.
7. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
8. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
9. Ruan, T.; Liu, T.; Huang, Z.; Wei, Y.; Wei, S.; Zhao, Y. Devil in the details: Towards accurate single and multiple human parsing. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 4814–4821.
10. Gong, K.; Liang, X.; Li, Y.; Chen, Y.; Yang, M.; Lin, L. Instance-level Human Parsing via Part Grouping Network. arXiv 2018, arXiv:1808.00157.
11. Yu, W.-Y.; Po, L.-M.; Zhao, Y.; Zhang, Y.; Lau, K.-W. FEANet: Foreground-edge-aware network with DenseASPOC for human parsing. Image Vis. Comput. 2021, 109, 104145.
12. Gong, K.; Liang, X.; Zhang, D.; Shen, X.; Lin, L. Look into person: Self-supervised structure-sensitive learning and a new benchmark for human parsing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 932–940.
13. Zhou, T.; Wang, W.; Konukoglu, E.; Van Gool, L. Rethinking semantic segmentation: A prototype view. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 2582–2593.
14. Mo, Y.; Wu, Y.; Yang, X.; Liu, F.; Liao, Y. Review the state-of-the-art technologies of semantic segmentation based on deep learning. Neurocomputing 2022, 493, 626–646.
15. Xu, J.; De Mello, S.; Liu, S.; Byeon, W.; Breuel, T.; Kautz, J.; Wang, X. Groupvit: Semantic segmentation emerges from text supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 18134–18144.
16. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
17. Chen, L.-C.; Yang, Y.; Wang, J.; Xu, W.; Yuille, A.L. Attention to scale: Scale-aware semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3640–3649.
18. Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
19. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer; pp. 234–241.
20. Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Boston, MA, USA, 7–12 June 2015; pp. 1520–1528.
21. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
22. Park, J.; Woo, S.; Lee, J.-Y.; Kweon, I.S. Bam: Bottleneck attention module. arXiv 2018, arXiv:1807.06514.
23. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
24. Wang, L.; Li, D.; Zhu, Y.; Tian, L.; Shan, Y. Dual super-resolution learning for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3774–3783.
25. Liu, Y.; Zhang, S.; Xu, J.; Yang, J.; Tai, Y.-W. An accurate and lightweight method for human body image super-resolution. IEEE Trans. Image Process. 2021, 30, 2888–2897.
26. Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703.
27. Zhou, T.; Wang, W.; Liu, S.; Yang, Y.; Van Gool, L. Differentiable multi-granularity human representation learning for instance-aware human semantic parsing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1622–1631.
28. Nie, X.; Feng, J.; Yan, S. Mutual learning to adapt for joint human parsing and pose estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 502–517.
29. Zhou, T.; Yang, Y.; Wang, W. Differentiable Multi-Granularity Human Parsing. IEEE Trans. Pattern Anal. Mach. Intell. 2023.
30. Wang, W.; Zhou, T.; Qi, S.; Shen, J.; Zhu, S.-C. Hierarchical human semantic parsing with comprehensive part-relation modeling. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3508–3522.
31. Chen, L.-C.; Barron, J.T.; Papandreou, G.; Murphy, K.; Yuille, A.L. Semantic image segmentation with task-specific edge detection using cnns and a discriminatively trained domain transform. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4545–4554.
32. Goyal, P.; Dollár, P.; Girshick, R.; Noordhuis, P.; Wesolowski, L.; Kyrola, A.; Tulloch, A.; Jia, Y.; He, K. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv 2017, arXiv:1706.02677.
33. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
34. Luo, Y.; Zheng, Z.; Zheng, L.; Guan, T.; Yu, J.; Yang, Y. Macro-micro adversarial network for human parsing. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 418–434.
35. Liang, X.; Gong, K.; Shen, X.; Lin, L. Look into person: Joint body parsing & pose estimation network and a new benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 871–885.
36. Liu, X.; Zhang, M.; Liu, W.; Song, J.; Mei, T. Braidnet: Braiding semantics and details for accurate human parsing. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 338–346.
37. Zhang, X.; Chen, Y.; Zhu, B.; Wang, J.; Tang, M. Semantic-spatial fusion network for human parsing. Neurocomputing 2020, 402, 375–383.
38. Zhang, Z.; Su, C.; Zheng, L.; Xie, X. Correlating edge, pose with parsing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8900–8909.
39. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Honolulu, HI, USA, 21–26 July 2017; pp. 2961–2969.

Method | Hat | Hair | Glove | Glass | U-Clothes | Dress | Coat | Sock | Pant | Suit | Scarf | Skirt | Face | L-Arm | R-Arm | L-Leg | R-Leg | L-Shoe | R-Shoe | Bkg | AVE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DeepLabv2 [7] | 50.48 | 65.33 | 29.98 | 19.76 | 62.44 | 30.33 | 51.03 | 40.51 | 69.00 | 22.38 | 11.29 | 20.56 | 70.11 | 49.25 | 52.88 | 42.37 | 35.78 | 33.81 | 32.89 | 84.53 | 41.64 |
Attention [17] | 58.87 | 66.78 | 23.23 | 19.48 | 63.20 | 29.63 | 49.70 | 35.23 | 66.04 | 24.73 | 12.84 | 20.41 | 70.58 | 50.17 | 54.03 | 38.35 | 37.70 | 26.20 | 27.09 | 84.00 | 42.92 |
ASSL [12] | 59.75 | 67.25 | 28.95 | 21.57 | 65.30 | 29.49 | 51.92 | 38.52 | 68.02 | 24.48 | 14.92 | 24.32 | 71.01 | 52.64 | 55.79 | 40.23 | 38.80 | 28.08 | 29.03 | 84.56 | 44.73 |
MMAN [34] | 57.66 | 65.63 | 30.07 | 20.02 | 64.15 | 28.39 | 51.98 | 41.46 | 71.03 | 23.61 | 9.65 | 23.20 | 69.54 | 55.30 | 58.13 | 51.90 | 52.17 | 38.58 | 39.05 | 84.75 | 46.81 |
JPPNet [35] | 63.55 | 70.20 | 36.16 | 23.48 | 68.15 | 31.42 | 55.65 | 66.56 | 72.19 | 28.39 | 18.76 | 25.14 | 73.36 | 61.97 | 63.88 | 58.21 | 57.99 | 44.02 | 44.09 | 86.26 | 51.37 |
CE2P [9] | 65.29 | 72.54 | 39.09 | 32.73 | 69.46 | 32.52 | 56.28 | 49.67 | 74.11 | 27.23 | 14.19 | 22.51 | 75.50 | 65.14 | 66.59 | 60.10 | 58.59 | 46.63 | 46.12 | 87.67 | 53.10 |
HRNet [26] | 68.69 | 73.31 | 40.41 | 34.27 | 70.41 | 32.42 | 57.89 | 47.31 | 74.79 | 25.01 | 25.52 | 29.77 | 76.11 | 66.11 | 67.95 | 59.77 | 60.29 | 44.59 | 45.79 | 87.55 | 54.40 |
BraidNet [36] | 66.84 | 72.04 | 42.54 | 32.14 | 69.84 | 33.74 | 57.44 | 49.04 | 74.94 | 32.44 | 19.34 | 27.24 | 74.94 | 65.54 | 67.94 | 60.24 | 59.04 | 47.44 | 47.94 | 88.04 | 54.54 |
SSF-NET [37] | 68.60 | 73.14 | 40.02 | 33.57 | 70.51 | 34.89 | 57.38 | 49.30 | 74.87 | 33.16 | 21.30 | 29.11 | 75.74 | 64.85 | 66.52 | 57.41 | 57.04 | 47.43 | 47.74 | 87.87 | 54.53 |
CP-SSGNet [38] | 66.20 | 71.56 | 41.06 | 31.09 | 70.20 | 37.74 | 57.95 | 48.40 | 75.19 | 32.37 | 23.79 | 29.23 | 74.36 | 66.53 | 68.61 | 62.80 | 62.81 | 49.03 | 49.82 | 87.77 | 55.33 |
Ours | 66.56 | 71.93 | 43.48 | 33.75 | 70.75 | 37.69 | 56.24 | 50.11 | 75.22 | 33.05 | 25.98 | 30.42 | 74.83 | 65.75 | 68.16 | 59.07 | 60.11 | 47.01 | 48.23 | 87.92 | 55.31 |
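
The final column of the table above appears to be the unweighted mean of the 20 per-class IoU values in each row. As a quick sanity check, the sketch below recomputes it for the last row (our method); the variable names are illustrative and not taken from the paper.

```python
# Per-class IoU (%) copied from the last row of the table above, in column order
# (Hat ... R-Shoe, then the 20th class column).
ours_iou = [
    66.56, 71.93, 43.48, 33.75, 70.75, 37.69, 56.24, 50.11, 75.22, 33.05,
    25.98, 30.42, 74.83, 65.75, 68.16, 59.07, 60.11, 47.01, 48.23, 87.92,
]

# Unweighted mean over all 20 classes.
mean_iou = sum(ours_iou) / len(ours_iou)
print(f"{mean_iou:.2f}")  # prints 55.31, matching the AVE entry
```

The same check can be repeated for any other row by swapping in its 20 per-class values.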

Method | Pixel Accuracy | Mean Accuracy | mIoU |
---|---|---|---|
Base | 87.37 | 63.20 | 53.10 |
Base + max-pooling one | 87.66 | 66.45 | 54.92 |
Base + max-pooling two | 88.10 | 67.41 | 55.31 |
Base + max-pooling three | 87.63 | 66.13 | 54.63 |
Base + max-pooling four | 87.82 | 66.28 | 55.01 |
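
The table above reports pixel accuracy, mean accuracy, and mIoU. The text available here does not spell out the exact formulas, so the following is only a minimal sketch assuming the standard confusion-matrix definitions of these three metrics. The function names and the toy labels are hypothetical and are not part of the paper's code.

```python
def confusion_matrix(gt, pred, num_classes):
    """Count pixels; rows index ground-truth labels, columns index predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return (pixel accuracy, mean per-class accuracy, mean IoU) as fractions."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    pixel_acc = correct / total

    class_accs, ious = [], []
    for i in range(n):
        gt_i = sum(cm[i])                         # pixels labelled i
        pred_i = sum(cm[r][i] for r in range(n))  # pixels predicted as i
        union = gt_i + pred_i - cm[i][i]
        if gt_i > 0:      # skip classes absent from the ground truth
            class_accs.append(cm[i][i] / gt_i)
        if union > 0:     # skip classes absent from both maps
            ious.append(cm[i][i] / union)

    mean_acc = sum(class_accs) / len(class_accs)
    mean_iou = sum(ious) / len(ious)
    return pixel_acc, mean_acc, mean_iou


# Toy example: 8 pixels, 3 classes (flattened label maps).
gt = [0, 0, 0, 0, 1, 1, 1, 2]
pred = [0, 0, 0, 1, 1, 1, 2, 2]
pixel_acc, mean_acc, mean_iou = segmentation_metrics(confusion_matrix(gt, pred, 3))
print(f"{pixel_acc:.4f} {mean_acc:.4f} {mean_iou:.4f}")  # prints 0.7500 0.8056 0.5833
```

In the toy example the per-class IoUs are 0.75, 0.5, and 0.5, so mIoU is lower than pixel accuracy: a mistake on a small class costs more under mIoU, which is why it is the usual headline metric for parsing small parts.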

Method | mIoU |
---|---|
PGN | 55.80 |
M-CE2P | 59.50 |
BraidNet + Mask R-CNN | 60.62 |
Ours | 60.70 |

Method | Glove | Glass | Dress | Suit | Scarf | Skirt | Pixel Accuracy | Mean Accuracy | mIoU |
---|---|---|---|---|---|---|---|---|---|
PSPNet | 30.57 | 21.20 | 28.60 | 23.52 | 14.00 | 23.20 | 80.50 | 57.90 | 48.90 |
PSPNet + SMP | 41.23 | 27.29 | 34.98 | 30.63 | 20.57 | 26.42 | 86.73 | 64.36 | 52.45 |
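
To make the effect of the SMP module in the table above easier to read, the short sketch below computes the gain in percentage points for each column. The numbers are copied from the two rows of the table; the variable names are illustrative only.

```python
columns = ["Glove", "Glass", "Dress", "Suit", "Scarf", "Skirt",
           "Pixel Accuracy", "Mean Accuracy", "mIoU"]
pspnet = [30.57, 21.20, 28.60, 23.52, 14.00, 23.20, 80.50, 57.90, 48.90]
pspnet_smp = [41.23, 27.29, 34.98, 30.63, 20.57, 26.42, 86.73, 64.36, 52.45]

# Gain of the SMP variant over the baseline, in percentage points.
gains = {
    name: round(after - before, 2)
    for name, before, after in zip(columns, pspnet, pspnet_smp)
}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f}")
```

Every column improves; Glove shows the largest gain (10.66 points) and overall mIoU rises by 3.55 points.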
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gong, K.; Wang, X.; Tan, S. Correlating Edge with Parsing for Human Parsing. Electronics 2023, 12, 944. https://doi.org/10.3390/electronics12040944