**6. Conclusions**

In this paper, we reviewed recent research on point cloud-based joint extraction of the human body. We discussed the advantages of point cloud data and the applications of joint estimation in detail. Existing work was grouped under three mainstream approaches: (1) template-based methods; (2) feature-based methods; and (3) machine learning-based methods. On this basis, we analyzed and summarized the current point cloud human pose datasets. Although much research has been devoted to constructing practical human pose datasets, comprehensive ground-truth datasets covering varied human poses are still lacking, particularly datasets with joint annotations for people wearing different clothing. We also discussed the relevant applications of point cloud-based joint estimation and found that it plays an important role in emerging technologies such as 3D reconstruction, human–computer interaction, and action recognition.

The analysis above shows that many existing methods can already track human body joints accurately in real time in indoor environments. However, joint estimation of the human body still faces many challenges. In our opinion, feature-based methods cannot further improve the accuracy of joint detection if they rely only on depth features and the length constraints of the joints. Combining them with data from other sensors, such as RGB cameras, infrared cameras, and IMU sensors, has therefore become a promising direction. Template-based and machine learning-based methods are currently unable to recognize the joints of arbitrary poses, because neither the template dataset nor the training set can cover all poses. A 3D template of the human body can be constructed with software, but the template's fixed pose imposes limitations, and the matching process takes a long time. Additional information can be leveraged to shorten the search time, and models with better resolution can also be built; both remain interesting research directions. To improve detection accuracy, a machine learning-based network should extract both the local and global features of the points in real time when the input is an unordered point cloud. In addition, unresolved challenges and gaps between research and practical application remain across the field, such as self-occlusion and multi-person detection. Nevertheless, as research on machine learning deepens, human pose estimation is expected to become faster and more accurate. Effective networks and sufficient training data are the key elements of machine learning-based methods, and we believe there is considerable room for improvement as more researchers invest time in this area.

**Author Contributions:** Conceptualization, Y.Y.; methodology, T.X., D.A., and Y.Y.; validation, T.X., D.A., and Y.J.; formal analysis, T.X., D.A., and Y.Y.; investigation, T.X., D.A., and Y.J.; data curation, T.X., D.A., and Y.J.; writing—original draft preparation, T.X., D.A.; writing—review and editing, Y.Y.; visualization, D.A., T.X.; supervision, Y.Y.; project administration, Y.Y.; funding acquisition, Y.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFB0703500, in part by the Key Technologies Research and Development Program of Tianjin under Grant 20YFZCGX00440, in part by the Tianjin Research Innovation Project for Postgraduate Students (Artificial Intelligence Special Project) under Grant 2020YJSZXB09, and in part by the Fundamental Research Funds for the Central Universities, Nankai University, under Grant 63201178 and Grant 63191511.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** No new data were created or analyzed in this study. Data sharing is not applicable to this article.

**Acknowledgments:** The authors gratefully acknowledge the valuable feedback and suggestions for improving this paper provided by Qiang Wang and Zhongqi Pan.

**Conflicts of Interest:** The authors declare no conflict of interest.
