Peer-Review Record

Motion Capture in Mixed-Reality Applications: A Deep Denoising Approach

Virtual Worlds 2024, 3(1), 135-156; https://doi.org/10.3390/virtualworlds3010007

by André Correia Gonçalves¹, Rui Jesus²,* and Pedro Mendes Jorge²

Reviewer 1: Anonymous

Reviewer 2: Jerzy Balicki

Reviewer 3: Anonymous

Submission received: 20 November 2023 / Revised: 22 December 2023 / Accepted: 20 February 2024 / Published: 11 March 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Summary: The article focuses on the advantages of utilizing Deep Neural Networks (DNN) for motion capture and error correction, particularly in the context of Microsoft Kinect sensors or similar devices. It highlights the significance of error correction in enhancing the accuracy of capturing human movements for applications like video games and film production. The use of a DNN demonstrates a sophisticated approach, leveraging machine learning to address motion capture errors by learning complex patterns from a pre-processed dataset provided by CMU Graphics Lab. The incorporation of a temporal filter further refines motion capture quality, addressing noise issues. The implementation in Python with TensorFlow API and Unity game engine ensures flexibility and accessibility. The evaluation involves objective metrics, including Mean Absolute Error (MAE), and qualitative user feedback from 12 participants. The article conducts thorough testing with CMU and Kinect poses, identifying and assessing various noise scenarios and analyzing different training conditions. Acknowledging real-world challenges, such as differences between CMU and Kinect skeletons, the article proposes strategies to enhance the model's adaptability in handling real-world scenarios.

Here are some considerations regarding the article:

The related work discusses various techniques and models, but it may need to address real-time processing, especially if the application involves interactive environments such as video games. Latency in motion capture correction can significantly affect the user experience.

The Kinect and CMU Graphics Lab Skeletons section provides a comprehensive description of the data structures and processes, but it might benefit from addressing the accuracy of these representations and their potential limitations in capturing certain movements or body configurations.

In the Network Implementation section, the writer can explain how cross-validation is implemented, the choice of the number of folds (6 in this case), and how it contributes to a more accurate evaluation of the model's performance.
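To make the fold arithmetic concrete, the following is a minimal sketch of a k-fold split, not the authors' implementation. The helper name `k_fold_indices` and the sample count of 60 are hypothetical, and the split is contiguous and unshuffled for simplicity. It shows that with 6 folds each validation fold holds one sixth of the data.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds.

    Hypothetical helper for illustration; no shuffling is applied.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


# Assumed example: 60 poses split into 6 folds.
folds = list(k_fold_indices(n_samples=60, k=6))
held_out_fraction = len(folds[0][1]) / 60  # one sixth of the data per fold
```

Each sample appears in exactly one validation fold, so every pose is used for both training and validation across the 6 runs.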

In the results section, the writer could provide a detailed interpretation of the Mean Absolute Error (MAE) metric used to evaluate the model's performance and explain how this metric relates to the accuracy of the predicted poses.
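As an illustration of how MAE relates to pose accuracy, here is a minimal sketch under stated assumptions: a pose is represented as a list of `(x, y, z)` joint positions, and the function name and sample values are hypothetical rather than taken from the manuscript. The MAE is the average absolute difference per coordinate, so it is expressed in the same unit as the joint positions.

```python
def mean_absolute_error(predicted, target):
    """Average absolute per-coordinate difference between two poses."""
    if len(predicted) != len(target):
        raise ValueError("poses must have the same number of joints")
    total = 0.0
    count = 0
    for p_joint, t_joint in zip(predicted, target):
        for p, t in zip(p_joint, t_joint):
            total += abs(p - t)
            count += 1
    return total / count


# Assumed two-joint poses, coordinates in arbitrary units.
target = [(0.0, 1.0, 0.0), (0.0, 1.5, 0.0)]
predicted = [(0.1, 1.0, 0.0), (0.0, 1.3, 0.1)]
mae = mean_absolute_error(predicted, target)  # about 0.067 per coordinate
```

A lower value means the predicted joints sit closer, on average, to the reference joints; it does not indicate which joints or axes carry most of the error.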

Also, in the Noise Reduction part, the writer could explain the effectiveness of the temporal filter in reducing noise, especially jitter, and its role in enhancing the overall quality of the poses.

The writer can also discuss any trade-offs or challenges associated with the filter. Regarding unseen movements or scenarios, a discussion of how well the model generalizes to them would provide a more comprehensive evaluation of its applicability.
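One trade-off a temporal filter typically involves is jitter reduction versus lag. The sketch below uses a simple exponential moving average as a stand-in; this is an assumption, since the filter used in the manuscript is not described in this report. The function name, the `alpha` value, and the sample signals are hypothetical.

```python
def exponential_smoothing(samples, alpha):
    """Smooth a 1-D sequence; smaller alpha smooths more but lags more."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    previous = None
    for sample in samples:
        if previous is None:
            previous = sample  # initialise with the first observation
        else:
            previous = alpha * sample + (1.0 - alpha) * previous
        smoothed.append(previous)
    return smoothed


def peak_to_peak(values):
    """Range of a sequence, used here as a crude jitter measure."""
    return max(values) - min(values)


# Assumed jitter: one joint coordinate oscillating around a fixed position.
jitter = [0.1, -0.1, 0.1, -0.1]
jitter_filtered = exponential_smoothing(jitter, alpha=0.5)

# Assumed genuine movement: the coordinate jumps from 0 to 1.
step = [0.0, 0.0, 1.0, 1.0, 1.0]
step_filtered = exponential_smoothing(step, alpha=0.5)
```

In this example the filtered jitter has a smaller range than the raw signal, while the filtered step has still not reached 1.0 three frames after the jump, which is the kind of latency that matters in interactive applications.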

In addition, there is no mention of any ethical considerations. Please discuss any ethical considerations related to the use of pose-capturing technology, especially in the context of Mixed Reality. Considerations related to privacy and potential misuse should be addressed, even if just briefly.

Regarding the 12 participants, the authors must provide justification for the low number. With so few participants, the statistical power is too weak to justify the findings. This should be explained in the article.

Overall: The text is well-structured and thorough, offering a comprehensive exploration of Kinect poses through deep neural networks. Positive aspects include clear and organized content, facilitating an easy understanding of the research flow. The methodology is detailed, covering data collection, model implementation, and evaluation, ensuring a thorough comprehension for readers. The technical content delves deep into various aspects, including data preprocessing, neural network architecture, noise simulation, and evaluation methods. Additionally, the inclusion of visual aids contributes to the clarity of complex processes, enhancing accessibility for readers. After editing and adding the considerations above, it can be a good article for publication.

Comments on the Quality of English Language

English is mostly fine; only proofreading is needed.

Author Response

We would like to thank the reviewer for their thorough analysis of the work and for the comments made to help us improve our article. The response to the comments can be found in the attached pdf.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

I must admit that work on the combination of artificial intelligence and games, or perhaps more broadly virtual and augmented reality, is very interesting. This manuscript also has many advantages. What is surprising is the attempt to solve a non-trivial problem, which is motion capture in mixed reality applications. It's even nicely named: "a deep denoising approach." Using the movements of an actor allows you to design more realistic animations in a short amount of time. The authors proposed to support this approach by correcting the motion capture errors. They consider the Microsoft Kinect sensor and a ResNet network, trained with a pre-processed dataset of poses offered by CMU (Carnegie Mellon University) Graphics Lab. The Python environment is used with the TensorFlow API, together with the Unity game engine to visualize and interact with the obtained skeletons. The results were evaluated with a set of metrics and with the feedback of participants through a questionnaire that is attached. The entire manuscript is clearly written and quick to read. Still, the authors did not avoid minor shortcomings.

1. There are several versions of ResNet. Which one are we dealing with here? It is worth explaining all abbreviations and acronyms. For many readers who are not familiar with machine learning models, the abbreviation ReLU may be unclear.

2. If ResNet was used, maybe better results would be achieved by using other convolutional networks (from manufacturers or our own)? Have such experiments been carried out?

3. The role of LSTM networks, which are suitable for time series analysis, is not clear. This is a different application than convolutional networks such as ResNet.

4. The literature is a bit "discreet". There is no broader perspective on this important problem. Please expand your sources. It is definitely worth adding an interesting item from Virtual Worlds.

To sum up, the manuscript will be suitable for publication after the authors make the necessary corrections in accordance with the above four points.

Comments on the Quality of English Language

Line 386: tensorflow =>TensorFlow

Line 515: Resnet => ResNet

Line 516: ResnetRot => ResNetRot

Line 536: problems,in => problems, in

Author Response

We would like to thank the reviewer for their thorough analysis of the work and for the comments made to help us improve our article. The response to the comments can be found in the attached pdf.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper aims to propose a deep learning-based approach to correct the motion capture errors from the Microsoft Kinect sensor.

Suggestions and questions:

1. Abstract: 'This project presents a solution...' - is it a project or a study/work/article/paper?

2. Abstract: 'The results were evaluated with a set of metrics...' - if you do not provide the metrics, remove the sentence, because it is not informative in the current form.

3. Information related to "An approach identical to [6] is followed..." is excessively repeated in the manuscript.

4. The Materials and methods content is fragmented across three sections (3, 4 and 5), which does not facilitate comprehension. Also, the methods applied in '6.3. Evaluation with Users' are only presented in the results. My suggestion is to merge them into the Materials and methods section.

5. Consider 'The training set will be divided into 6 k-folds and approximately 10% of the data will be used for validation.' Is it approximately 10% or 16.66%? Also, why 6 folds? 5 is normally used.

6. In '6.2. Kinect data', I could not find a clear conclusion. Can I conclude that the proposed model does not smooth/remove occlusions or displacements? So, what are the advantages of the developed model?

7. Section '7. Conclusions and Future Work' presents not only a conclusion, but also a discussion on the results.

8. Not mandatory, but it would improve the manuscript if the authors reread the text, looking for repeated or unnecessary information. The manuscript is verbose in some parts.

9. Supplementary Material should be: (1) cited in the manuscript; and (2) completely translated from Portuguese to English.

Specific comments:

- DNN (Deep Neural Network) -> Deep Neural Network (DNN) - idem for all abbreviations (e.g., CMU, API, GPU, etc.)

- sets of data -> datasets

Author Response

We would like to thank the reviewer for their thorough analysis of the work and for the comments made to help us improve our article. The response to the comments can be found in the attached pdf.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Thank you for your thorough response to my comments. I appreciate the efforts you and your team have made to address the concerns raised in my initial review.

After carefully reviewing the revisions made to the manuscript, I am pleased to note that the changes have significantly improved the clarity and completeness of the article. The additional explanations and clarifications provided in response to each comment have effectively addressed the issues I raised, ranging from real-time processing considerations to ethical concerns and participant justifications.

Considering the comprehensive nature of your responses and the improvements made to the manuscript, I believe that it is now suitable for publication. Thank you for your responsiveness and commitment to enhancing the quality of your work.
