Article
Peer-Review Record

A Study on Interaction Prediction for Reducing Interaction Latency in Remote Mixed Reality Collaboration

Appl. Sci. 2021, 11(22), 10693; https://doi.org/10.3390/app112210693
by Yujin Choi 1, Wookho Son 2 and Yoon Sang Kim 3,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 31 August 2021 / Revised: 10 November 2021 / Accepted: 10 November 2021 / Published: 12 November 2021
(This article belongs to the Special Issue Augmented Reality: Trends, Challenges and Prospects)

Round 1

Reviewer 1 Report

This article presents a method to improve the action detection time with interaction prediction in MR collaboration scenarios. The authors used joint angle information from consecutive video frames and applied the k-nearest neighbor method to classify two gestures: grab and pinch. The experiment showed that the proposed method could save up to 0.19 seconds (12%) compared to the conventional method, and could potentially improve the overall MR collaboration experience for human-virtual object interaction.

**Strengths:**

+First, I appreciate the work's aim of addressing an important and timely issue: latency has been a long-standing challenge that limits the realism of human-virtual object interaction.

+The presentation flow is sound and easy to understand. 

+The numerical/statistical analysis is in-depth and supports the proposed contributions.

 

**Weaknesses:**

However, I do have some concerns regarding the generality of the proposed approach.

-In Section 3, the authors didn't explain why k-nearest neighbor (k-NN) was selected for the task over other gesture recognition algorithms. The authors also didn't properly motivate why only two gestures, grab and pinch, were selected: in reality, MR hand-based interfaces may vary widely beyond these two conditions. Even with grabbing, there would be multiple variants using different fingers (say, two for small objects and five for large/heavy ones).

-In line 184, "Also, in the case of pinch, the tip joint of other fingers greatly changed." does not agree with Figure 3(b).

- In the paragraph below Figure 3, the authors showed that the tip joint information is essential to determine the hand movement, but claimed that "it is impossible to determine the hand's movement only with the tip joint" without providing any proof.

- Apart from the tip joint, the authors mentioned that the metacarpal and proximal joint information were added since they "had the least degree of change", and (would not) "compromise the representativeness of the tip joint". However, according to Figure 3, the distal joint also had large differences in the degree of change but wasn't included. More discussion of how the joint angle is defined should be provided here.

- In line 208, the consecutive joint angle wasn't clearly defined. For example, how are the five frames selected? Why "5" frames? How does the algorithm determine "when each gesture was started"? That is, what if the durations are longer/shorter?

- More details of the experiment in Section 4 should be provided. For example, the virtual objects appeared randomly at blue, red, and green positions, but were the order/position counterbalanced across conditions? Each subject performed 4 gestures on 27 virtual objects, so for a given virtual object, was only one gesture performed 4 times or both gestures were performed? How was the grab/pinch gesture ordered among the 27 objects? Were they counterbalanced as well?

- To properly evaluate the classification algorithm, in addition to the prediction success rate (Section 4.3.2), it's also important to evaluate how often the algorithm produces false positives. For example, if the subject performed a none gesture, how accurate is the algorithm?

- In terms of the interaction latency measurement, how was the time measured in the "with proposed method" condition when the prediction was wrong? Was it still the same as the "the moment when the virtual object for interaction appears to the moment when the prediction is completed"? In real MR applications, if the prediction was wrong, users generally need to undo/redo the action. Have the authors considered a time penalty when the prediction was wrong? Also, the red arrow in Figure 10 is not very informative.


Overall, I would recommend that the authors perform a thorough revision of the manuscript that addresses or explains the questions/concerns above.

Author Response

Responses to Reviewer:

First of all, the authors would like to express their sincere gratitude for the reviewer's invaluable time and thoughtful comments.

In order to save the reviewer's invaluable time and expedite the processing of the revised manuscript, the authors have tried to be as specific as possible in their responses.

Overall, the authors agree with the reviewer's opinions and comments. To improve the quality of the paper, the authors have read the manuscript carefully, and some sentences and figures were added or modified according to the reviewer's comments.


To Reviewer 1

Point 1.
In Section 3, the authors didn't explain why k-nearest neighbor (k-NN) was selected for the task compared to other gesture recognition algorithms. The authors also didn't properly motivate why only two gestures, grab and pinch, were selected for the task: in reality, the MR hand-based interfaces may highly vary beyond these two conditions. Even with grabbing, there would be multiple variances with different fingers (say two for small objects, and five for large/heavy ones).

(Answers)
Thanks for the invaluable comment. According to the reviewer's comment, we added a part that explains the gestures (page 2, lines 83-85). Reference [21] was moved to explain why we chose grab and pinch (its number changed from 23 to 21). We also explained this again in Chapter 3 (page 4, lines 145-149). Additionally, we added a part that explains why we chose k-NN (page 4, lines 153-155), and we discussed the limitations in Chapter 5 (page 18, lines 550-552).

Please refer to the blue text: page 2, lines 83-85; page 4, lines 145-149 and 153-155; page 18, lines 550-552.


Point 2.
In line 184, "Also, in the case of pinch, the tip joint of other fingers greatly changed." does not agree with Figure 3(b).

(Answers)
Thanks for the considerable comment. According to the reviewer's comment, we modified the sentence to describe the figure more clearly (page 5, lines 191-192). In that part, we explain that each tip joint changed greatly compared to the other joints of the same finger.

Please refer to the blue text: page 5, lines 191-192.

Point 3.
In the paragraph below Figure 3, the authors showed that the tip joint information is essential to determine the hand movement, but claimed that "it is impossible to determine the hand's movement only with the tip joint" without providing any proof.

(Answers)
Thanks for this invaluable comment! As the reviewer mentioned, we agree that the explanation was not sufficient. Following the reviewer's suggestion, we added an example (page 5, lines 195-197) and a figure (Figure 4) to explain it.

Please refer to the blue text: page 5, lines 195-197, and Figure 4.


Point 4.
Apart from the tip joint, the authors mentioned that the metacarpal and proximal joint information were added since they "had the least degree of change", and (would not) "compromise the representativeness of the tip joint". However, according to Figure 3, the distal joint also had large differences in the degree of change but wasn't included. More discussion of how the joint angle is defined should be provided here.

(Answers)
Thanks for the considerable comment. According to the reviewer's comment, we added a part that defines the joint angle (page 6, lines 203-207) and explains why we did not choose the distal joints.

Please refer to the blue text: page 6, lines 203-207.

 
Point 5.
In line 208, the consecutive joint angle wasn't clearly defined. For example, how are the five frames selected? Why "5" frames? How does the algorithm determine "when each gesture was started"? That is, what if the durations are longer/shorter? 

(Answers)
Thanks for the invaluable comment. According to the reviewer's comment, we added a part, with reference [26], that explains how the five frames were selected (page 7, lines 225-229). We also added a part that explains how the algorithm determines when each gesture starts (page 7, lines 249-251). In addition, the starting and completion moments of the task in the experiment are now defined in Chapter 4.2 (page 9, lines 312-321, Figure 9).

Please refer to the blue text: page 7, lines 225-229 and 249-251; page 9, lines 312-321; Figure 9.
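The consecutive-joint-angle feature discussed in this point (a fixed window of frames flattened into one vector, then classified with k-NN) can be sketched as follows. This is a toy illustration, not the authors' implementation: the window length of 5 frames follows the manuscript's description, but the number of joints, the angle values, and k = 3 are assumptions made for the example.

```python
# Minimal k-NN over 5-frame joint-angle windows (toy sketch, NOT the
# authors' code). Angle values, joint count, and k = 3 are assumptions.
from math import dist

WINDOW = 5  # consecutive frames per sample, as described in the manuscript

def make_window(angle_stream, start):
    """Flatten WINDOW consecutive frames of joint angles into one feature vector."""
    frames = angle_stream[start:start + WINDOW]
    return [a for frame in frames for a in frame]

def knn_predict(train, query, k=3):
    """Majority label among the k training samples nearest to `query`."""
    neighbors = sorted(train, key=lambda sample: dist(sample[0], query))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

# Toy training set: each sample is a flattened 5-frame window of 3 joint angles.
train = [
    ([10, 12, 14] * WINDOW, "grab"),
    ([11, 13, 15] * WINDOW, "grab"),
    ([40, 42, 44] * WINDOW, "pinch"),
    ([41, 43, 45] * WINDOW, "pinch"),
    ([0, 0, 0] * WINDOW, "none"),
]

stream = [[12, 13, 14]] * WINDOW  # five identical toy frames
print(knn_predict(train, make_window(stream, 0)))  # → grab
```

In a real pipeline, the `start` index of the window would be tied to the detected gesture onset, which is exactly the point the reviewer raises.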


Point 6.
More details of the experiment in Section 4 should be provided. For example, the virtual objects appeared randomly at blue, red, and green positions, but were the order/position counterbalanced across conditions? Each subject performed 4 gestures on 27 virtual objects, so for a given virtual object, was only one gesture performed 4 times or both gestures were performed? How was the grab/pinch gesture ordered among the 27 objects? Were they counterbalanced as well?

(Answers)
Thanks for the considerable comment. As the reviewer mentioned, we found that the explanation of the experiment was insufficient. According to the reviewer's comment, we added a part that explains the order and positions in which the virtual objects appeared (page 8, lines 270-271). We also added a part that explains the number of gestures performed (page 13, lines 428-431).

Please refer to the blue text: page 8, lines 270-271; page 13, lines 428-431.

Point 7.
To properly evaluate the classification algorithm, in addition to the prediction success rate (Section 4.3.2), it's also important to evaluate how often the algorithm produces [false positive]. For example, if the subject performed a none gesture, how accurate is the algorithm?

(Answers)
Thanks for the invaluable comment. According to the reviewer's comment, we compared the recorded gesture data with the predicted data, and we added a part that explains the false positives and false negatives (page 14, lines 446-452, Table 12, Table 13).
In our experiment, a false positive has the following meaning:
- grab: the recorded gesture was 'pinch' or 'none', but the predicted gesture was 'grab';
- pinch: the recorded gesture was 'grab' or 'none', but the predicted gesture was 'pinch'.
A false negative has the following meaning:
- grab: the recorded gesture was 'grab', but the predicted gesture was 'pinch' or 'none';
- pinch: the recorded gesture was 'pinch', but the predicted gesture was 'grab' or 'none'.
Please refer to the blue text: page 14, lines 446-452; Table 12; Table 13.
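The per-gesture false-positive/false-negative definitions above amount to a small counting routine over paired recorded/predicted sequences. The sketch below is illustrative only; the sample sequences are hypothetical, not the experiment's data.

```python
# Count per-gesture FP/FN following the definitions in the response
# (illustrative sketch; the recorded/predicted sequences are made up).

def fp_fn(recorded, predicted, gesture):
    """FP: predicted == gesture but recorded was something else.
    FN: recorded == gesture but predicted was something else."""
    fp = sum(1 for r, p in zip(recorded, predicted) if p == gesture and r != gesture)
    fn = sum(1 for r, p in zip(recorded, predicted) if r == gesture and p != gesture)
    return fp, fn

recorded  = ["grab", "grab", "pinch", "none", "pinch", "grab"]
predicted = ["grab", "pinch", "pinch", "grab", "grab", "grab"]

print(fp_fn(recorded, predicted, "grab"))   # → (2, 1)
print(fp_fn(recorded, predicted, "pinch"))  # → (1, 1)
```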


Point 8.
In terms of the interaction latency measurement, how was the time measured in the "with proposed method" condition when the prediction was wrong? Was it still the same as the "the moment when the virtual object for interaction appears to the moment when the prediction is completed"? In real MR applications, if the prediction was wrong, users generally need to undo/redo the action. Have the authors considered a time penalty when the prediction was wrong? Also, the red arrow in Figure 10 is not very informative.

(Answers)
Thanks for the considerable comment. According to the reviewer's comment, we modified Figure 11 (originally Figure 10). We also added, in Chapter 5, the limitation that no penalty was considered for wrong prediction cases (page 18, lines 552-555).

Please refer to the blue text: Figure 11; page 18, lines 552-555.

 

Author Response File: Author Response.docx

Reviewer 2 Report

A Study on Interaction Prediction for Reducing Interaction Latency in Remote Mixed Reality Collaboration

In the considered manuscript, the authors propose and evaluate a method aimed at reducing latency in MR. The topic of the paper is potentially interesting and relevant for the Applied Sciences journal. The experimental evaluation undertaken by the authors to support their method is also a positive feature. The problem is, I do not see an actual contribution of the paper. The proposed method (predicting users' actions and making the system act before they are completed) is not novel. The recognition of the gestures is rather basic, and the accuracy (about 85% for just two gestures) is lower than in existing work. The new interaction mode, with the predictions, is not assessed with regard to the overall user experience. Correspondingly, we do not know if the failed predictions have a considerable negative effect. All that authors demonstrate is that the latency time is reduced, but this would be expected by the method's design.
So, I cannot recommend accepting the paper in the current form, as more work is deemed necessary. Some more detailed comments and suggestions follow.

First, I would recommend the authors to provide more detailed review of effect of latency in humans. What values of latency are tolerated and what values of latency become critical in the interaction?

Second, based on the above they could discuss the target value for the latency reduction and its desired balance with accuracy. In the authors' experiment, the latency time is reduced by 12%, but in over 15% of cases the gestures are not predicted correctly (even with just 2 possible gestures). Is it really worth it?

Third, the authors need to specify what happens if the prediction is not a success in a real interaction. What effect does it have on the overall UX? They should measure the latter in their experiment, not just SSQ scores pre- and post-. As I said before, it's rather easy to reduce time, if we do not account for accuracy.

Fourth, the authors should strengthen their statistical analysis. For instance, in Table 15 and Table 16 they seem to perform normality tests for just 7 values, which is hardly appropriate.

As for the language and style, the paper is readable overall, but there are many minor grammatical errors and typos that should be corrected. E.g.:
"Related Works" -> "Related Work"
"affect a user ... when human-VO interaction" -> "in human-VO interaction"?
"are very various"
"in which the virtual object can be appeared"
"('saved time'" - no closing round bracket
"compared to in" -> "compared to it"

Author Response

Responses to Reviewer:

First of all, the authors would like to express their sincere gratitude for the reviewer's invaluable time and thoughtful comments.

In order to save the reviewer's invaluable time and expedite the processing of the revised manuscript, the authors have tried to be as specific as possible in their responses.

Overall, the authors agree with the reviewer's opinions and comments. To improve the quality of the paper, the authors have read the manuscript carefully, and some sentences and figures were added or modified according to the reviewer's comments.

To Reviewer 2

Point 1.
As for the language and style, the paper is readable overall, but there are many minor grammatical errors and typos that should be corrected. E.g.:
"Related Works" -> "Related Work"
"affect a user ... when human-VO interaction" -> "in human-VO interaction"?
"are very various"
"in which the virtual object can be appeared"
"('saved time'" - no closing round bracket
"compared to in" -> "compared to it“. 

(Answers)
Thanks for the considerable comment. According to the reviewer's comment, we corrected the typos in our paper.

Please refer to the blue text: page 2, lines 56 and 88; page 3, line 132; page 4, lines 148-149; page 8, lines 262-264; page 15, line 482; page 16, line 492; page 18, line 546.


Point 2.
First, I would recommend the authors to provide more detailed review of effect of latency in humans. What values of latency are tolerated and what values of latency become critical in the interaction?

(Answers)
Thanks for the invaluable comment. According to the reviewer's comment, we added a part that explains latency (page 3, lines 102-105) and added reference [23]. In general, a latency of 100 ms or less was long considered a threshold below which users are unaffected, but a recent study found that even such a small latency can affect users. Thus, even small latencies play a major role in user satisfaction during interaction. To make this clear, an explanation and the reference were added to the revised manuscript.

Please refer to the blue text: page 3, lines 102-105.


Point 3.
Second, based on the above they could discuss the target value for the latency reduction and its desired balance with accuracy. In the authors' experiment, the latency time is reduced by 12%, but in over 15% of cases the gestures are not predicted correctly (even with just 2 possible gestures). Is it really worth it?

(Answers)
Thanks for the invaluable comment. As the reviewer mentioned, the accuracy of interaction prediction using k-NN was rather low. First, this is because k-NN is a relatively simple algorithm; second, the dataset for the experiment was small. We added a part that explains these limitations in Chapter 5 (page 18, lines 550-555).
However, our purpose was to investigate the feasibility of the proposed method as quickly as possible, so we used the k-NN algorithm, which is relatively small and simple. We added a part that explains why we selected k-NN (page 4, lines 153-155).
Also, in the previous manuscript, our results did not consider false positives and false negatives; in other words, they did not reflect true negatives. We therefore added Tables 12-13 and an explanation of false positives and false negatives (page 14, lines 446-452, Table 12, Table 13).
The authors agree with the reviewer that an error rate of about 15% may be considered somewhat high, but we believe the work is worthwhile in that we tried a new approach and confirmed its feasibility. We would appreciate it very much if the reviewer could consider this. To highlight these discussions, an explanation and the limitation were added.

Please refer to the blue text: page 4, lines 153-155; page 14, lines 446-452; Table 12; Table 13; page 18, lines 550-555.


Point 4.
Third, the authors need to specify what happens if the prediction is not a success in a real interaction. What effect does it have on the overall UX? They should measure the latter in their experiment, not just SSQ scores pre- and post-. As I said before, it's rather easy to reduce time if we do not account for accuracy.

(Answers)
Thanks for the considerable comment. As the reviewer noticed, we did not consider a penalty for wrong prediction cases in real interaction, and in our experiment a wrong prediction had no effect on the overall UX either. We added the limitation that the penalty for wrong prediction cases was not considered in Chapter 5 (page 18, lines 552-555). We also added a part that explains all wrong prediction cases (false positives and false negatives) in Chapter 4.3.2 (page 14, lines 446-452, Table 12, Table 13). Based on the reviewer's comment, our future work will include a penalty for wrong prediction cases; the authors really appreciate this very much.

Please refer to the blue text: page 14, lines 446-452; Table 12; Table 13; page 18, lines 552-555.


Point 5.
Fourth, the authors should strengthen their statistical analysis. For instance, in Table 15 and Table 16 they seem to perform normality tests for just 7 values, which is hardly appropriate.

(Answers)
Thanks for the considerable comment. Our number of subjects was 7, so we measured SSQ scores for 7 subjects through pre- and post-questionnaires and compared the 7 results in Tables 15 and 16. As a result, most of the values did not satisfy a normal distribution (possibly due to the small number of subjects), so we performed a non-parametric test in Table 17. We also added a statement that we intend to expand our study with more subjects (page 18, lines 553-555).

Please refer to the blue text: page 18, lines 553-555.
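The non-parametric fallback described here can be illustrated with a minimal Wilcoxon signed-rank computation: with only 7 paired scores, normality tests have little power, so a paired rank test is the usual choice. This is a sketch under assumptions (the SSQ values are hypothetical, and only the W statistic is computed, not the p-value), not the authors' actual analysis.

```python
# Wilcoxon signed-rank statistic for paired pre/post scores (toy sketch,
# NOT the authors' analysis; SSQ values below are hypothetical).

def wilcoxon_w(pre, post):
    """W = min(rank sum of positive diffs, rank sum of negative diffs).
    Zero differences are dropped; tied |differences| get average ranks."""
    diffs = [round(b - a, 6) for a, b in zip(pre, post) if round(b - a, 6) != 0]
    ranked = sorted(diffs, key=abs)
    ranks = {}
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and abs(ranked[j]) == abs(ranked[i]):
            j += 1
        avg = (i + 1 + j) / 2  # average of ranks i+1 .. j for this tie group
        for idx in range(i, j):
            ranks[idx] = avg
        i = j
    w_pos = sum(r for idx, r in ranks.items() if ranked[idx] > 0)
    w_neg = sum(r for idx, r in ranks.items() if ranked[idx] < 0)
    return min(w_pos, w_neg)

pre  = [3.7, 7.5, 0.0, 11.2, 3.7, 7.5, 3.7]   # hypothetical pre-exposure SSQ
post = [7.5, 7.5, 3.7, 14.9, 3.7, 11.2, 7.5]  # hypothetical post-exposure SSQ
print(wilcoxon_w(pre, post))  # → 0 (all non-zero differences are positive)
```

With n this small, critical values for W come from exact tables rather than the normal approximation, which is another reason small-sample normality testing adds little.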

Author Response File: Author Response.docx

Reviewer 3 Report

The authors present in this paper their method, based on interaction prediction, for reducing the time needed to detect the action between a human and a virtual object (for selected gestures). The research is supported by an experiment documented with graphs and tables of results.

Generally, the method developed is documented in Sect. 3. How was the implementation tested?

Also, please, emphasize the motivation for the research and the application domain.

Some minor observations:

  • avoid double headings, e.g. l. 56, l. 252, l. 335, etc.; put in some short introductions
  • change "In order to" => "To" (somewhat wordy ...)
  • references 3 and 4 are not considered cited bibliography; they should be put into a footnote, not into the bib resources
  • since the length of the arm is considered in the experiment, it would be interesting to also explore some parameters based on the golden ratio => suggestion for future research, not applicable to this paper
  • l. 422 (and others): because k is a math parameter, consider also a math font in the regular text

Author Response

Responses to Reviewer:

 

First of all, the authors would like to express their sincere gratitude for the reviewer's invaluable time and thoughtful comments.

In order to save the reviewer's invaluable time and expedite the processing of the revised manuscript, the authors have tried to be as specific as possible in their responses.

Overall, the authors agree with the reviewer's opinions and comments. To improve the quality of the paper, the authors have read the manuscript carefully, and some sentences and figures were added or modified according to the reviewer's comments.

 

To Reviewer 3

 

Point 1.

Generally, the method developed is documented in Sect. 3. How was the implementation tested?

 

(Answers)

Thanks for the invaluable comment. As the reviewer mentioned, our explanation was insufficient. Before the experiment, the proposed method was implemented as a prototype and tested. For the proposed method to work properly, the most important thing is to check whether the raw data are input properly in the original image input step, because subsequent steps are performed based on the input data. Therefore, we tested by comparing the raw data input at the original image input stage with the actual hand gesture, and the raw data input performed well. Based on these input values, we performed additional tests by adjusting the thresholds for classifying grab, pinch, and none, and confirmed that the proposed method could classify users' hand gestures into grab, pinch, and none. To make this clear, we added an explanation of how we tested our procedure, as follows: "We examined whether our procedure worked well by comparing the obtained hand gestures with actual ones. It was confirmed through the additional test that the proposed method could classify users' hand gestures into grab, pinch, and none."

Please refer to the pink text: page 7, lines 251-253.

 

Point 2.

Also, please, emphasize the motivation for the research and the application domain.

 

(Answers)

Thanks for the considerable comment. According to the reviewer's comment, we added a part that explains the expected benefits of the proposed method in Chapter 5 (page 18, lines 553-555). We also emphasized our motivation in Chapter 1 (page 1, lines 31-43).

Please refer to the pink text: page 1, lines 31-43; page 18, lines 553-555.

 

Point 3.

Avoid double headings, e.g. l. 56, l. 252, l. 335 etc, put some short introductions

 

(Answer)

Thanks for the invaluable comment. As the reviewer mentioned, we agree that avoiding double headings would be more proper. However, double headings were suggested in the journal's template (example), so we used them.

 

Point 4.

change "In order to" => "To" (somehow wordy ...) l.422 (and others), because k is math parameter, consider also math font in regular text

 

(Answer)

Thanks for the considerable comment. According to the reviewer's comment, we modified several parts.

Please refer to the pink text: page 1, line 17; page 4, lines 136, 158, 159; page 10, lines 340-342, 344, 346, Table 1, Table 3, line 364; page 11, lines 366, 369, 370, Table 4, Table 6, lines 390, 392; page 12, lines 396, 397, Table 7, Table 9, lines 417, 419; page 13, lines 424-426; page 15, line 470; page 16, line 510.

 

Point 5.

Since the length of the arm is considered in the experiment, it would be interesting to also explore some parameters based on the golden ratio => suggestion for future research, not applicable to this paper

 

(Answer)

Thanks for the invaluable comment. As the reviewer mentioned, we will consider body ratios (such as the golden ratio) in our future work. It is a truly valuable comment that makes our study more fruitful; the authors really appreciate it very much.

 

Point 6.

references 3 and 4 are not considered as cited bibliography, they should be put into footnote, not into bib resources

 

(Answer)

Thanks for the considerable comment. As the reviewer mentioned, we agree that putting references 3 and 4 into footnotes would be more proper. However, the journal's template suggests references instead of footnotes, so we included references 3 and 4 in reference format.

 

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

I have read the authors' reply to the reviewers' comments and the revised version of the manuscript. I see that the authors have addressed the recommendations and considerably improved the paper.

Although I still believe that considering the latency reduction without accounting for accuracy makes little sense, I understand that this cannot be fixed in the current version of the manuscript. So, I hope that the authors will address this in their future experiments.

Meanwhile, the paper is mostly publishable in the current form, and I recommend accepting it.

Author Response

Responses to Reviewer:

 

First of all, the authors would like to express their sincere gratitude for the reviewer's invaluable time and thoughtful comments.

In order to save the reviewer's invaluable time and expedite the processing of the revised manuscript, the authors have tried to be as specific as possible in their responses.

Overall, the authors agree with the reviewer's opinions and comments. To improve the quality of the paper, the authors have read the manuscript carefully, and some sentences and figures were added or modified according to the reviewer's comments.

 

To Reviewer 2

 

Point 1.

I have read the authors' reply to the reviewers' comments and the revised version of the manuscript. I see that the authors have addressed the recommendations and considerably improved the paper.

Although I still believe that considering the latency reduction without accounting for accuracy makes little sense, I understand that this cannot be fixed in the current version of the manuscript. So, I hope that the authors will address this in their future experiments.

Meanwhile, the paper is mostly publishable in the current form, and I recommend accepting it.

 

(Answers)

Thanks for the invaluable comment. According to the reviewer's comment, we added and modified several parts. Also, as we state in Chapter 5 (page 18, lines 558-560), in future work we will consider wrong prediction cases and extend the study by applying more advanced algorithms and a larger number of subjects. Through this, we will reduce latency while accounting for accuracy, as the reviewer mentioned. Our paper improved a lot because of these comments; the authors really appreciate it very much.

Please refer to the pink text: page 18, lines 558-560.

Author Response File: Author Response.docx
