Article
Peer-Review Record

A Multi-Object Grasp Technique for Placement of Objects in Virtual Reality

Appl. Sci. 2022, 12(9), 4193; https://doi.org/10.3390/app12094193
by Unai J. Fernández 1, Sonia Elizondo 1, Naroa Iriarte 1, Rafael Morales 2, Amalia Ortiz 1, Sebastian Marichal 1, Oscar Ardaiz 1 and Asier Marzo 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 27 February 2022 / Revised: 13 April 2022 / Accepted: 19 April 2022 / Published: 21 April 2022
(This article belongs to the Special Issue New Frontiers in Virtual Reality: Methods, Devices and Applications)

Round 1

Reviewer 1 Report

This paper presents a study of a technique to grasp multiple objects at once using standard VR controllers. The study is well planned and executed, although a larger number of participants would be more appropriate. The paper is well presented and the results reveal some interesting findings regarding the multi-object grasp technique compared to the real-world and the single-object ones.

Some issues that need to be revised are:

  • The process of eliciting the multi-object grasp technique (see l. 38) is not presented in the paper
  • The preliminary study mentioned in l. 118 is also not presented and explained
  • Figure number is missing in l. 92
  • Present in a few sentences scenarios where the multi-object grasp technique would be helpful and practical. You mention virtual stores and physical rehabilitation applications but without further explanation
  • What statistical analysis did you do to obtain p-values?
  • At the end of the results you mention user ranking of the conditions, but it's not mentioned in the process or the measures. 

Author Response

We genuinely thank the reviewer for the effort. We find all the suggestions very pertinent and we have addressed them as follows:


1) eliciting the multi-object grasp technique (see l. 38) is not presented in the paper

We have added subsections (3.1 Single-object grasp and 3.2 Multi-object grasp) to the section "3. Interaction Techniques" for better organization.

We have moved l. 38 "The multi-object grasp technique was elicited through informal interviews." to the new subsection 3.2 and added the following text afterwards:
"In order to obtain a natural set of interactions with the controllers for performing the actions (Grab first, grab one, release all, release last), we gathered 6 people with backgrounds in computer science, electronics and medicine. Firstly, we asked them to perform the actions with real wooden tokens. Then we asked them how they would perform those actions with the game controller. Most participants agreed on the Grab first action (pressing the ring trigger) and Release all (release that trigger). For release one, some people suggested a throwing gesture but others mentioned pressing one of the top buttons. For Grab one, there was no agreement between pressing the index trigger or doing it automatically when an object was touched while the Grab first mode was held."

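The four actions described above can be read as a small state machine over a stack of held objects. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class and method names are hypothetical, the ring trigger and index trigger mappings follow the quoted text, and the top-button mapping for releasing the most recent object (which the text calls both "release last" and "release one") is only one of the options the interviewees suggested.

```python
class MultiGraspHand:
    """Toy model of a virtual hand that can hold several objects at once."""

    def __init__(self):
        self.grip_held = False  # True while the ring trigger is pressed
        self.held = []          # objects in grab order; last item is the newest

    def press_grip(self, touched=None):
        """Grab first: enter grab mode and pick up the touched object, if any."""
        self.grip_held = True
        if touched is not None and touched not in self.held:
            self.held.append(touched)

    def press_index(self, touched=None):
        """Grab one: add another touched object, only while grab mode is held."""
        if self.grip_held and touched is not None and touched not in self.held:
            self.held.append(touched)

    def press_top_button(self):
        """Release last (assumed mapping): drop the most recently grabbed object."""
        return self.held.pop() if self.held else None

    def release_grip(self):
        """Release all: leave grab mode and drop every held object."""
        released, self.held = self.held, []
        self.grip_held = False
        return released


# Usage with hypothetical object names
hand = MultiGraspHand()
hand.press_grip("token_a")       # Grab first
hand.press_index("token_b")      # Grab one
print(hand.press_top_button())   # Release last
print(hand.release_grip())       # Release all
```

Keeping the held objects in grab order is a design choice of this sketch; it makes "release last" a simple pop from the end of the list.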


2) preliminary study mentioned in l. 118 is not presented

We have added a subsection for the preliminary study and, after the text at l. 118 "study similar to the one in Section 4 but with 2 conditions", we have added the following text:

"Eight participants performed the study described in Section 4 with only the desktop and shelf scenarios and 2 conditions: using the index trigger in the controller to Grab one object, or doing it automatically as the virtual hand touched an object while the Grab mode was held. The task completion time (TCT) for the 'with-button' condition was M = 14.23 s, SD = 5.58 s; for the 'no-button' condition it was M = 14.00 s, SD = 3.84 s. A paired t-test reported a p-value of 0.43, so no significant difference was found between using the button to select each object or automatically attaching it to the hand."

This connects with the already existing text "Thus, we selected the option of pressing the button since it is closer to the real action, i.e., we actively grab more objects by moving our thumb and index, they do not get automatically added or stacked to our hand as we pass it around."
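
For readers unfamiliar with the paired comparison reported above, the sketch below shows how the test statistic of a paired t-test is computed from per-participant differences. It is an illustration only: the completion times are invented for the example (the per-participant data are not given here), the function name is hypothetical, and the sketch stops at the t statistic, since converting it to a p-value requires the Student t distribution with n − 1 degrees of freedom (from a table or a statistics library).

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired t-test on two matched samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]   # one difference per participant
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)        # sample SD (n - 1 denominator)
    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical task completion times (seconds) for 8 participants
with_button = [14.0, 9.5, 21.0, 12.5, 16.0, 8.0, 19.5, 13.0]
no_button = [13.5, 10.5, 18.0, 13.0, 15.5, 9.5, 17.0, 14.5]

t, df = paired_t_statistic(with_button, no_button)
print(f"t = {t:.3f}, df = {df}")
```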


3) Figure number is missing in l. 92
Corrected


4) Present in a few sentences scenarios where the multi-object grasp technique would be helpful and practical. You mention virtual stores and physical rehabilitation applications but without further explanation

We have added the following text after the statement at l. 29 "may present a more realistic and transferable experience with multi-object grasp methods":

"For example, when visiting a clothes shop, it is common to grab multiple clothes to test various sizes, or, in the supermarket, to pick up multiple yogurts of different flavours. In rehabilitation, there are several tasks that include grabbing multiple items such as playing with cards or moving specifically shaped tokens."


5) what statistical analysis did you do to obtain p-values?

In Section 5 (Results), we have added the following text:
"The measurements from the user study were analysed using repeated-measures ANOVA to detect significant effects of the condition; post hoc tests with Bonferroni correction were used to determine significant differences."

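As a brief illustration of the Bonferroni correction mentioned above: each raw p-value from the pairwise post hoc comparisons is multiplied by the number of comparisons and capped at 1. The sketch below is not the authors' analysis; the function name, the comparison labels and the p-values are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values: p * m, capped at 1.0."""
    m = len(p_values)  # number of comparisons in the family
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for the three pairwise comparisons
# among three conditions (values invented for illustration)
raw = {
    "real vs single-object": 0.004,
    "real vs multi-object": 0.020,
    "single-object vs multi-object": 0.300,
}

adjusted = dict(zip(raw, bonferroni_adjust(list(raw.values()))))
for comparison, p_adj in adjusted.items():
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(f"{comparison}: adjusted p = {p_adj:.3f} ({verdict})")
```

Equivalently, the raw p-values can be compared against 0.05 divided by the number of comparisons; both forms lead to the same decisions.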

6) At the end of the results you mention user ranking of the conditions, but it's not mentioned in the process or the measures.

At the end of the first paragraph of subsection "4.6. Measurements", we have added the following text:
"In the last part of the questionnaire, participants were asked to rank the conditions according to their preferences: from 1 (preferred one) to 3 (least preferred condition)."

Reviewer 2 Report

In this paper, the authors present a novel controller-based grasping technique for VR allowing the handling of multiple objects at once, inspired by humans' real-world capabilities for grasping and handling multiple objects with a single hand. The paper is very well written, clearly structured, references are adequate and the work appears technically sound. The conclusions are generally supported by the experimental results. However, there are a few points which I believe must be addressed to ensure this paper is fit for publication (see below). I believe all the concerns I raise below can be easily addressed by the authors, and given the otherwise good quality of the presented work, I would argue for acceptance after minor revisions.

Major remarks

Figures 5, 6 and 7 feature error bars with no legend. Are these confidence intervals? (if so, what percentile?) Standard deviations around a mean?

 

It is unclear how exactly the instructions for object placement were provided in the real-world scenario. From Fig. 4 I assume this is using the laptop screen with the laptop placed somewhere in the environment of the user. However, Fig. 4 also shows that the instructions are inconsistently provided between the real and virtual conditions (different apparent screen sizes, different instructions layout compared to the task layout in the desktop condition). I believe this could affect performance, in particular w.r.t. statements like (L. 239) "tasks performed in the real environment were easier to understand". Whether the differences would tend to exaggerate, reduce, or not impact differences observed between tasks in the real and virtual environments is up for debate, but I would appreciate it if the authors could present this more clearly and mention it as a possible limitation of their study.

 

Figs. 6 and 7: "Desktop and Shelf are aggregated since results were very similar." => this is not scientifically acceptable as such. At the very least, a statistical analysis showing the lack of significant differences between results in both tasks should be presented. Even then, these are two different tasks, so if there is a concern about space/readability, maybe present only the results for Desktop (or Shelf) and said statistical analysis, or figure out a more effective way of presenting the results for all three tasks? Aggregating results across different tasks is not a valid approach.

 

In the discussion, the authors state "We hope that the presented techniques help to design training VR systems in which the practice of coarse motion can be transferred to the real life.", reiterating the stated goal of designing an interaction technique to enable better skills transfer between VR and the real world. However, the results show there are large and significant differences between real-world performance and the presented interaction technique across the board. To me, simply stating that one hopes the technique could help design effective VR training systems amounts to wishful thinking. I would argue that given the current results, it is just as likely that the presented techniques are inherently limited in their design and incapable of achieving anything close to real-world-equivalent performance or skills transfer. The authors should provide a discussion of why they believe pursuing this avenue could eventually achieve this goal if they want to make such statements.

 

Minor remarks

(L. 92) missing reference to figure

(L. 300) typo : "Although we did no use ..." 'no' should be 'not'

 

Author Response

We really appreciate the useful comments from the reviewer. We have checked the raised issues, and concluded that some of them were caused by sentences that were not expressed properly. We have addressed them as follows:


1) Figures 5, 6 and 7 feature error bars with no legend. Are these confidence intervals? (if so, what percentile?) Standard deviations around a mean?

In the figures with error bars (Figures 5, 6 and 7), we have added to the caption: "Error bars indicate standard error."
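
For reference, the standard error of the mean is the sample standard deviation divided by the square root of the number of observations, so it is narrower than a standard-deviation bar and is not itself a confidence interval. A minimal sketch, with a hypothetical function name and invented sample values:

```python
import math
import statistics


def mean_and_standard_error(samples):
    """Return (mean, standard error of the mean) for a list of numbers."""
    if len(samples) < 2:
        raise ValueError("need at least 2 observations")
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))  # SD / sqrt(n)
    return mean, sem


# Hypothetical measurements for one condition (values invented)
times = [12.1, 15.4, 9.8, 14.0, 13.3, 11.7]
mean, sem = mean_and_standard_error(times)
print(f"mean = {mean:.2f}, standard error = {sem:.2f}")
```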


2) It is unclear how exactly the instructions for object placement were provided in the real-world scenario. From Fig. 4 I assume this is using the laptop screen with the laptop placed somewhere in the environment of the user. However, Fig. 4 also shows that the instructions are inconsistently provided between the real and virtual conditions (different apparent screen sizes, different instructions layout compared to the task layout in the desktop condition). I believe this could affect performance, in particular w.r.t. statements like (L. 239) "tasks performed in the real environment were easier to understand". Whether the differences would tend to exaggerate, reduce, or not impact differences observed between tasks in the real and virtual environments is up for debate, but I would appreciate it if the authors could present this more clearly and mention it as a possible limitation of their study.

The sentence (L. 239) "tasks performed in the real environment were easier to understand" was neither in the questionnaire nor expressed by the users; it was our conclusion, extracted from Q9 "the users felt more comfortable in terms of difficulty" and Q14 "system complexity in that condition". Upon review, we consider that the sentence can be confusing and is not a direct conclusion from Q9 and Q14; therefore, we have removed it. In fact, we did not observe participants having any difficulty understanding the task to be performed in any condition or scenario.

The following descriptions: 
l. 160 "The instructions for the trials were shown on a laptop."
l. 165 "The instructions were shown in a virtual screen in the VR scenario."
have been expanded to:
"The instructions for the trials were shown on a laptop with a 14" screen, as they appear in Figure 3. Users reported no issues understanding or seeing them across the different scenarios."
"The instructions were shown on a virtual screen in the VR conditions; in different scenarios it was placed at different distances, but users reported no problem observing or understanding the instructions."
 

3) Figs. 6 and 7: "Desktop and Shelf are aggregated since results were very similar." => this is not scientifically acceptable as such. At the very least, a statistical analysis showing the lack of significant differences between results in both tasks should be presented. Even then, these are two different tasks, so if there is a concern about space/readability, maybe present only the results for Desktop (or Shelf) and said statistical analysis, or figure out a more effective way of presenting the results for all three tasks? Aggregating results across different tasks is not a valid approach.

We note that user studies from the related work [10, 17, 18, 22] do not split qualitative questionnaires by scenario; that is, qualitative questionnaires are split only by condition. We gathered our qualitative data in the same way. However, since we performed the evaluations in 2 sessions to keep them below 1 hour, we had one questionnaire for the first session (Desktop and Shelf scenarios) and another for the second session (Room scenario); in both, the data are always split by condition.

We think that l. 236 "Scenarios Desktop and Shelf are aggregated since results were very similar." can be confusing: that sentence could make readers think that we had the qualitative data split by condition and scenario, and that we aggregated Desktop and Shelf afterwards. What we meant is that both scenarios were covered by the same questionnaire. Since TCT and distance travelled did not show different results between Desktop and Shelf, we consider this justified.

We have replaced it with "Qualitative questionnaires for the Desktop and Shelf scenarios were answered at the end of the first session; questionnaires for the Room scenario were gathered in session 2."

We think that presenting qualitative feedback split by these two groups provides interesting insights, given the categorical difference between the static conditions of Session 1 and the "on the walk" nature of Session 2. Nevertheless, we can aggregate all the scenarios if required. Note that in all these cases, the data is split (not aggregated) by condition.


4) In the discussion, the authors state "We hope that the presented techniques help to design training VR systems in which the practice of coarse motion can be transferred to the real life.", reiterating the stated goal of designing an interaction technique to enable better skills transfer between VR and the real world. However, the results show there are large and significant differences between real-world performance and the presented interaction technique across the board. To me, simply stating that one hopes the technique could help design effective VR training systems amounts to wishful thinking. I would argue that given the current results, it is just as likely that the presented techniques are inherently limited in their design and incapable of achieving anything close to real-world-equivalent performance or skills transfer. The authors should provide a discussion of why they believe pursuing this avenue could eventually achieve this goal if they want to make such statements.

We agree that the sentence from the conclusion at the end of the paper is too speculative and is sustained more by our wish for future research than by the actual results. Therefore, we have removed it. We think that the sentences about future work in the discussion (e.g., l. 306 "how well training in virtual environments transfer to the real world") reflect the intention and are more appropriate.

We would nevertheless like to note that, even if skill transfer is not as high with the virtual conditions as with the real conditions, VR training may still be valuable, since it is not always possible to practice in a real environment due to safety or equipment availability.


5) (L. 92) missing reference to figure
Corrected


6) (L. 300) typo : "Although we did no use ..." 'no' should be 'not'
Corrected

Round 2

Reviewer 1 Report

The authors have successfully addressed my comments and suggestions, and the paper is now fit for publication.
