**Objective measures**

The application recorded behavioral responses inferred from the HTC Vive controllers and the headset log data. The following variables were measured:

- Time to submit Task 1;
- Time to submit Task 2;
- Response time to new messages in Task 1;
- Response time to new messages in Task 2.

**Subjective measures**

We used multiple questionnaires to evaluate the participants' subjective experience with the application, as detailed below.



**Table 4.** Questionnaires used to assess the participants' English proficiency, their previous experience with immersive VR, and the perceived uncertainty of Task 1 and Task 2.

### **4. Results**

In this section, we present the objective and subjective results of our experiment with respect to our research questions.

We compared the participants' ratings of the perceived uncertainty of the two tasks with the Wilcoxon signed-rank test. The test revealed a significant difference (V = 0, *p* = 0.0003553 < 0.05), suggesting that, overall, the participants rated Task 2 with higher perceived uncertainty than Task 1 (see Figure 9 for a visual comparison of the ratings).
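As an aside on how such a statistic is obtained: the signed-rank statistic V reported here is the smaller of the two rank sums over the signed differences, so V = 0 means every participant's difference pointed the same way. A minimal pure-Python sketch with hypothetical 1–5 Likert ratings (not the study data):

```python
def wilcoxon_v(x, y):
    """Wilcoxon signed-rank statistic V: the smaller of the two signed rank sums."""
    diffs = [b - a for a, b in zip(x, y) if b - a != 0]  # drop zero differences
    ranked = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(ranked):  # assign average ranks to ties on |difference|
        j = i
        while j + 1 < len(ranked) and abs(diffs[ranked[j + 1]]) == abs(diffs[ranked[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[ranked[k]] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus)

task1 = [2, 1, 3, 1, 2, 2, 1, 3]  # hypothetical Task 1 uncertainty ratings
task2 = [4, 3, 4, 3, 4, 3, 3, 5]  # hypothetical Task 2 uncertainty ratings
print(wilcoxon_v(task1, task2))   # → 0: every participant rated Task 2 higher
```

In practice a library routine (e.g., `scipy.stats.wilcoxon`) would also supply the *p*-value; this sketch only illustrates where V = 0 comes from.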

To examine the effects of different degrees of induced uncertainty on the users' behavior, we first confirmed the normality of the data with the Shapiro–Wilk test at the 5% level and then conducted a paired *t*-test. The results did not yield a significant difference between the response times in the two tasks (t(16) = 1.44, *p* = 0.084 > 0.05). However, the box plot in Figure 10 shows a visibly lower response time to pick up the phone in Task 2 than in Task 1.
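The paired *t* statistic used here is the mean of the per-participant differences divided by its standard error, with df = n − 1. A minimal sketch with hypothetical response times in seconds (not the study data):

```python
import math

def paired_t(x, y):
    """Paired t statistic: t = mean(d) / (sd(d) / sqrt(n)), with df = n - 1."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((di - mean) ** 2 for di in d) / (n - 1)  # sample variance of differences
    t = mean / math.sqrt(var / n)
    return t, n - 1

# Hypothetical response times in seconds (not the study data)
rt_task1 = [5.2, 4.8, 6.1, 5.5, 4.9, 5.8, 6.0, 5.1]
rt_task2 = [4.9, 4.7, 5.6, 5.3, 5.0, 5.4, 5.7, 4.8]
t, df = paired_t(rt_task1, rt_task2)
print(f"t({df}) = {t:.2f}")
```

A library routine (e.g., `scipy.stats.ttest_rel`) would additionally return the *p*-value from the t distribution with df degrees of freedom.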

Since the Shapiro–Wilk test rejected the normality of the data at the 5% level, we used the Wilcoxon signed-rank test, which revealed a significant difference between the task completion times of Task 1 and Task 2 (V = 0, *p* = 0.00001526 < 0.05; see Figure 11 for a visual comparison).

**Figure 9.** A comparison between the means of ratings obtained from the participants for their perceived uncertainty of Task 1 (M = 1.88, SD = 1.11) and Task 2 (M = 3.41, SD = 1.00); Range of answers = 1–5.

**Figure 10.** A comparison between the response time of the user to the source of information (the phone call) during Task 1 and Task 2, measured in [s].

**Figure 11.** A comparison between the task completion time of the user for Task 1 and Task 2, measured in [s]. In these data, two completion-time values were identified as outliers; they are shown as white circles in the figure.

Figures 12 and 13 present a visual comparison of the participants' changes of position in Task 1 and Task 2.

**Figure 12.** A visualization of participants' change of position in Task 1; the 8 task targets are shown in blue.

**Figure 13.** A visualization of participants' change of position in Task 2; the task targets related to before the change of the task are shown in red; the task targets related to after the change of the task are shown in green.

We used Pearson's *r* test to measure the strength and direction of possible linear relationships between the scores on the system usability, immersion, presence, motion sickness, and intolerance of uncertainty questionnaires and the recorded time to answer the second call (i.e., the response time to the source of information in Task 2). We also used this test to examine possible relationships between these questionnaire scores and the time spent on the second task. See Table 5 for the results of these tests.
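Pearson's *r* is the covariance of the two variables normalized by the product of their standard deviations. A minimal pure-Python sketch with hypothetical questionnaire scores and response times (not the study data):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical SUS scores vs. time to answer the second call (not the study data)
sus_scores = [70, 85, 78, 90, 65, 80]
answer_times = [6.1, 4.2, 5.0, 3.8, 6.5, 4.9]
print(round(pearson_r(sus_scores, answer_times), 2))  # strongly negative here
```

With these made-up numbers the correlation is strongly negative (higher usability scores go with faster answers); the study itself reports only small correlations, as discussed in Section 5.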


**Table 5.** Correlation values between subjective and objective measures.

Table 6 reports the mean and standard deviation of the scores obtained from the questionnaires on the quality of the participants' experience and their intolerance of uncertainty. Figures 14–20 present a visual comparison of the data obtained.

**Table 6.** Mean and standard deviation of scores received from participants' answers to the questionnaires.


**Figure 14.** A comparison between the scores obtained from the participants from the System Usability Scale (SUS) questionnaire (M = 76.32, SD = 9.89).

**Figure 15.** Percentages of ratings for each category from the System Usability Scale (SUS) questionnaire; categories from left to right are: best imaginable (score > 84.1); excellent (72.6 < score < 84.0); good (62.7 < score < 72.5); OK (51.7 < score < 62.6); poor (26 < score < 51.6); worst imaginable (score < 25).

**Figure 16.** A comparison between the scores obtained from the participants from the Immersive Experience questionnaire (IEQ) (M = 65.84 (%), SD = 7.13 (%)).

**Figure 17.** A comparison between the mean scores for the components of the Immersive Experience questionnaire (IEQ) scale (range of answers: 1 = not at all; 5 = a lot), from left to right: overall IEQ score (M = 3.17); challenge (M = 2.59); cognitive involvement (M = 4.06); control (M = 2.95); emotional involvement (M = 3.20); and real-world dissociation (M = 3.06).

**Figure 18.** A comparison between the scores obtained from the participants from the Slater–Usoh–Steed presence (SUS) questionnaire (M = 64.37 (%), SD = 14.50 (%)).

**Figure 19.** A comparison between the scores obtained from the participants from the Motion Sickness questionnaire (MASQ) (M = 22.18 (%), SD = 12.84 (%)).

**Figure 20.** A comparison between the scores obtained from the participants from the Intolerance of Uncertainty Scale (IUS) (M = 59.48 (%), SD = 11.63 (%)).

### **5. Discussion**

In this section, we present and discuss the main findings of the experiment in more detail.

The main purpose of this study was to propose the design and implementation of a VR platform able to create the experience of uncertainty in interpersonal communications at two levels and to record and report human behavioral responses to this exposure. In this paper, we addressed these research questions:

RQ1: Is there any significant difference between subjective ratings of participants for perceived uncertainty of Task 1 and Task 2?

Our comparison of the participants' post-experiment ratings of the perceived uncertainty of the two tasks indicates that the proposed design can successfully produce at least two levels of uncertainty in the experience of the system.

RQ2: How do different degrees of induced uncertainty affect the users' behavior and performance in this immersive virtual workplace scenario?


In this paper, we targeted the study of the differences between behavioral responses to the experience of two levels of uncertainty. In particular, we focused on studying the real-time records of the time and position of the participants.

Related to time, we were interested in two variables: the time to submit each task (task completion time) and the response time to the source of information (the phone call).


Related to position: The positions of the participants in Task 1 and Task 2 were recorded by the application every 4 s. Their distribution along the X and Y axes, in comparison with the targets of each individual task, is visualized in Figures 12 and 13. A visual comparison of the two plots shows that, overall, the participants in Task 2, the task with the increased uncertainty level, changed position more often. This finding is aligned with our expectations.
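One simple way to quantify such a change of position, assuming each 4 s sample is an (x, y) pair in meters (the coordinates below are hypothetical, not the study data), is the total path length per participant:

```python
import math

def total_path_length(samples):
    """Sum of Euclidean distances between consecutive (x, y) position samples."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(samples, samples[1:]))

# Hypothetical positions sampled every 4 s, in meters (not the study data)
task1_positions = [(0.0, 0.0), (0.0, 0.0), (0.5, 0.0), (0.5, 0.5)]
task2_positions = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
print(total_path_length(task1_positions))  # → 1.0
print(total_path_length(task2_positions))  # → 4.0
```

A per-participant scalar like this would also make the visual difference between the two scatter plots amenable to the same paired tests used for the time variables.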

In sum, our experiment suggests that adding uncertainty to a task harms task performance in terms of completion time, but not in terms of response time.

RQ3: How are the users' subjective responses to uncertainty related to the objective responses?

Despite our expectation of finding strong relationships between the subjective and objective measures, we found only a small negative linear correlation between the System Usability Scale scores and the time to answer the second call, small positive linear correlations between the presence score and both the task completion time for Task 2 and the time to answer the second call, and a small positive linear correlation between the Motion Sickness questionnaire scores and the time to answer the second call. We expect that an increased sample size could reveal stronger correlations between these variables.

RQ4: How do the users evaluate the quality of their experience through subjective measures?

Another purpose of the study was to report the results of the participants' evaluation of their experience with the system. The mean System Usability Scale (SUS) score in our results is above the average SUS score of 68, which gives an immediate indication of the overall good usability of the system and of the need for only minor improvements in the design [69]. In addition, based on the adjective rating scale introduced by [70], we found that nearly 70% of the participants' ratings of the usability of the system fall into the "Best imaginable", "Excellent", and "Good" categories (see Figure 15). For the Immersive Experience questionnaire (IEQ), the Slater–Usoh–Steed presence (SUS) questionnaire, and the Motion Sickness questionnaire (MASQ), the average scores also fall into an acceptable range, representing a good quality of the participants' experiences (see Figures 16–19).

In sum, we can conclude that the system, with the help of the designed environment and story plot, is able to create a pleasant virtual experience. In addition, by tracing the HTC Vive Pro controllers and headset, we were able to successfully capture in real time the behavioral responses of the participants, relating the time of actions and the user position to our variables of interest.

### **6. Conclusions and Future Works**

In this paper, we investigated the effects of the uncertainty level in a virtual office on participants' objective and subjective responses through a controlled human-subject study. We designed an experimental scenario inspired by the well-known story *Amelia Bedelia*, written by Peggy Parish [62]. For the design of our system, we first investigated and carefully selected the virtual reality interfaces and environments that supported our research needs; we were also inspired by previous games that applied uncertainty in their designs. The goal was to develop a system that supports a pleasant 3D immersive experience with real-world-like interactions and rich data-collection techniques. In our usability study, participants were asked to complete two different tasks inside a virtual office, where they were also involved in interpersonal communication with their boss on their first day of work. We measured the participants' objective responses through the log data captured from the tracing of the HTC Vive Pro controllers and headset, and assessed their subjective experience through questionnaires. We determined that the two proposed versions of the tasks received significantly different post-experiment ratings from the participants for their perceived uncertainty, and our results showed that the times taken to submit the two tasks differed significantly. Furthermore, the results from the usability, immersion, presence, and motion sickness questionnaires conveyed that, overall, the participants were satisfied with the experience, scoring its usability, presence, and immersion on average higher than 50% and its motion sickness lower than 30%.

This paper suggested that our proposed VR system can manipulate levels of uncertainty in order to study it. In the design of this system, we drew inspiration from real-life situations. An example workplace scenario is what regularly happens to someone in the role of a manager: they may receive multiple unpredictable inputs at once and have to constantly monitor and choose what to do first, stay productive, and successfully manage time allocations to be able to work with everyone involved [71]. To determine how effectively our system replicates such real-life situations under the same conditions, an evaluation of our proposed system against real-life baseline conditions is required. We decided not to include this system evaluation in this paper because of our limited ability to control the confounding factors arising from real-world settings, which would make it hard to obtain a valid measure of the effects of uncertainty; we therefore leave it for future work. In addition, we plan to investigate more behavioral responses from the users in a future study and to assess the feasibility of a desktop-based version of this application. Finally, a larger sample size would help us report and study more statistically powerful behavioral results.

**Author Contributions:** Conceptualization, S.H.; Methodology, S.H.; Software, S.H.; Validation, S.H.; Investigation, S.H.; Writing—original draft, S.H.; Writing—review & editing, G.M.; Supervision, G.M.; Funding acquisition, G.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work has been funded by the AlmaAttrezzature 2017 grant and by the Italian Ministry for Research and Education (MUR).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** Available upon request.

**Conflicts of Interest:** The authors declare no conflict of interest.
