#### *2.3. Experimental Design*

The experimental design includes two independent variables: LOA and level of workload. A mixed (between-within) participant design was used, with LOA as the within-participant variable and level of workload as the between-participant variable. Four groups were defined, one for each level of workload. Each participant was randomly assigned to one of the four groups and experienced both LOA modes (Table 1).
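The assignment scheme described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the balanced group sizes, the per-participant randomization of LOA order, the fixed seed, and the function name `assign_participants` are all assumptions introduced here. The group labels (LWL, MWL1, MWL2, HWL) are the ones used in the Results section.

```python
import random

# Workload group labels as they appear in the Results section.
WORKLOAD_GROUPS = ["LWL", "MWL1", "MWL2", "HWL"]
LOA_MODES = ["low", "high"]


def assign_participants(n_participants, seed=0):
    """Return a list of (participant_id, workload_group, loa_order) tuples.

    Assumptions of this sketch (not confirmed by the paper): group sizes
    are balanced, and the order of the two LOA trials is drawn
    independently for each participant.
    """
    rng = random.Random(seed)

    # Repeat the labels to cover all participants, then shuffle so that
    # membership is random while group sizes stay balanced.
    repeats = -(-n_participants // len(WORKLOAD_GROUPS))  # ceiling division
    groups = (WORKLOAD_GROUPS * repeats)[:n_participants]
    rng.shuffle(groups)

    schedule = []
    for pid, group in enumerate(groups, start=1):
        order = LOA_MODES[:]
        rng.shuffle(order)  # within-participant factor: both modes, random order
        schedule.append((pid, group, tuple(order)))
    return schedule


schedule = assign_participants(80)
```

Each entry pairs one participant with a single workload group (the between-participant factor) and with both LOA modes (the within-participant factor), which mirrors the mixed design.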


#### **Table 1.** Experimental design.

#### *2.4. Study Hypotheses*

The model for the study (Figure 5) and the hypotheses describing the proposed connections between the constructs, user preferences, and the study variables (LOA and levels of workload) are presented below, along with the rationale for each hypothesis:

We suspect that at all workload levels, high LOA will enable the users to perform efficiently and effectively since the high LOA involves the robot carrying out most aspects of the main task which would likely improve performance [37]. Therefore, we propose:

**Hypothesis 1.** *Quality of task (QoT) execution will be higher with high LOA than with low LOA for all workload levels.*

Several meta-studies on levels of automation [38,39] suggest that the workload experienced by users is influenced by the LOA of the system, particularly in situations of routine performance. This does not discount the effect of task complexity, but it points to the effect that level of workload may have when task complexity is low. Since a major component of usability is the users' perception of system use [40], along with effectiveness and efficiency, which high LOA will likely increase, we posit:

**Hypothesis 2.** *Usability will be higher with high LOA than with low LOA for all workload levels*.

**Figure 5.** Model for the study and hypotheses.

Research has revealed that as automation increases, workload is expected to decrease, particularly if the automation is properly designed and does not introduce new challenges and tasks related to monitoring or other forms of engagement [39]. Moreover, in the design of adjustable robot autonomy in human–robot systems, research shows that as task complexity increases, robot effectiveness is likely to decrease if the robot is operating at higher autonomy [41]. Users seem to intuitively understand that autonomous systems could encounter difficulties in more complex situations with high uncertainty [42]. Therefore, in terms of user preferences, we propose:

**Hypothesis 3.** *Participants will prefer high LOA to low LOA for high workload and low LOA to high LOA when task complexity is increased*.

#### *2.5. Participants*

Eighty third-year undergraduate industrial engineering students (44 females, 36 males, mean age = 26, SD = 1.4) participated in the study. All students had experience with both computers and robots. Participation was voluntary, and every participant received compensation in the form of a bonus point contributing to a credit in an academic course. The participants completed a preliminary questionnaire that included demographic questions and the negative attitudes towards robots scale (NARS) [43].

The NARS results revealed that 21.06% of the participants had a negative attitude towards situations and interactions with robots while 63.65% were neutral about it. 26.58% had highly negative attitudes towards the social influence of robots, 47.61% had a low attitude and 25.81% were neutral about it. 65.82% had a highly negative attitude towards the concept of robots having emotions, 8.87% were indifferent about it while 25.31% had a low negative attitude towards it.

#### *2.6. Experimental Procedure*

An explanation was provided to the participants, noting that the robot would operate differently in the two trials. To avoid bias, the details of each trial in terms of LOA were not explained to them. They were told that post-trial and final questionnaires would be provided for them to express their observations, assessments, and preferences. Then, the participant experienced two experimental trials in which they collaborated with the robot to assemble the configuration that appeared on the GUI, each in a specific LOA (high/low), in random order. After each trial, they completed a post-trial questionnaire regarding their experience with the robot. At the end of the two trials, each participant completed a final questionnaire in which they indicated their preferred level of automation. The experimental design and protocol were approved by the departmental ethical committee.

#### *2.7. Dependent Variables*

#### 2.7.1. Objective Measures

**Effectiveness:** Accuracy of the robot during the task—calculated from the number of times the robot erred in bringing the cubes (e.g., failed to catch a cube, brought an incorrect cube). These are system errors to portray the context of a system whose performance may not be absolutely optimum at all times.

*Performance in the secondary task* was measured as the number of stages the participants passed in the secondary task (for the participants who experienced the higher workload).

**Efficiency:** Total time (in seconds) that it took the participant to complete the task in each trial. In the higher level of automation, the total time was constant since it depended on robot motions only.

#### 2.7.2. Subjective Measures

The subjective measures were collected through questionnaires that included questions regarding the participants' experience with the robot. The post-trial questionnaire was prepared as a 5-point Likert scale ranging from "1 = strongly disagree" to "5 = strongly agree" through which participants were expected to express their experience and assessments. The questionnaire included NASA-TLX questions [17] to assess perceived workload in relation to the system efficiency. The raw NASA-TLX scores were added without the weights to provide an estimate of the overall workload (RTLX aggregation technique). The post-trial questionnaire also included questions from the technology acceptance model (TAM) to assess perceived ease of use [44]. The final questionnaire assessed **user preferences** regarding LOA modes and their perceptions as they collaborate with the robot at specific LOA modes.
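The RTLX aggregation mentioned above amounts to an unweighted sum of the subscale ratings. The sketch below assumes that all six standard NASA-TLX subscales were administered and that each was rated on the 5-point scale of the post-trial questionnaire; the paper does not list the items used, so the subscale keys, the function name `raw_tlx`, and the example ratings are hypothetical.

```python
# The six standard NASA-TLX subscales (assumed; the paper does not list them).
TLX_SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings):
    """Sum the subscale ratings without pairwise weights (RTLX)."""
    missing = [name for name in TLX_SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscale ratings: {missing}")
    return sum(ratings[name] for name in TLX_SUBSCALES)


# Hypothetical ratings for one participant in one trial (1 to 5 each).
example_ratings = {
    "mental_demand": 4,
    "physical_demand": 2,
    "temporal_demand": 3,
    "performance": 2,
    "effort": 3,
    "frustration": 1,
}

overall_workload = raw_tlx(example_ratings)  # 4 + 2 + 3 + 2 + 3 + 1 = 15
```

Dropping the weighting step keeps the score on a simple additive scale, here between 6 and 30 under the stated assumptions.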

#### 2.7.3. Constructs

The dependent variables were defined through two constructs: QoT execution and usability. These constructs were derived from the objective and subjective measures explained above (mapping is provided in Figure 6). They were adapted to the context of human–robot collaboration from the ISO 9241-151 guideline [40,45] as follows:

**Figure 6.** Mapping of the measures into constructs for assessment. (O—objective measures; S—subjective measures).

**Quality of task (QoT) execution**. The extent to which specific goals in a task are accomplished to a specified degree of accuracy for a specified time period [46]. This construct involves effectiveness and efficiency of the collaboration. Effectiveness of the collaboration was evaluated by the accuracy and completeness of the task which the human and robot cooperate to execute. The efficiency of the collaboration depends on resources such as time and human effort spent to achieve the required goal [47].

**Usability.** The extent to which the robotic system can be used to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use (adapted from [40]). In this study, this construct is composed of effectiveness and efficiency, in addition to satisfaction derived from the perceived ease of use, perceived workload and perceived reliability of the system. All these variables could affect the degree to which the human operator believes that working with the robot will be free of difficulty or great effort. This is an adaptation from [44] in the information technology domain to the context of HRC. Together, they constitute the user's perception regarding use of the system and are essential to ensure that the human can successfully team up with the robot to achieve such collaboration [35]. A negative user perception could lead to disuse of the support the robot can provide in the collaboration [48]. In the current study, the usability construct comprised the QoT measures, along with other user perceptions of ease of use, workload, and reliability.

#### *2.8. Analysis*

A generalized linear mixed model (GLMM) was applied to analyze the data with LOA and workload as independent variables. To combine variables for the constructs, multivariate analysis of variance (MANOVA) was used. The analyses considered all the constituent variables within each construct and combined them into a composite variable. Tukey's honestly significant difference (Tukey's HSD) test was used as the post-hoc test for multiple comparisons. The tests were designed as two-tailed with a significance level of 0.05. The items in the user preferences questionnaire were analyzed using ANOVA to assess the effect of workload on the participants' preferences for the LOA mode they experienced.
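As an illustration of the simplest of these tests, the following sketch computes the one-way ANOVA F ratio for a single preference item across the four workload groups. It does not reproduce the GLMM, MANOVA, or Tukey's HSD steps, and it is not the authors' analysis code. The function name `one_way_anova_f` and the ratings are hypothetical and serve only to show the computation.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical 5-point ratings for one preference item, five per group.
ratings_by_group = {
    "LWL": [3, 4, 3, 4, 3],
    "MWL1": [2, 3, 2, 3, 2],
    "MWL2": [4, 4, 5, 4, 3],
    "HWL": [5, 4, 5, 5, 4],
}

f_stat, df_b, df_w = one_way_anova_f(list(ratings_by_group.values()))
```

The resulting F ratio would then be compared against the F distribution with `df_b` and `df_w` degrees of freedom at the 0.05 significance level stated above.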

#### **3. Results**

Results of the assessments using the constructs (QoT execution and usability), details of the user preference regarding the LOA modes and a comparison within the workload groups are presented below.

#### *3.1. QoT Execution*

The interaction of LOA and workload had a significant effect (F (3, 152) = 5.198, *p* = 0.002) on the QoT execution. The QoT execution was higher at the high LOA when the workload was low compared to other LOA-workload combinations, confirming H1. LOA (F (3, 150) = 45.15, *p* < 0.001) and workload (F (3, 152) = 18.725, *p* < 0.001) were also significant as main effects on the QoT execution. The high LOA produced better QoT execution than the low LOA. The best results were obtained for low workload, as expected. When the workload was high, the high LOA also produced better QoT execution than the low LOA. Details of the constituent variables of the QoT execution (effectiveness and efficiency) are presented below:

#### 3.1.1. Effectiveness

The interaction of LOA and workload did not have a significant effect on accuracy (F (3, 152) = 0.512, *p* = 0.675), and neither did LOA (F (1, 152) = 1.024, *p* = 0.313) or workload (F (3, 152) = 0.376, *p* = 0.77) as main effects. Workload level, however, had a significant effect on the performance in the secondary task (F (1, 32) = 4.23, *p* < 0.001), with MWL2 (M = 2.02, SD = 1.239) resulting in better performance than HWL (M = 1.93, SD = 1.047). All of the participants who did the secondary task finished the first stage of the game. The majority (71/80) reached the second stage of the game, 56/80 reached the third stage, and only 10/80 reached the fourth stage.

#### 3.1.2. Efficiency

The interaction of LOA and workload had a significant effect on completion time (F (3, 152) = 4.838, *p* = 0.003). At high LOA and LWL, participants completed the task in a shorter time than in the other combinations. LOA also had a significant effect on the completion time (F (1, 152) = 136.565, *p* < 0.001), with the high LOA (M = 87.3, SD = 0) having a lower completion time than the low LOA (M = 107.945, SD = 16.547), as expected, even though the users had the option to stop the robot's operation at any point in the high LOA mode, which would increase the completion time. Workload also had a significant effect on the completion time (F (3, 152) = 4.838, *p* = 0.004), with the LWL (M = 94.62, SD = 9.028) having a shorter completion time than the HWL (M = 103.158, SD = 23.924). Higher task complexity (MWL1, M = 96.449, SD = 12.766) resulted in a shorter completion time than the workload caused by the secondary task (MWL2, M = 96.595, SD = 11.241).
