*3.5. Questionnaire Design*

The 22 measurement items of the proposed questionnaire are summarized in Table 2, grouped by construct: perceived comprehensibility (eight items), perceived manipulability (six items), perceived enjoyment (four items), and perceived usefulness (four items). Participants were free to complete the questionnaire at their own discretion.

The questionnaire was divided into three parts:



**Table 2.** Questionnaire items.

All questions used a 7-point Likert scale, ranging from 1 ("strongly disagree") to 7 ("strongly agree"). Depending on their available time, users either filled in the questionnaire on site or received a survey link by e-mail to complete it later. Of the 63 questionnaires collected, 21 were completed online; 2 incomplete questionnaires were eliminated, leaving 61 questionnaires (96.82%) for this study.

The online survey questionnaire was developed in English, and the paper-based questionnaire was translated into Romanian and Italian for participants who had difficulty understanding English.

### **4. Results**

Responses were aggregated to obtain a value for each of the four constructs defined by the questionnaire: comprehensibility, manipulability, enjoyment, and usefulness. The questionnaire contained both positively and negatively worded items, which we chose to alternate so that users would read each item more carefully and give answers more relevant to the survey. For the negatively worded items, we first reversed the responses so that all items shared the same direction. The scores were then converted to a range of 0 to 6 and summed, and the sum was mapped to a range of 0 to 100. This aggregation follows the instructions of the original HARUS questionnaire assessment [54].
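The aggregation steps described above can be illustrated with a minimal sketch. It assumes integer ratings from 1 to 7, that reversal is done as 8 minus the rating, and that the sum is rescaled linearly by the maximum attainable sum; the function name, the item identifiers, and the choice of which items are negatively worded are hypothetical and are not taken from Table 2.

```python
def construct_score(responses, negative_items=frozenset()):
    """Aggregate 7-point Likert ratings for one construct into a 0-100 score.

    responses: dict mapping an item id to an integer rating in 1..7
    negative_items: ids of negatively worded items (hypothetical here)
    """
    if not responses:
        raise ValueError("at least one response is required")
    total = 0
    for item, rating in responses.items():
        if not 1 <= rating <= 7:
            raise ValueError(f"rating for {item!r} must be in 1..7, got {rating}")
        # Reverse negatively worded items so that 7 is always the favourable end.
        if item in negative_items:
            rating = 8 - rating
        # Shift the rating from the 1..7 range to the 0..6 range.
        total += rating - 1
    # Map the sum (0 .. 6 * number of items) linearly onto 0..100.
    return 100.0 * total / (6 * len(responses))


# Hypothetical four-item construct with alternating item polarity.
example = {"E1": 7, "E2": 1, "E3": 6, "E4": 2}
example_score = construct_score(example, negative_items={"E2", "E4"})
print(round(example_score, 2))
```

In this illustration, a respondent who strongly agrees with every positive item and strongly disagrees with every negative item obtains the maximum score of 100.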

Figures A1–A3 in Appendix A show the scores obtained for each question in the three experiments, represented as box plots. Each figure reports the mean, median, interquartile range, whiskers, and outliers.

To validate our measurement model, we evaluated reliability and sampling adequacy. Reliability was assessed with Cronbach's α, and sampling adequacy with the Kaiser-Meyer-Olkin (KMO) index. Both measures exceeded the recommended thresholds (Cronbach's α = 0.902, KMO = 0.796).
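For readers unfamiliar with the reliability measure, the following sketch computes Cronbach's α from its standard definition, α = k/(k − 1) · (1 − Σ item variances / variance of total scores), where k is the number of items. It is an illustration only: the function name and the toy data are hypothetical, sample variances are assumed, and it is not the software used to obtain the value reported above.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score table.

    rows: one list of item scores per respondent, all of the same length.
    """
    if len(rows) < 2:
        raise ValueError("at least two respondents are required")
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in rows):
        raise ValueError("every respondent must answer the same items")
    # Sample variance of each item (one column per item).
    item_variances = [variance(column) for column in zip(*rows)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data (hypothetical): three respondents, two items.
toy_alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
print(round(toy_alpha, 3))
```

Values of α closer to 1 indicate that the items vary together and therefore measure the underlying constructs more consistently.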

When participants were asked "Do you have any knowledge of who Ovid is?", the majority at all three locations answered "Yes". Ovid was known to 95% of the participants in Sulmona and 91% of those in Rome, but only 73% of the participants in Constanta had heard of him. The values obtained for each question across all users are displayed in Figure 7.

**Figure 7.** Boxplots showing the outcomes for the questions, grouped into the four constructs.

The values for each construct, namely comprehensibility (mean = 91.06; SD = 4.46), manipulability (mean = 93.81; SD = 2.66), enjoyment (mean = 95.88; SD = 4.11), and usefulness (mean = 87.57; SD = 2.61), are reported in Figure 8, separately for each experiment and overall. The vertical axis spans 80 to 105 to make the differences between constructs visible.

At the end of the experiment, some participants showed interest in learning more about AR and how it could change the way people explore and learn new things about heritage and education.

**Figure 8.** Boxplot showing the results for the four constructs.
