#### *3.1. Manipulation Check*

Regarding the difficulty of the conversation, no participant in either condition gave a rating below 4 on either of the two questions (QD-1 and QD-2). We interpreted this as indicating that the content of the conversation had been prepared appropriately and was not too difficult. Regarding the degree of engagement in the conversation, the median scores of the participants who attended the semi-passive conversation on the four questions (QE-1, QE-2, QE-3, and QE-4) were not below the intermediate point of the scale. We interpreted this as indicating that the system could provide a conversational experience with a moderate level of engagement.

#### *3.2. Recall Test for Hypothesis (I)*

We evaluated the difference between the semi-passive condition and the passive condition using a recall test and the questionnaire. Figure 6a shows the average number of questions that the participants answered correctly. The Mann–Whitney U test was used to compare the scores of the two conditions. The result showed that the median number of correct answers in the semi-passive condition (Mdn = 8) was significantly higher than that in the passive condition (Mdn = 7) (U = 54.5, *p* < 0.05).
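
For readers who wish to reproduce this kind of comparison, the following is a minimal sketch of a two-sided Mann-Whitney U test in plain Python. It is not the analysis code used in this study. The function names `average_ranks` and `mann_whitney_u` and the score lists are hypothetical, and the p-value uses a normal approximation with tie correction and no continuity correction, which is an assumption; the text does not state whether an exact or an approximate p-value was computed.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, p) for a two-sided test, where U = min(U1, U2).

    Assumption: p comes from the normal approximation with tie
    correction and without continuity correction.
    """
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)

    rank_sum_x = sum(ranks[:n1])
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)

    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())

    n = n1 + n2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u, 1.0  # all observations identical: no evidence of a difference

    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


if __name__ == "__main__":
    # Hypothetical recall scores, for illustration only (not the study's data).
    semi_passive = [8, 9, 7, 8, 10, 8]
    passive = [7, 6, 7, 8, 5, 7]
    u, p = mann_whitney_u(semi_passive, passive)
    print(f"U = {u}, p = {p:.3f}")
```

Reporting the smaller of the two U values is one common convention. Statistical packages differ on which U they return and on whether they apply exact or approximate p-values, so results from other tools may not match this sketch digit for digit.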
