#### *7.1. Word Error Rate*

Figure 7 shows the average WER in each scenario. The overall average WER was 0.778 (SD = 0.144). The average WER was 0.789 (SD = 0.123) in the one-robot scenario and 0.768 (SD = 0.163) in the two-robot scenario. There was no significant difference between the scenarios (U = 62.0, *p* = 0.608, Cohen's d = 0.148).

**Figure 7.** Averages of the word error rate (WER) in each scenario.
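
For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that conventional definition only. The function name `word_error_rate` and whitespace tokenization are assumptions made for illustration, and this section does not state how the transcripts were tokenized or scored.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace. A different tokenizer
    (e.g., a morphological analyzer) would change the resulting value.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a value near 0.78 means that roughly three out of four reference words were involved in a recognition error. Because insertions are counted, WER can exceed 1.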

#### *7.2. Dialogue Time*

Figure 8 shows the average dialogue time in each scenario. The overall average dialogue time was 12 min 51 s (SD = 4 min 52 s). The average dialogue time was 11 min 30 s (SD = 5 min 18 s) in the one-robot scenario and 14 min (SD = 4 min 20 s) in the two-robot scenario. Here, milliseconds were rounded off. There was no significant difference between the scenarios (U = 48.0, *p* = 0.188, Cohen's d = −0.519).

**Figure 8.** Averages of the dialogue time in each scenario.
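
As a consistency check on the figures above, the overall average can be recovered as the size-weighted mean of the two scenario averages. The sketch below assumes group sizes of 11 (one-robot) and 13 (two-robot), which are inferred from the response counts in Section 7.4 rather than stated here. The helper names are hypothetical.

```python
def to_seconds(minutes: int, seconds: int = 0) -> int:
    """Convert a 'min s' duration to seconds."""
    return 60 * minutes + seconds


def weighted_mean(mean_a: float, n_a: int, mean_b: float, n_b: int) -> float:
    """Overall mean of two groups, given each group's mean and size."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)


# Assumed group sizes, inferred from the counts in Section 7.4.
N_ONE_ROBOT, N_TWO_ROBOT = 11, 13

overall = weighted_mean(to_seconds(11, 30), N_ONE_ROBOT,
                        to_seconds(14), N_TWO_ROBOT)
minutes, seconds = divmod(round(overall), 60)
print(f"{minutes} min {seconds} s")
```

With these assumed sizes the weighted mean agrees with the reported overall average of 12 min 51 s.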

#### *7.3. Participant Utterance Time*

Figure 9 shows the average participant utterance time in each scenario. The overall average participant utterance time was 3 min 31 s (SD = 3 min 14 s). The average participant utterance time was 3 min 5 s (SD = 3 min 21 s) in the one-robot scenario and 3 min 53 s (SD = 3 min 13 s) in the two-robot scenario. There was no significant difference between the scenarios (U = 61.0, *p* = 0.569, Cohen's d = −0.242).

**Figure 9.** Averages of the participant utterance time in each scenario.
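
The effect sizes in these subsections can be approximated from the reported summary statistics alone. The sketch below uses the common pooled-standard-deviation form of Cohen's d. Both the choice of that formula and the group sizes (11 and 13, inferred from the counts in Section 7.4) are assumptions, since the text does not state how d was computed.

```python
import math


def cohens_d(mean_a: float, sd_a: float, n_a: int,
             mean_b: float, sd_b: float, n_b: int) -> float:
    """Cohen's d with a pooled standard deviation (assumed formula)."""
    pooled_variance = (
        (n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2
    ) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_variance)


# Participant utterance time in seconds, taken from the text above:
# one-robot: 3 min 5 s (SD 3 min 21 s); two-robot: 3 min 53 s (SD 3 min 13 s).
d = cohens_d(185, 201, 11, 233, 193, 13)
print(round(d, 2))
```

Because the inputs are rounded to whole seconds, the result can differ from the reported value in the third decimal place. The negative sign indicates only that the two-robot average is the larger one.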

#### *7.4. Participant Subjective Impression*

Figure 10 shows the responses to the question asking the participants whether they had felt anything strange in the dialogue. In total, 3 participants (13%) answered "Yes", 17 (71%) answered "No", and 4 (17%) gave no answer. The corresponding numbers were 1 (9%), 8 (73%), and 2 (18%) in the one-robot scenario, and 2 (15%), 9 (69%), and 2 (15%) in the two-robot scenario.

**Figure 10.** Results of the question to the participants in each scenario.
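
The percentages above follow directly from the counts, and the totals imply 11 participants in the one-robot scenario and 13 in the two-robot scenario. The sketch below reproduces them. It assumes half-up rounding, which matters for 3 of 24 (exactly 12.5%), because Python's built-in `round` uses banker's rounding and would give 12. The names `percent` and `counts` are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent(count: int, total: int) -> int:
    """Percentage rounded half-up to a whole number (assumed convention)."""
    value = Decimal(100 * count) / Decimal(total)
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Counts of ("Yes", "No", no answer) taken from the text above.
counts = {
    "one-robot": (1, 8, 2),
    "two-robot": (2, 9, 2),
}
# Overall counts are the element-wise sums of the two scenarios.
counts["total"] = tuple(a + b for a, b in zip(*counts.values()))

percentages = {
    scenario: [percent(c, sum(answers)) for c in answers]
    for scenario, answers in counts.items()
}
print(percentages)
```

Because each percentage is rounded independently, a row need not sum to exactly 100.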

#### *7.5. Caregiver Subjective Impression*

Figure 11 shows the average scores for the question asking the caregivers whether the participants had talked with the robot more positively than usual. The overall average score was 4.92 (SD = 1.89). The average score was 4.55 (SD = 1.86) in the one-robot scenario and 5.23 (SD = 1.92) in the two-robot scenario. There was no significant difference between the scenarios (U = 61.5, *p* = 0.568, Cohen's d = −0.186).

**Figure 11.** Averages of the scores of the question to the caregivers.
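
The half-integer statistic U = 61.5 reported above indicates tied scores between the two groups, since each tied cross-group pair contributes 0.5 to the Mann-Whitney U statistic. The sketch below shows the pair-counting definition of U. The function name and the toy scores are made up for illustration and are not data from this study, and the text does not state whether the reported U is the statistic of one group or the smaller of the two.

```python
def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U_x, U_y) by direct pair counting.

    Each pair (a, b) with a from x and b from y adds 1 to U_x if a > b
    and 0.5 if a == b; U_x + U_y always equals len(x) * len(y).
    """
    u_x = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Toy ordinal scores (hypothetical, NOT the study's data).
group_a = [1, 2, 3]
group_b = [2, 3, 4]
u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)
```

With groups of 11 and 13, the two statistics sum to 143, so every U value reported in this section lies below the midpoint of 71.5.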
