#### 2.1.4. Androids' Behavior Control Module

The Androids' Behavior Control Module controls the actuators of the androids to express various behaviors such as emotional expression, mouth movement synchronized with the voice, and motions that make an android look at people or at the other android. The behavior of an android is described in a motion file, which contains the position of each joint at 50-millisecond intervals called frames. The position of a joint is set using a value from 0 to 255. Androids execute idling behaviors, which are minimal behaviors like blinking, as well as breathing, which is expressed by slight movements of the shoulders, waist, and neck. Moreover, an android can express these behaviors at any time by executing prepared motion files. Some motions that are difficult to prepare in advance, such as looking at a person's face, are realized using sensors. When looking at a face, the face position is detected in three-dimensional space using Microsoft's Kinect for Windows v2 and is associated with the three axes of the android's neck. The values of these three axes are computed from the face position using a projection matrix obtained through calibration, in which the least squares method is applied to 16 pairs of neck-axis positions and detected face positions. Lip motion synchronized with the voice is generated from the formant information contained in the voice [16].
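As an illustration of this calibration, the following is a minimal sketch in Python of fitting such a projection matrix with ordinary least squares on 16 calibration pairs; the variable names and the placeholder data are ours, and the actual system may compute the mapping differently.

```python
import numpy as np

# Calibration pairs: for 16 known face positions (x, y, z in the Kinect frame),
# the three neck-axis values were set so that the android looked at the face.
# The data here are placeholders for illustration only.
face_positions = np.random.uniform(-1.0, 1.0, size=(16, 3))
neck_axis_values = np.random.uniform(0, 255, size=(16, 3))

# Append a constant 1 so the fitted matrix can include a translation term.
X = np.hstack([face_positions, np.ones((16, 1))])            # shape (16, 4)

# Least-squares fit of a 4x3 matrix P such that X @ P approximates the neck values.
P, residuals, rank, _ = np.linalg.lstsq(X, neck_axis_values, rcond=None)

def face_to_neck(face_xyz):
    """Map a detected 3D face position to the three neck-axis values (0-255)."""
    v = np.append(face_xyz, 1.0) @ P
    return np.clip(np.rint(v), 0, 255).astype(int)

print(face_to_neck([0.2, -0.1, 1.5]))
```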

#### 2.1.5. Scenario Manager Module

The scenario manager module controls the conversation and behavior of the two androids by following a prepared script. The script specifies the details of the dialogue, such as the speech contents, the behavior of the two androids, and the timings of speech recognition, and the module sends orders to the other modules sequentially. The script also contains templates that an android uses to respond to a person's utterance in each scene where an android asks the person something and the person may reply. The androids encourage the person to speak by asking him/her a yes-or-no question. When the person utters something, the module classifies the utterance as positive or negative, and the androids generate a response using the template for that category. If the person says nothing, or the speech recognition module fails to recognize the speech, the module selects a template for an ambiguous response.
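To make this flow concrete, here is a minimal sketch of the template-selection step in Python; the template texts, the `classify` callback, and the category names are illustrative placeholders rather than the system's actual implementation.

```python
import random

# Per-scene response templates, keyed by utterance category (contents are illustrative).
templates = {
    "positive": ["I'm glad you think so. {word} really is nice, isn't it?"],
    "negative": ["I see, so you don't feel that way about {word}."],
    "ambiguous": ["I see. Well, let me continue."],
}

def choose_response(recognized_text, classify):
    """Pick a response template according to the polarity of the person's utterance.

    `classify` stands in for the trained classifier described below and returns
    'positive' or 'negative'. An empty recognition result falls back to the
    ambiguous template.
    """
    if not recognized_text:                 # nothing spoken or recognition failed
        category = "ambiguous"
    else:
        category = classify(recognized_text)
    return random.choice(templates[category])
```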

The module has a classifier that categorizes the person's utterance; the classifier is trained on data consisting of pairs of the recognized text of a person's utterance and its label. After morphological analysis of the training data, bag-of-words (BOW) vectors are created. Stop words listed in SlothLib [17], a programming library, are removed from the data, as are high-frequency words in the top 0.9% of the data and low-frequency words appearing only twice. Additionally, to reduce the number of samples required for categorization, the dimensionality of the BOW vectors is reduced to the number of classes using Latent Semantic Indexing (LSI). The module then trains a support vector machine (SVM) on these vectors and their labels.

Morphological analysis is performed using JUMAN [18], a Japanese morphological analyzer. The SVM is implemented with Scikit-Learn [19], an open-source machine learning library, and Gensim [20], a topic modeling library for Python, is used to create the bag-of-words vectors.
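The following is a minimal Python sketch of this training pipeline using Gensim and Scikit-Learn; the toy documents, stop-word set, and filtering thresholds are placeholders (the real system tokenizes Japanese text with JUMAN and uses the SlothLib stop-word list before this step).

```python
from gensim import corpora, models, matutils
from sklearn.svm import SVC

# Training data: tokenized utterances and their polarity labels (placeholders;
# in the real system the text is Japanese and tokenized with JUMAN).
tokenized_docs = [["good", "fun"], ["bad", "boring"], ["good", "really"], ["bad", "no"]]
labels = ["positive", "negative", "positive", "negative"]
stop_words = {"really"}                      # stands in for the SlothLib stop-word list
num_classes = len(set(labels))

dictionary = corpora.Dictionary(tokenized_docs)
# Remove stop words, then very frequent / very rare words (thresholds illustrative).
dictionary.filter_tokens(bad_ids=[dictionary.token2id[w] for w in stop_words
                                  if w in dictionary.token2id])
dictionary.filter_extremes(no_below=2, no_above=0.99)

bow_corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]

# Reduce the BOW vectors to as many dimensions as there are classes using LSI.
lsi = models.LsiModel(bow_corpus, id2word=dictionary, num_topics=num_classes)
features = matutils.corpus2dense(lsi[bow_corpus], num_terms=num_classes).T

# Train the SVM classifier on the reduced vectors and labels.
classifier = SVC(kernel="linear")
classifier.fit(features, labels)
```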

After selecting a template, the module creates the response text by inserting words extracted from the person's utterance into the template. The words extracted from the person's utterance are those with sentiment polarity. To extract polarity words, the Japanese Sentiment Polarity Dictionaries (Volume of Verbs and Adjectives) ver. 1.0, created by the Inui-Okazaki Laboratory at Tohoku University, are used.
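As a rough illustration, the sketch below fills a selected template with a polarity word found in the utterance; the lexicon entries and helper names are hypothetical, and loading the actual dictionary files is assumed to happen elsewhere.

```python
# `polarity_lexicon` stands in for the Japanese Sentiment Polarity Dictionaries
# (verbs and adjectives); the entries here are placeholders.
polarity_lexicon = {"fun": "positive", "boring": "negative"}

def fill_template(template, utterance_tokens):
    """Insert the first polarity word found in the utterance into the template."""
    for token in utterance_tokens:
        if token in polarity_lexicon:
            return template.format(word=token)
    return template.format(word="that")     # fallback when no polarity word is found

print(fill_template("I see, so you found it {word}.", ["it", "was", "boring"]))
```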

#### *2.2. Experiment*

This section describes a between-participants subject experiment conducted to reveal the effectiveness of semi-passive social androids as communication media for conveying objective and subjective information. The semi-passive social androids tried to engage the participants by repeatedly giving them directions during the conversation. This was expected to help the participants remain engaged in and concentrate on the conversation, and this engagement and concentration were in turn expected to help the participants recall the messages in the conversation. Meanwhile, because the android spoke toward the participant, it was clear when a message was being directed at the participant; the participants were therefore expected to feel a stronger will on the part of the androids to convey the messages and, as a result, to perceive the messages as stronger. Accordingly, with respect to these expected effects of the semi-passive social androids, we examined three hypotheses in this experiment: (i) Participants will recall more objective information given in the semi-passive social conversation of two androids than in the passive social one. (ii) Participants will perceive the subjective messages as stronger in the semi-passive social conversation than in the passive social one. (iii) Participants will be more inclined to follow the subjective messages in the semi-passive social conversation than in the passive social one. In addition to these three hypotheses, two further points, namely the degree of participant engagement with the conversation and the difficulty of the conversation, were surveyed to confirm that the system ran as intended and that the experiment was conducted as expected.
