3.2. Evaluation of Teachable Agent Levels
We implemented and evaluated the effectiveness of the proposed leveled interactive teachable agent in enabling leveled dialogue between learners and agents. This evaluation consisted of the following three aspects.
First, we evaluated whether the agent could conduct a leveled dialogue matched to the learner’s proficiency. For this purpose, we assessed whether the agent’s answers to user questions at each level were at the same level as the questions, using the level conversability and level satisfaction scales in Table 2.
Second, we evaluated the appropriateness of the conversational flow, which covered topics such as self-introductions, weather, soccer, weekend plans, and birthday parties; that is, whether the agent answered the learners’ questions and kept the conversation going. The appropriateness of the conversational flow was rated on a 5-point Likert scale according to whether the flow remained relevant to the topic, as listed in Table 2.
Third, we evaluated the agent’s effectiveness in helping learners improve their English skills. Drawing on the experts’ teaching experience, we assessed whether the agent’s teaching and leveled-dialogue features helped learners improve. Learning effectiveness was rated on a 5-point Likert scale according to whether the agent helped learners improve their English-speaking skills, as listed in Table 2.
We conducted an evaluation from 23 to 24 February 2023 to verify the effectiveness of the leveled interactive teachable agent. The evaluation involved five English experts with, on average, five years of teaching experience. We first explained and demonstrated the agent to the experts and then presented them with a set of English assessment questions for each level. The evaluation questions were organized into four levels, with two dialogues per level, for a total of eight questions. After viewing the agent’s answers, the experts rated the level of each answer. Language proficiency was classified from level 1 to level 4, as described in the list below.
Level 1—Conversation Assessment Questions (see Table 3).
This level comprises basic English expressions and words for simple greetings, introductions, numbers, colors, dates, times, and so forth, at an easily understandable level.
Level 2—Conversation Assessment Questions (see Table 4).
These questions were designed to focus on everyday conversations about family, hobbies, food, shopping, travel, and so forth.
Level 3—Conversation Assessment Questions (see Table 5).
These questions were designed to focus on conversations about common topics in everyday life, such as the weather, environment, health, and culture.
Level 4—Conversation Assessment Questions (see Table 6).
This level represents the ability to communicate on various topics and appropriately understand and use grammatical details, such as modal verbs, nouns, and pronouns.
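For a compact overview, the four levels and their topics can be represented as a simple lookup structure. The following Python mapping is purely illustrative (the actual assessment questions appear in Tables 3–6, and this dictionary layout is not the paper’s data format):

```python
# Hypothetical organization of the level-specific question bank.
# Topics are taken from the level descriptions above; the structure
# itself is illustrative, not the system's actual data format.
QUESTION_BANK = {
    1: {"topics": ["greetings", "introductions", "numbers", "colors", "dates", "times"],
        "dialogues": 2},
    2: {"topics": ["family", "hobbies", "food", "shopping", "travel"],
        "dialogues": 2},
    3: {"topics": ["weather", "environment", "health", "culture"],
        "dialogues": 2},
    4: {"topics": ["varied topics", "modal verbs", "nouns", "pronouns"],
        "dialogues": 2},
}

def questions_for(level: int) -> dict:
    """Return the topic set and dialogue count for a proficiency level."""
    return QUESTION_BANK[level]
```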
3.3. Evaluation Results
We evaluated the functionality of, and user satisfaction with, the conversational agent in its teaching role. The agent’s functionality was evaluated by analyzing whether its answers matched the learner’s conversation level. Satisfaction was assessed based on the appropriateness of the conversational flow, the level of the agent’s answers, and the agent’s learning effectiveness when used by students in a classroom setting. Specifically, we evaluated the following.
First, Figure 12 presents the experts’ evaluations of the agent’s responses relative to the learner’s dialogue level. For level 1 questions, the agent’s answers were rated at level 1.0 (SD 0.0); for level 2 questions, at level 2.2 (SD 0.4); for level 3, at level 3.1 (SD 0.3); and for level 4, at level 4.4 (SD 0.5). These results show that the leveled agent system responded at roughly the same level as the user’s question, although as the conversation level increased, the agent’s responses became somewhat more complex than the users’ questions.
Second, we evaluated the adequacy of the agent system’s dialogue flow. As shown in Figure 13, the system maintained the dialogue flow with a mean of 5.0 (SD 0.0) for level 1, 4.9 (SD 0.3) for level 2, 4.8 (SD 0.4) for level 3, and 4.3 (SD 0.5) for level 4. The lowest mean score occurred at level 4, where the agent’s answers were at a higher level than the users’ questions.
Third, we assessed learning effectiveness, that is, whether the conversational agent system helped learners improve their English proficiency. As shown in Figure 14, the mean was 4.9 (SD 0.3) for levels 1 and 2, 4.8 (SD 0.4) for level 3, and 4.4 (SD 0.5) for level 4. Again, the lowest mean score occurred at level 4, where the agent’s answers exceeded the level of the users’ questions.
The agent performed well overall, with mean scores above 4.3 at all levels. The differences between the mean scores at each level were small, indicating broadly consistent performance regardless of level; nevertheless, the overall results suggest that the agent was somewhat more effective at the lower levels.
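For transparency about the reported statistics, the following Python sketch shows how the per-level means and standard deviations for the dialogue-flow metric can be computed. The individual scores below are invented to be consistent with the reported aggregates (5 experts × 2 dialogues = 10 ratings per level) and are not the study’s raw data:

```python
from statistics import mean, pstdev

# Illustrative expert ratings (1-5 Likert) for the dialogue-flow metric,
# ten scores per question level (5 experts x 2 dialogues). These values
# are invented to match the reported means/SDs, not the actual raw data.
flow_ratings = {
    1: [5] * 10,            # mean 5.0, SD 0.0
    2: [5] * 9 + [4],       # mean 4.9, SD 0.3
    3: [5] * 8 + [4] * 2,   # mean 4.8, SD 0.4
    4: [4] * 7 + [5] * 3,   # mean 4.3, SD ~0.5
}

for level, scores in flow_ratings.items():
    # pstdev (population SD) reproduces the reported values here.
    print(f"Level {level}: mean {mean(scores):.1f}, SD {pstdev(scores):.1f}")
```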
3.4. Discussion
By applying a leveled interactive teachable agent, we investigated the applicability of personalized conversational agents to English language learning and studied the implementation of voice chatbots at different levels. We designed the agent around grammar, memory, and context rather than concepts and causality, and implemented it to respond at the learner’s level. In this approach, the agent learns the data taught by the learner and responds to each situation, by level, based on what it has learned.
The teachable-agent voice chatbot improves its speaking skills, such as asking and answering questions, as the learner’s skills improve. To create leveled conversation data, we collected a dataset of children’s English conversations and set the conversation level using the existing ARI (Automated Readability Index) model; a minimal computation sketch is given below. For learner-level evaluation, we acquired learner utterance data (words or sentences) through an interactive interface and generated learner-specific agents by applying pronunciation and sentence evaluation algorithms to the acquired data. Our results confirmed that teaching and learning with a leveled agent can motivate beginner-level learners to use the system continuously, with a positive impact on their learning progress. In future work, the proposed system’s effectiveness should be compared with that of the AI voice chatbots used on existing English education sites, and issues related to the difficulty of the conversational dialogues produced by level-specific teachable agents should be addressed.
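As a reference for the leveling step, the following Python sketch computes the standard ARI formula, ARI = 4.71 × (characters/words) + 0.5 × (words/sentences) − 21.43, and maps the score to one of the four conversation levels. The formula is the published ARI; the level cut-offs below are illustrative placeholders, not the thresholds used in our system:

```python
import re

def ari_score(text: str) -> float:
    """Automated Readability Index:
    ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not words or not sentences:
        return 0.0
    characters = sum(len(w) for w in words)  # letters only, per word
    return (4.71 * (characters / len(words))
            + 0.5 * (len(words) / len(sentences))
            - 21.43)

def conversation_level(text: str) -> int:
    """Map an ARI score to one of the four conversation levels.
    The cut-off values are illustrative placeholders only."""
    score = ari_score(text)
    if score < 2:
        return 1
    elif score < 5:
        return 2
    elif score < 8:
        return 3
    return 4

# Short, simple utterances yield a low ARI and hence level 1.
print(conversation_level("Hi! My name is Tom. I like blue."))
```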