1. Introduction
With the rise of Internet technology, many educational institutions are gradually turning their attention to digital education. Various online learning methods enable people to make good use of their spare time to learn, greatly enhancing learning efficiency compared with traditional instruction. Technologies that integrate learning approaches such as pragmatic, contextual, or cooperative learning have shown great success in language learning [1,2,3]. In the context of digital language learning, speaking is considered one of the most important parts of learning a foreign language [4]. Social interaction is a key factor in improving fluency, but the cost of creating a social language-learning environment is too high for it to be widely implemented [5,6]. Researchers therefore seek to adopt computer-aided technologies to create an environment similar to speaking with native English speakers. With the popularity of computer-assisted language learning (CALL) and the advancement of the Internet, new methods in the field of language learning are booming [7,8,9].
Dialogue practice has become increasingly important in computer-assisted language learning, especially for foreign languages [10]. As hardware and natural language processing have progressed, dialogue training can now be accomplished through computer technologies [11,12,13]. Language learning places special emphasis on training communication skills. This study adopted task-based language learning as the fundamental curriculum design and used a task-based dialogue system to provide the interface. Dialogue practice can thus be carried out through task-based learning, while the learning process is conducted by a computer-aided dialogue system driven by the language-learning curriculum.
Educators have promoted task-based learning for language learning [14,15]. The original concept was developed by Prabhu [16], who proposed a task-based learning model in which learners learn a language in the process of solving a “non-linguistic task”, with the focus placed on the learners as they perform tasks. Applying cognitive techniques, such as receiving and processing language messages, has proven to be as effective as traditional teaching methods [17]. Prabhu [16] pointed out three different kinds of activities for task-based learning:
Information-gap activity: Learners exchange information to fill the information gap, communicating with each other in the target language to ask questions and solve problems.
Opinion-gap activity: Learners express their feelings, ideas, and personal preferences to complete the task. In addition to interacting with each other, teachers can add personal tasks to the theme to stimulate the learners’ potential.
Reasoning-gap activity: Students derive new information through reasoning from existing information, for example, inferring the meaning of the dialogue topic or the implied associations in a sentence from the dialogue process.
Willis outlined the teaching process of task-based language teaching as three stages: the pre-task stage, the task cycle stage, and the language focus stage [18,19]. Activities at each stage can be used to construct a complete language-learning process. The pre-task stage presents the learner with the task instructions and provides clear directions on what must be done during the task phase [19]. This helps students review the language skills associated with the task, and through the execution of task activities, the teacher can judge the students’ learning status on the topic. At the task cycle stage, students use the words and grammar they learned during the task-preparation phase and think about how to complete the tasks and presentations. In this process, the teacher plays the role of supervisor, giving appropriate guidance and language-related resources. In the last stage, the language focus stage, students and teachers review related issues encountered during the previous phase, such as the use of words, grammar, or sentence structure. The teacher guides the students to practice the analyzed results and improve their language comprehension.
The efficiency of and crucial factors in task-based language learning have been surveyed from different perspectives. Research shows a significant improvement in speaking comprehension [20,21,22]. Rabbanifar and Mall-Amiri indicate that the reasoning-gap activity is a key factor for speaking complexity and accuracy [21].
The present study adopted the three-stage model shown in Figure 1 to develop the task-based dialogue system [16]. In the pre-task stage, the system needs to present the task and let students clearly understand the goals to accomplish throughout the conversation. In the task cycle, the system needs to interact with students and guide them to finish the task. In the language focus stage, the system needs to be able to evaluate the students’ performance and give proper feedback.
The task-based dialogue system usually has a very clear task, such as helping users order meals or learn a language [23]. Such a dialogue robot contains basic modules including the Dialogue Script, Dialogue Manager, Natural Language Understanding, and Natural Language Generation. As shown in Figure 2 [24], the widely used approach in task-based dialogue systems is to treat the dialogue response as a pipeline. The system must first understand the information conveyed by the human and map it to an internal representation; according to the state of the conversation, the system then generates the corresponding reply actions and finally converts these actions into natural-language expressions. Although language understanding is usually handled by statistical models, most established dialogue systems still use manual features or manually defined rules for state and action representation, semantic detection, and slot filling [24].
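The pipeline described above can be sketched in code. The following is a minimal illustration only, assuming a hypothetical intent (“book_room”), slot (“nights”), and response templates; it is not the system developed in this study.

```python
def understand(utterance):
    """Rule-based NLU: detect an intent and extract slot values from a sentence."""
    text = utterance.lower()
    intent = "book_room" if ("book" in text or "room" in text) else "inform"
    slots = {}
    for word in text.split():
        if word.isdigit():
            slots["nights"] = int(word)
    return intent, slots

def track(state, intent, slots):
    """Dialogue state tracking: fold the new intent and slots into the state."""
    state = dict(state)
    if intent != "inform":
        state["intent"] = intent
    state.update(slots)
    return state

def decide(state):
    """Dialogue policy: choose the next system action from the current state."""
    if state.get("intent") == "book_room":
        if "nights" not in state:
            return ("request", None)
        return ("confirm", state["nights"])
    return ("clarify", None)

def generate(action):
    """Template-based NLG: turn a system action into a natural-language reply."""
    kind, value = action
    templates = {
        "request": "How many nights would you like to stay?",
        "confirm": f"Your room is booked for {value} night(s).",
        "clarify": "Sorry, could you rephrase that?",
    }
    return templates[kind]

state = {}
for turn in ["I want to book a room", "2 nights, please"]:
    intent, slots = understand(turn)
    state = track(state, intent, slots)
    print(generate(decide(state)))
# prints the request prompt, then "Your room is booked for 2 night(s)."
```

Each turn flows through understanding, state tracking, policy, and generation, matching the pipeline view of Figure 2.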
Dialogue systems for language learning have been implemented using different algorithms over the years [25,26,27]. From statistical models to pattern recognition, such applications have become more practical and more widely developed with the advancement of text mining and natural language processing technologies [25]. Several advantages of using a dialogue system for language learning have been reported: language-learning dialogue systems are considered fun and easy for students to approach [25,26], and a dialogue system is easily integrated with teaching methods such as grammar checking and repetition [25]. Beyond carrying out the task itself, the proposed dialogue system needs to focus on language learning; functions supporting speaking comprehension need to be considered and developed in the system.
In recent years, hardware and software technologies have grown rapidly, and media attention toward artificial intelligence and machine learning continues to rise. These developments make it possible for applications combining machine learning and human–computer interaction to handle large-scale data storage and massive computation. Many researchers have turned to applications of natural language processing [28,29,30]. Natural language processing is the communication channel between human and machine; it is also one of the most difficult problems in computer science, whether the goal is natural language understanding or natural language interaction. Nevertheless, applications of natural language processing have been proposed in different fields, such as machine translation, dialogue robots, information retrieval, and abstract generation. Among these applications, task-oriented robots show the capability to solve special-purpose problems; food-ordering robots in restaurants and customer-service robots are typical examples. In education, computer-assisted teaching robots can help improve learners’ oral fluency and build their self-confidence in speaking foreign languages.
The decision-making technology of dialogue systems (chatbots) has gradually matured; one example is the Siri artificial intelligence assistant in Apple’s iOS [31,32]. Through natural language processing technology, people can use dialogue to interact smoothly with mobile devices, for example, querying the weather, making phone calls, and setting up to-do items [33,34,35]. The use of dialogue systems is quite extensive. Among the fast-growing chatbots of recent years, many companies have invested resources in building dedicated dialogue robots so that customers can get instant responses while labor costs are saved [34,35]. A chatbot is based on a dialogue system, so it must simulate human dialogue, and the dialogue must have a meaningful purpose. It remains a challenge for today’s chatbots to understand all kinds of questions and responses correctly, since human languages are ambiguous to a degree. Dialogue training still depends heavily on human communication with instant feedback or correction [32,36]. However, it is not possible to provide a personal tutor for every English learner.
Therefore, this study involved the development of a task-based dialogue system that combines task-based language teaching with a dialogue robot. The proposed system contains the functions needed to carry out a conversational task, including natural language understanding, intent decomposition, and dialogue state tracking. The research objectives were as follows:
Development of a task-based dialogue system that is able to conduct a task-oriented conversation and evaluate students’ performance after the conversation;
Comparison of the differences between the proposed system and the traditional methods;
Evaluation of the effectiveness of the proposed system.
The first step of this study was to survey the related studies on task-based learning methodology and task-based dialogue systems to establish the fundamental curriculum design and interfaces of the system.
Section 2 proposes a novel framework for a task-based dialogue-learning model. Section 3 elaborates on the experiment and the results. Finally, Section 4 concludes and discusses limitations and future work.
3. Results
To address the research objectives, each conversation was recorded and evaluated by an English teacher. The score was compared to the different scoring mechanisms provided by the system to check the accuracy of the scoring system. In addition, a questionnaire was distributed after the experiment to evaluate the efficiency of the dialogue system.
The study collected a total of 636 records and 51 complete task datasets. Each complete task was evaluated using five scoring criteria by the task-based dialogue system and by the same teacher who taught the class. The “correct” score refers to the score given by the teacher. The teacher was able to evaluate all the data recorded by the system while students were performing the tasks; the task dialogue can be reproduced from the system records, so the teacher could evaluate the students’ performance and assign a score using criteria similar to face-to-face scoring. Different criteria from the system were combined and tested to obtain accurate predictions. The criteria included pause time, answer time, the number of errors, the number of repetitions, and the number of hints (reminders). According to the teacher, students’ performance can be judged by the pause time after a question is asked, while the number of incorrect responses reflects comprehension of the given dialogue; the number of repetitions and the number of hints are further possible criteria. The system recorded these criteria and used them to train a model to predict the “correct” score given by the teacher. Three different methods were tested in this experiment. The first method was a rule-based evaluation method. The rating was based on point-deduction rules given by the teacher, considering the number of errors, the number of repetitions, and the number of hints. Points were deducted whenever a rule was triggered, and the deduction for each rule was also suggested by the teacher. The second and third methods used machine-learning algorithms to predict the scores: a multilayer feed-forward neural network was trained to predict the score, with different criteria as input data and the final score as output.
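The first, rule-based method can be sketched as follows. The deduction amounts below are hypothetical placeholders; in the study, the actual deductions were suggested by the teacher.

```python
# Hypothetical point deductions per triggered rule (placeholders, not the
# teacher-supplied values used in the study).
DEDUCTIONS = {
    "errors": 5,       # points off per incorrect response
    "repetitions": 3,  # points off per repeated prompt
    "hints": 2,        # points off per hint (reminder) given
}

def rule_based_score(record, base=100, floor=0):
    """Start from a base score and deduct points each time a rule fires."""
    score = base
    for criterion, penalty in DEDUCTIONS.items():
        score -= penalty * record.get(criterion, 0)
    return max(score, floor)

print(rule_based_score({"errors": 2, "repetitions": 1, "hints": 3}))  # → 81
```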
The second method used neural network prediction taking the same criteria as the first method as input data, namely the number of errors, the number of repetitions, and the number of hints. The third method was also a neural network approach considering all five criteria recorded by the system: pause time, answer time, the number of errors, the number of repetitions, and the number of hints (reminders). The prediction models were trained using the corresponding criteria and the expected scores given by the teacher. The system uses the M5P algorithm to fit the nonlinear model. M5P is a machine-learning algorithm published by Yong Wang and Ian H. Witten in 1997 [38]. In practice, the prediction targets (classes) of many machine-learning problems are continuous values, but only some machine-learning methods can handle continuous numerical prediction; M5P is one of the algorithms able to predict continuous values. Training involved 10-fold cross-validation, which is used to estimate the accuracy of the algorithm: the data set is divided into 10 parts, and each part in turn serves as test data while the remaining nine parts serve as training data. Each run yields a corresponding accuracy (or error rate), and the average over the 10 runs is used as an estimate of the algorithm’s accuracy. Generally, multiple rounds of 10-fold cross-validation (for example, 10 times 10-fold cross-validation) are performed and the results averaged. Based on 10-fold cross-validation, 90% of the data were used as training data and the remaining 10% as testing data.
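The 10-fold cross-validation protocol can be illustrated as below. This sketch substitutes a trivial mean-score baseline for M5P (a model-tree learner available in Weka, not reimplemented here), and the records are synthetic; it only demonstrates the fold-splitting and averaging procedure described above.

```python
import random

def ten_fold_cv(data, fit, predict):
    """Generic 10-fold cross-validation.

    data:    list of (features, target) pairs.
    fit:     function(training pairs) -> model
    predict: function(model, features) -> predicted target
    Returns the mean absolute error averaged over the 10 folds.
    """
    random.seed(0)
    data = data[:]
    random.shuffle(data)
    folds = [data[i::10] for i in range(10)]  # 10 roughly equal parts
    fold_errors = []
    for i in range(10):
        test = folds[i]
        train = [pair for j, fold in enumerate(folds) if j != i for pair in fold]
        model = fit(train)
        errors = [abs(predict(model, x) - t) for x, t in test]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)

# Baseline "model": predict the mean training score, ignoring features.
fit_mean = lambda train: sum(t for _, t in train) / len(train)
predict_mean = lambda model, x: model

# Synthetic records: (errors, repetitions, hints) -> illustrative score.
records = [((e, r, h), 100 - 5 * e - 3 * r - 2 * h)
           for e in range(3) for r in range(3) for h in range(3)]
mae = ten_fold_cv(records, fit_mean, predict_mean)
print(round(mae, 2))
```

Replacing the baseline `fit`/`predict` pair with a trained regressor (such as M5P or the neural network) leaves the validation protocol unchanged.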
Figure 11 shows the predicted results of the different methods. The X-axis shows the 51 completed tasks, and the Y-axis shows the corresponding scores for each task given by the three automatic grading methods and manually by the classroom teacher. The detailed scores can be found in Appendix D. As shown in the figure, all three methods gave evaluations close to the teacher’s.
Table 3 shows the error estimation for the three methods: system rating with point-deduction rules, the machine-learning prediction model with three features, and the machine-learning prediction model with five features. For the predicted score p_i and the correct score t_i given by the teacher, the root mean squared error and mean absolute error were measured using Formulas (2) and (3). Machine-learning prediction using five criteria gives the closest evaluation to the expected scores, which shows that pause time and answer time are crucial factors when the teacher rates the students’ conversations.
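Assuming the standard definitions of root mean squared error and mean absolute error (as in Formulas (2) and (3)), the two measures can be computed directly from the predicted and correct score lists:

```python
import math

def rmse(predicted, correct):
    # Root mean squared error: sqrt of the mean squared difference p_i - t_i.
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, correct))
                     / len(correct))

def mae(predicted, correct):
    # Mean absolute error: mean of the absolute differences |p_i - t_i|.
    return sum(abs(p - t) for p, t in zip(predicted, correct)) / len(correct)

# Illustrative predicted vs. teacher-given scores (not data from the study).
p = [85, 90, 78]
t = [80, 92, 75]
print(rmse(p, t), mae(p, t))
```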
Right after the chatting experiment, participants were requested to fill out the online survey with 12 statements. The 12 statements were designed based on a five-point Likert scale measuring three aspects: (1) participants’ perception of the user interface, (2) participants’ perception of the chatting process compared to traditional instruction, and (3) participants’ perception of the overall effectiveness of the system.
Table 4 shows the results of the survey. The average score (AVG) was calculated for each of the three aspects. Each aspect was evaluated by four sections of the questionnaire, as shown in Appendix B. One point was scored when the strongly disagree option was given, and five points when the strongly agree option was given. The results of the questionnaire are shown in Appendix C. The average score of the four sections was calculated to represent the participants’ perspective on the corresponding aspect.
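The scoring scheme above amounts to mapping each Likert response to a value from 1 to 5 and averaging the four sections of an aspect; a minimal sketch (with made-up responses, not the survey data) is:

```python
# Five-point Likert mapping used for the questionnaire scoring.
LIKERT = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
          "agree": 4, "strongly agree": 5}

def aspect_average(responses):
    """Average the scores of the four sections belonging to one aspect."""
    scores = [LIKERT[r] for r in responses]
    return sum(scores) / len(scores)

print(aspect_average(["agree", "strongly agree", "neutral", "agree"]))  # → 4.0
```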
Based on the results shown in Table 4, even though participants showed less agreement regarding the user interface (<3.5), they agreed that using the system to practice English conversation is better than traditional conversation practice (>3.5) and that the system (including composing dialogue and practicing dialogue) is effective in general (>3.5).
The results for the first section of the questionnaire (The user interface is simple and easy to use) indicate that most participants consider the platform clearly designed and easy to use. However, many students were not satisfied (2.72) with the recognition accuracy of the speech-to-text software (Q4: The speech-to-text recognition is accurate). Based on an informal interview with the instructor, many students became frustrated when the machine replied that their answers could not be recognized (because of pronunciation, accent, or not using the pre-designed words or phrases). Once the instructor reminded the students to use only the words or phrases that had been taught or focused on, all the students successfully completed the three tasks. During the process, however, some students still found that their speech could not be recognized smoothly; for example, a couple of students kept saying “Two nights”, but the system showed “tonight” in the chat room. Therefore, the speech-to-text function will be modified to increase the recognition accuracy.
Figure 12 shows the overall results of the online survey. The blue line “UI” represents participants’ perception of the user interface. The red line “V.S traditional” represents participants’ perception of the chatting process compared to traditional instruction. The green line “Effectiveness” represents participants’ perception of the overall effectiveness of the system. The X-axis indicates the corresponding section of the questionnaire. The Y-axis shows the average score for each section.
As shown in Figure 12, the participants responded positively regarding the overall effectiveness of the system and the chatting process compared to traditional instruction. Satisfaction with the user interface was lower because students encountered unexpected problems with the speech-to-text recognition software and still tended to reply with simple phrases instead of complete sentences. All in all, however, students expressed above-average satisfaction with the conversation process and the system. They believed that the computer-assisted learning environment did improve their learning motivation, considered the overall system design effective for English language learners practicing speaking, and said they would continue to use the system.
4. Conclusions
This study analyzed a task-oriented “English conversation” learning system. The system simulates professional English teachers to establish a grammar and sentence scoring mechanism. A task-based dialogue framework was proposed, and a preliminary system was developed to test the effectiveness of the framework. The system was used in a college-level English speaking class to test perceptions of the system regarding the user interface, the learning style, and the system’s effectiveness. This research collected data to evaluate the possibility of replacing traditional English speaking practice with the proposed system. During the process of performing tasks, the system records the details of the learner’s learning data, including not only grammar and vocabulary but also the pause time in the dialogue and the number of repeated answers. The proposed task-based dialogue robot simulates real-life conversation: based on task-based language learning, students learn the language by executing the conversational tasks assigned by the system. This study uses a pre-defined dialogue tree to describe the conversational task and a large quantity of Wikipedia corpus data to train the natural-language capability of the dialogue robot. Based on the collected student feedback, the results confirm positive perceptions of the system regarding the learning style and the learning outcomes. The system provides better semantic understanding and more accurate task-based conversation control.
Compared to traditional learning methods, the system in this study conducts assessment automatically and analyzes learning status. Using the proposed framework, the dialogue is recorded, accessed, and compared to a regular conversation evaluation, and the score is given by the auto-scoring module in the dialogue system. Three auto-grading methods were tested in this research: the dialogue system recorded the criteria suggested by teachers and used them to train a model to predict the “correct” score given by the teacher. Coherent grading across these evaluation methods was expected. In addition, the results of the questionnaire show effective learning using the task-based dialogue system, and the qualitative feedback from students provides evidence of ease of use, the usefulness of repetitive practice, and instant response.