Enhancing Task-Oriented Dialogue Systems through Synchronous Multi-Party Interaction and Multi-Group Virtual Simulation
Abstract
1. Introduction
2. Related Work
2.1. LLMs in Task-Oriented Dialogue Systems
2.2. Synchronous Multi-Party Chatbots
2.3. Multi-Group Virtual Simulations
3. Assistant Bot
3.1. Language Understanding and Generation
3.2. Shared Database
3.3. Proactive Engagement
3.4. Administrative Tasks
3.4.1. Names and Hub
3.4.2. Appointments
3.4.3. Messages
3.4.4. Groups
4. Multi-Group Virtual Simulator
4.1. Preprocessing
4.2. Initialization
4.3. Interactions
4.4. Termination
5. Experiments
5.1. Virtual Simulation
5.2. Human Simulation
5.3. Evaluation Metrics
- NLU: The failure rate due to NLU shortcomings (in %).
- STD: The failure rate due to an inherent limitation of the STDM framework, which cannot exit the current dialogue state until that state is fully resolved, even when the user requests another task (in %).
- SUC: The overall success rate; 100 − (NLU + STD).
- SWS: The success rate without accounting for STD, used to evaluate system performance excluding failures caused by limitations of the STDM framework; 100 × SUC/(100 − STD).
- INT: The error rate resulting from inaccurate intent classification by GPT (in %).
- DST: The error rate resulting from inaccurate dialogue state tracking by GPT (in %).
- STD: The error rate due to the STDM limitations, leading to NLU failures (in %).
- SUC: The overall success rate; 100 − (INT + DST + STD).
- SWS: The success rate without considering STD; 100 × SUC/(100 − STD).
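The relationship between the failure rates and the two success rates can be checked with a short calculation. The sketch below is illustrative only: the function name `success_rates` and the sample percentages are assumptions made for this example, not values reported in the paper.

```python
def success_rates(failure_rates, std_key="STD"):
    """Return (SUC, SWS) in percent from a dict of failure rates in percent.

    SUC = 100 - sum of all failure rates
    SWS = 100 * SUC / (100 - STD)
    """
    std = failure_rates.get(std_key, 0.0)
    if std >= 100.0:
        raise ValueError("STD must be below 100% for SWS to be defined")
    suc = 100.0 - sum(failure_rates.values())
    sws = 100.0 * suc / (100.0 - std)
    return suc, sws


# Hypothetical rates for the first metric set (NLU, STD).
print(success_rates({"NLU": 10.0, "STD": 20.0}))  # (70.0, 87.5)

# Hypothetical rates for the second metric set (INT, DST, STD).
suc, sws = success_rates({"INT": 5.0, "DST": 5.0, "STD": 10.0})
print(suc, round(sws, 2))
```

Because SWS divides by the share of cases not affected by STD, it is always at least as large as SUC whenever STD is non-zero.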
5.4. Results
6. Discussion
6.1. Error Analysis on the Assistant Bot
6.1.1. STDM Limitations
6.1.2. NLU Errors in Dialogue State Tracking (DST)
6.1.3. NLU Errors in Intent Classification (IC)
6.2. Error Analysis on the Virtual Simulation
6.3. Limitations
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Appendix A.1. Virtual Simulation Prompts and Personas
Name | Class | Personality | Tendencies |
---|---|---|---|
Joseph Cho | CS | enthusiastic | organized and plans material meticulously |
Amanda Johnson | PSY | approachable | interactive and encourages participation |
Michael Brown | HIS | analytical | enjoys solving complex problems, often takes a methodical approach |
Olivia Adams | CS | creative | approaches problems with unconventional thinking |
Leila Ahmed | CS | creative | enjoys experimenting, thinks outside the box |
Mohammed Ali | CS, PSY | analytical | enjoys solving challenging problems |
Michael Brown | CS | analytical | enjoys solving complex problems, often takes a methodical approach |
Valerie Chen | CS, HIS | enthusiastic | organized and plans study schedules meticulously |
Diego Fernandez | CS | outgoing | prefers group study sessions, actively participates in class discussions |
Amelia Garcia | CS, PSY, HIS | independent | prefers to work alone, excels in self-paced learning |
Rohan Gupta | CS, PSY, HIS | analytical | approaches problems logically |
Mary Johnson | CS | introverted | enjoys individual projects, excels in structured assignments |
Fatima Khan | CS, PSY, HIS | creative | applies creative problem-solving techniques |
Daniel Lee | CS | team-oriented | works well in groups, enjoys collaborative projects |
Maria Lopez | CS, PSY, HIS | curious | likes exploring new concepts, often conducts independent research |
Rachel Martinez | CS | outgoing | works best in groups, enjoys discussing ideas with classmates |
Ethan Miller | CS | diligent | thrives in structured learning environments, maintains organized notes |
Yuki Nakamura | CS | focused | attentive in lectures |
Sophia Nguyen | CS | curious | explores additional resources beyond lecture material |
Aisha Patel | CS, PSY | reserved | prefers self-study, takes detailed notes during lectures |
Aaliyah Patel | CS, HIS | friendly | enjoys group discussions |
Nadia Ramirez | CS, PSY, HIS | organized | creates detailed study guides, follows a structured study routine |
Emily Rodriguez | CS | friendly | prefers group study sessions, good at explaining concepts to peers |
Isabella Rodriguez | CS, HIS | enthusiastic | eager to learn, actively participates in class discussions |
Elijah Scott | CS | persistent | consistently follows through on projects, enjoys mastering difficult concepts |
Alexandre Silva | CS | introspective | prefers self-paced learning |
Kumar Singh | CS, HIS | enthusiastic | eager learner, actively participates in group discussions |
Ben Smith | CS, PSY | easygoing | prefers studying in a group, collaborates on group projects |
Haruto Suzuki | CS, HIS | reserved | prefers studying alone, methodical in approach |
Sato Tanaka | CS, PSY | friendly | prefers collaborative learning |
Ryne Waters | CS | analytical | works well under pressure, completes assignments last minute |
Chloe White | CS | curious | enjoys exploring new topics |
James Wilson | CS | adventurous | enjoys hands-on projects and experimentation |
Appendix A.1.1. Student and Professor Response Generation Prompts
You have {num_notifs} notifications you MUST check through the bot first. After the bot APPROPRIATELY RESPONDS, you want to complete this task: {todo}.
You are talking to an assistant chatbot. Its functionalities are ONLY as follows:
- makes appointments with other students or professor
- sends messages to people (anonymized or not anonymized) to other students or professor
- create and delete groups/classes with leaders and members
- leave a group if you are a MEMBER and in a GROUP
- check what groups you are in
- make announcements within groups if you are a LEADER and in a GROUP
You will be given a conversation history and the assistant chatbot’s last response. Using the information and the given identity and personality, respond IN CHARACTER and APPROPRIATELY to the assistant chatbot’s last response. Pay attention to what the bot is asking for and carefully consider your own tasks that you want to get done. Respond as if YOU are the one typing and asking for assistance; do NOT add role-playing details such as “John Doe: hi” or “user: hi” or “how can I assist you today?”. In addition, do NOT REPEAT yourself, such as repeating “Thank you” or “add all students”. If the conversation has ended, consider asking for any updates if you received notifications. If it is not, respond with something DIFFERENT. If the bot is repeating itself, CANCEL the current functionality the bot is in to return to the hub state.
AN EXAMPLE of how it should be responded is provided below:
user_prompt =
History:
user: hi
Using the conversation history provided above as context, respond in character to the chatbot’s last response below:
bot: Before we begin, please provide your full name.
CORRECT RESPONSE:
John Smith
AN EXAMPLE of how it should be responded is provided below:
user_prompt =
History:
user: hi
bot: Before we begin, please provide your full name.
user: Gwen Stacy
bot: Thank you Gwen Stacy. Welcome! How should I call you?
user: Gwen
Using the conversation history provided above as context, respond in character to the chatbot’s last response below:
bot: Nice to meet you, Gwen. Anything I can do for you, Gwen?
CORRECT RESPONSE:
any updates?
ANOTHER CORRECT RESPONSE:
actually call me Stacy
AN EXAMPLE of how it should be responded is provided below:
user_prompt =
user: Can you help me schedule a study group session with my classmates for this week?
Using the conversation history provided above as context, respond in character to the chatbot’s last response below:
bot: Sure, I can help you schedule an appointment with someone. What is their full name?
CORRECT RESPONSE:
let’s see if David is free
ANOTHER CORRECT RESPONSE:
Never mind
AN EXAMPLE of how it should be responded is provided below:
user_prompt =
user: Call me Jasmine. Any updates?
Using the conversation history provided above as context, respond in character to the chatbot’s last response below:
bot: Nice to meet you, Jasmine. So, how’s it going, Jasmine?
CORRECT RESPONSE:
so any mail
ANOTHER CORRECT RESPONSE:
I want to send a message to Dr Yin about a bug in the hw
AN EXAMPLE of how it should be responded is provided below:
user_prompt =
user: You can call me Fatima. Thank you.
bot: Nice to meet you, Fatima. So what’s up, Fatima?
user: I wanted to ask if there’s a way for me to find a classmate to work together on the Data Structures homework?
bot: You’re not in any groups right now, or you do not have any groups that you can request feedback from!!
user: Actually, I wanted to ask if there’s a way for me to find a classmate to work together on the Data Structures homework. Can you help me with that?
Using the conversation history provided above as context, respond in character to the chatbot’s last response below:
bot: Do you also still want help with check the members in a group?
CORRECT RESPONSE:
no, it’s ok
AN EXAMPLE of how it should be responded is provided below:
bot: Understood, I’ll move on. Anything I can do for you, Joseph?
user: Any updates?
bot: There are no new messages in your inbox. Anything I can do for you, Joseph?
user: No, I’m all set for now. Thank you.
Using the conversation history provided above as context, respond in character to the chatbot’s last response below:
bot: You have 1 message in your appointment inbox. Would you like to respond to it now?
CORRECT RESPONSE:
yes
History:
{history}
Using the conversation history provided above and the character’s personality and class details as context, respond IN CHARACTER TO the chatbot’s last response below:
Chatbot’s Last Response:
{bot_last_response}
You are {name}, a professor teaching the {sim_class} class. You are {personality}, with studying tendencies of {teaching_tendencies}. Online, you tend to {online_tendencies}.
Your students are {students}. The class history so far is as such: {class_history}. This week, the material covered was {current_material}.
You have {num_notifs} notifications you MUST check through the bot first. After the bot APPROPRIATELY RESPONDS, you want to complete this task: {todo}.
You are talking to an assistant chatbot. Its functionalities are ONLY as follows:
- makes appointments with other students or professor
- sends messages to people (anonymized or not anonymized) to other students or professor
- create and delete groups/classes with leaders and members
- leave a group if you are a MEMBER and in a GROUP
- check what groups you are in
- make announcements within groups if you are a LEADER and in a GROUP
You will be given a conversation history and the assistant chatbot’s last response. Using the information and the given identity and personality, respond IN CHARACTER and APPROPRIATELY to the assistant chatbot’s last response. Pay attention to what the bot is asking for and carefully consider your own tasks that you want to get done. Respond as if YOU are the one typing and asking for assistance; do NOT add role-playing details such as “John Doe: hi” or “user: hi” or “how can I assist you today?”. In addition, do NOT REPEAT yourself, such as repeating “Thank you” or “add all students”. If the conversation has ended, consider asking for any updates if you received notifications. If it is not, respond with something DIFFERENT. If the bot is repeating itself, CANCEL the current functionality the bot is in to return to the hub state.
AN EXAMPLE of how it should be responded is provided below:
user_prompt =
History:
user: hi
Using the conversation history provided above as context, respond in character to the chatbot’s last response below:
bot: Before we begin, please provide your full name.
CORRECT RESPONSE:
John Smith
AN EXAMPLE of how it should be responded is provided below:
user_prompt =
History:
user: hi
bot: Before we begin, please provide your full name.
user: Gwen Stacy
bot: Thank you Gwen Stacy. Welcome! How should I call you?
user: Gwen
Using the conversation history provided above as context, respond in character to the chatbot’s last response below:
bot: Nice to meet you, Gwen. Anything I can do for you, Gwen?
CORRECT RESPONSE:
any updates?
ANOTHER CORRECT RESPONSE:
actually call me Stacy
AN EXAMPLE of how it should be responded is provided below:
user_prompt =
user: Can you help me schedule a study group session with my classmates for this week?
Using the conversation history provided above as context, respond in character to the chatbot’s last response below:
bot: Sure, I can help you schedule an appointment with someone. What is their full name?
CORRECT RESPONSE:
let’s see if David is free
ANOTHER CORRECT RESPONSE:
Never mind
AN EXAMPLE of how it should be responded is provided below:
user_prompt =
user: Call me Jasmine. Any updates?
Using the conversation history provided above as context, respond in character to the chatbot’s last response below:
bot: Nice to meet you, Jasmine. So, how’s it going, Jasmine?
CORRECT RESPONSE:
so any mail
ANOTHER CORRECT RESPONSE:
I want to send a message to Dr Yin about a bug in the hw
AN EXAMPLE of how it should be responded is provided below:
user_prompt =
user: You can call me Fatima. Thank you.
bot: Nice to meet you, Fatima. So what’s up, Fatima?
user: I wanted to ask if there’s a way for me to find a classmate to work together on the Data Structures homework?
bot: You’re not in any groups right now, or you do not have any groups that you can request feedback from!!
user: Actually, I wanted to ask if there’s a way for me to find a classmate to work together on the Data Structures homework. Can you help me with that?
Using the conversation history provided above as context, respond in character to the chatbot’s last response below:
bot: Do you also still want help with check the members in a group?
CORRECT RESPONSE:
no, it’s ok
AN EXAMPLE of how it should be responded is provided below:
bot: Understood, I’ll move on. Anything I can do for you, Joseph?
user: Any updates?
bot: There are no new messages in your inbox. Anything I can do for you, Joseph?
user: No, I’m all set for now. Thank you.
Using the conversation history provided above as context, respond in character to the chatbot’s last response below:
bot: You have 1 message in your appointment inbox. Would you like to respond to it now?
CORRECT RESPONSE:
yes
History:
{history}
Using the conversation history provided above and the character’s personality and class details as context, respond IN CHARACTER TO the chatbot’s last response below:
Chatbot’s Last Response:
{bot_last_response}
- name: Name of user
- sim_class: Classes that the user is part of
- personality: Personality of the user
- studying_tendencies: Studying tendencies of the user
- online_tendencies: Online tendencies of the user
- professor: If the user is a student, professors that user has for classes
- classmates: If the user is a student, classmates that user has in classes
- students: If the user is a professor, students that user has in class
- class_history: History of what material was taught and what the user did in the past weeks of classes
- current_material: What material is being taught at the point in time in which the user is conversing with the Assistant Bot
- num_notifs: Number of notifications the user has
- todo: Task(s) that user seeks to complete through Assistant Bot
- history: Chat history with Assistant Bot
- bot_last_response: Assistant Bot’s last response
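The placeholders listed above are filled in before each prompt is sent to the model. A minimal sketch of that substitution step follows, assuming plain Python string formatting; `TEMPLATE` is a shortened excerpt of the prompt, and the helper names `required_fields` and `fill_prompt` are hypothetical rather than taken from the authors' code.

```python
from string import Formatter

# Shortened excerpt of the response-generation prompt (assumption: the full
# prompt is filled the same way, one value per placeholder).
TEMPLATE = (
    "You have {num_notifs} notifications you MUST check through the bot first. "
    "After the bot APPROPRIATELY RESPONDS, you want to complete this task: {todo}.\n"
    "History:\n{history}\n"
    "Chatbot's Last Response:\n{bot_last_response}"
)


def required_fields(template):
    """Collect the placeholder names that appear in a template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def fill_prompt(template, values):
    """Substitute values into the template, failing loudly on missing fields."""
    missing = required_fields(template) - set(values)
    if missing:
        raise KeyError(f"missing placeholder values: {sorted(missing)}")
    return template.format(**values)


prompt = fill_prompt(
    TEMPLATE,
    {
        "num_notifs": 2,
        "todo": "send a message to Ben about studying together",
        "history": "user: hi",
        "bot_last_response": "bot: Before we begin, please provide your full name.",
    },
)
print(prompt)
```

Checking for missing fields before formatting gives a clearer error than the bare `KeyError` that `str.format` would raise on its own.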
Appendix A.1.2. Tasks and Notifications Prompt
You will be given a conversation history between a user and an assistant chatbot. Using it and the information provided, determine if there are any notifications are NOT resolved and tasks that are NOT completed. A notification is ONLY resolved when the user has completely received the message that the bot is delivering. A task is ONLY completed when the bot has completed the task the user requests and concludes as such. Respond ONLY in a dictionary format such as {‘notifications’ : 1, ‘tasks’: 1}, or ‘N/A’ if unable to be determined.
An EXAMPLE of how it should be responded is provided below:
user_prompt =
This user has received 0 notification and their task is as follows: send a message to Gwen about studying together. The number of notifications and tasks is presented in a dictionary format below:
{‘notifications’: 0, ‘tasks’ : 1}
History:
— smith is now talking —
user: hi
bot: Before we begin, please provide your full name.
user: Gabriel Smith
bot: Thank you Gabriel Smith. Welcome! How should I call you?
user: How about Darth Vader
bot: Nice to meet you, Darth Vader. How can I assist you, Darth Vader?
Based on the history above, how many remaining tasks and notifications does this user have?
CORRECT RESPONSE:
{‘notifications’: 0, ‘tasks’ : 1}
An EXAMPLE of how it should be responded is provided below:
user_prompt =
This user has received 2 notifications and their task is as follows: send a message to Ben about studying together. The number of notifications and tasks is presented in a dictionary format below:
{‘notifications’: 2, ‘tasks’ : 1}
History:
— fan is now talking —
user: You can call me Gwen.
bot: Nice to meet you, Gwen. What do you need next, Gwen?
user: I need to send a message to Ben about studying together.
bot: Okay! Would you like to send this message anonymously?
user: No, I don’t need to send the message anonymously.
bot: Alright! Can you please tell me the message that you want to send to Ben Smith?
user: Sure! Here’s the message I want to send to Ben: “Hey Ben, I was wondering if you’d be interested in studying together for our Intro to CS class. I find that collaborating with someone helps me understand the material better. Let me know if you’re available and we can set up a study session. Thanks!”
bot: Thank you! I’ll send off your message.
Based on the history above, how many remaining tasks and notifications does this user have?
CORRECT RESPONSE:
{‘notifications’: 2, ‘tasks’ : 0}
An EXAMPLE of how it should be responded is provided below:
user_prompt =
This user has received 2 notifications and their task is as follows: send a message to Dr. Cho about bug. The number of notifications and tasks is presented in a dictionary format below:
{‘notifications’: 2, ‘tasks’ : 1}
History:
— ali is now talking —
user: I need to send a message to Dr. Cho.
bot: Okay! Would you like to send this message anonymously?
user: No, I don’t need to send the message anonymously.
bot: Alright! Can you please tell me the message that you want to send to Ben Smith?
user: Subject: Possible Bug in Homework Assignment
Hi Dr. Cho,
I hope you’re doing well. I wanted to bring to your attention a possible bug that I’ve noticed in the current homework assignment. After reviewing the requirements, I believe there may be an issue with the expected output for a specific problem. I have detailed my findings and suggestions for resolving this in the document attached to this message.
Thank you for your attention to this matter.
Best regards,
Muhammad Ali
bot: Welcome! How should I call you?
Based on the history above, how many remaining tasks and notifications does this user have?
CORRECT RESPONSE:
{‘notifications’: 2, ‘tasks’ : 1}
An EXAMPLE of how it should be responded is provided below:
user_prompt =
This user has received 1 notifications and their task is as follows: create a group for school project with Daniel and Han. The number of notifications and tasks is presented in a dictionary format below:
{‘notifications’: 1, ‘tasks’ : 1}
History:
— tenali is now talking —
user: any updates?
bot: You have 1 message in your appointment inbox. Would you like to respond to it now?
user: yes
Based on the history above, how many remaining tasks and notifications does this user have?
CORRECT RESPONSE:
{‘notifications’: 1, ‘tasks’ : 1}
An EXAMPLE of how it should be responded is provided below:
user_prompt =
This user has received 0 notifications and their task is as follows: make an appointment with Professor Johnson. The number of notifications and tasks is presented in a dictionary format below:
{‘notifications’: 0, ‘tasks’ : 1}
History:
— patel is now talking —
user: You can call me Aisha.
bot: Nice to meet you, Aisha. What can I do for you, Aisha?
user: I would like to make an appointment with Professor Johnson for help with hash tables.
bot: Okay! Please tell me your first choice for a meeting day and time with Amanda Johnson.
user: I would prefer to meet with Professor Johnson on Thursday afternoon around 3:00 PM.
bot: Ok, I think I got it. Is this what you meant?
03/07/2024 (03:00 PM) to 03/07/2024 (03:30 PM)
user: Yes, that’s correct. Thank you for scheduling the appointment with Professor Johnson on Thursday afternoon from 3:00 PM to 3:30 PM.
bot: Awesome, I’ll see if they are free. Furthermore, where do you want to meet with Amanda Johnson?
Based on the history above, how many remaining tasks and notifications does this user have?
CORRECT RESPONSE:
{‘notifications’: 0, ‘tasks’ : 1}
This user has received {notifs} notifications and their task is as follows: {todo}. The number of notifications and tasks is presented in a dictionary format below:
{user_notif_data}
History:
{history}
Based on the history above, how many remaining tasks and notifications does this user have?
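The prompt above asks the model to reply with either a dictionary or ‘N/A’, so the simulator has to turn that text into counts. The following is a minimal parsing sketch under stated assumptions: the function name `parse_remaining` is hypothetical, curly quotes are normalized because they appear in the prompt examples, and any malformed reply is treated the same as ‘N/A’.

```python
import ast


def parse_remaining(response):
    """Parse the model reply into {'notifications': int, 'tasks': int}.

    Returns None for 'N/A' or for any reply that does not match the
    requested dictionary format.
    """
    # Normalize typographic quotes so the text is a valid Python literal.
    text = response.strip().replace("‘", "'").replace("’", "'")
    if text.strip("'\"") == "N/A":
        return None
    try:
        data = ast.literal_eval(text)  # safe: evaluates literals only
    except (ValueError, SyntaxError):
        return None
    if not isinstance(data, dict):
        return None
    try:
        return {
            "notifications": int(data["notifications"]),
            "tasks": int(data["tasks"]),
        }
    except (KeyError, TypeError, ValueError):
        return None


print(parse_remaining("{‘notifications’: 2, ‘tasks’ : 0}"))
print(parse_remaining("N/A"))
```

`ast.literal_eval` is used instead of `eval` so that model output is never executed as code.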
Appendix A.1.3. Termination Prompt
You will be given a conversation history between a user and an assistant chatbot.
Using it, determine if the conversation has ENDED or is REPEATING between the bot and user.
Your output should be EITHER YES or NO.
An EXAMPLE of how it should be responded is provided below:
user_prompt =
History:
— smith is now talking —
user: hi
bot: Before we begin, please provide your full name.
user: Gabriel Smith
bot: Thank you Gabriel Smith. Welcome! How should I call you?
user: How about Darth Vader
bot: Nice to meet you, Darth Vader. How can I assist you, Darth Vader?
Based on the history above, is the conversation between the user and the bot ENDED or REPEATING?
CORRECT RESPONSE:
NO
An EXAMPLE of how it should be responded is provided below:
user_prompt =
History:
— stacy is now talking —
user: You can call me Gwen.
bot: Nice to meet you, Gwen. What do you need next, Gwen?
user: I need to send a message to Ben about studying together.
bot: Okay! Would you like to send this message anonymously?
user: No, I don’t need to send the message anonymously.
bot: Alright! Can you please tell me the message that you want to send to Ben Smith?
user: Sure! Here’s the message I want to send to Ben: “Hey Ben, I was wondering if you’d be interested in studying together for our Intro to CS class. I find that collaborating with someone helps me understand the material better. Let me know if you’re available and we can set up a study session. Thanks!”
bot: Thank you! I’ll send off your message.
user: Great, thank you! I appreciate your help. I’ll let you know if there’s anything else I need help with. Have a great day!
bot: Of course! It’s my pleasure.
Based on the history above, is the conversation between the user and the bot ENDED or REPEATING?
CORRECT RESPONSE:
YES
An EXAMPLE of how it should be responded is provided below:
user_prompt =
History:
— patel is now talking —
user: You can call me Aisha.
bot: Nice to meet you, Aisha. What can I do for you, Aisha?
user: I would like to make an appointment with Professor Johnson for help with hash tables.
bot: Okay! Please tell me your first choice for a meeting day and time with Amanda Johnson.
user: I would prefer to meet with Professor Johnson on Thursday afternoon around 3:00 PM.
bot: Ok, I think I got it. Is this what you meant?
03/07/2024 (03:00 PM) to 03/07/2024 (03:30 PM)
user: Yes, that’s correct. Thank you for scheduling the appointment with Professor Johnson on Thursday afternoon from 3:00 PM to 3:30 PM.
bot: Awesome, I’ll see if they are free. Furthermore, where do you want to meet with Amanda Johnson?
Based on the history above, is the conversation between the user and the bot ENDED or REPEATING?
CORRECT RESPONSE:
NO
History:
{history}
Based on the history above, is the conversation between the user and the bot ENDED or REPEATING?
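The YES/NO verdict requested above can drive the stopping condition of a simulated conversation. The sketch below shows one way this might be wired together; it is not the authors' implementation. The callables `next_turn` and `judge` stand in for the response-generation and termination prompts, and the `max_turns` cap is an added safety assumption.

```python
from typing import Callable, List


def parse_termination(response: str) -> bool:
    """Map the judge's YES/NO reply to a boolean; reject anything else."""
    answer = response.strip().strip(".!'\"").upper()
    if answer not in ("YES", "NO"):
        raise ValueError(f"unexpected judge output: {response!r}")
    return answer == "YES"


def simulate(
    next_turn: Callable[[List[str]], str],
    judge: Callable[[List[str]], str],
    max_turns: int = 20,  # hypothetical cap, not taken from the paper
) -> List[str]:
    """Generate turns until the judge reports ENDED/REPEATING or the cap is hit."""
    history: List[str] = []
    for _ in range(max_turns):
        history.append(next_turn(history))
        if parse_termination(judge(history)):
            break
    return history


# Stub callables for demonstration; real ones would call the language model.
turns = simulate(
    next_turn=lambda h: f"turn {len(h) + 1}",
    judge=lambda h: "YES" if len(h) >= 3 else "NO",
)
print(turns)  # ['turn 1', 'turn 2', 'turn 3']
```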
Appendix A.2. Conversation Examples Demonstrating Assistant Bot Tasks
Appendix A.3. Examples of Assistant Bot and Simulation Fails
- Toxtli, C.; Monroy-Hernández, A.; Cranshaw, J. Understanding Chatbot-mediated Task Management. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, New York, NY, USA, 21–26 April 2018; pp. 1–6. [Google Scholar] [CrossRef]
- Deriu, J.M.; Cieliebak, M. Towards a Metric for Automated Conversational Dialogue System Evaluation and Improvement. In Proceedings of the 12th International Conference on Natural Language Generation, Tokyo, Japan, 29 October–1 November 2019; van Deemter, K., Lin, C., Takamura, H., Eds.; Association for Computational Linguistics: Kerrville, TX, USA, 2019; pp. 432–437. [Google Scholar] [CrossRef]
- Ghandeharioun, A.; Shen, J.H.; Jaques, N.; Ferguson, C.; Jones, N.; Lapedriza, A.; Picard, R. Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, QC, Canada, 8–14 December 2019; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32. [Google Scholar]
- Deriu, J.; Tuggener, D.; von Däniken, P.; Campos, J.A.; Rodrigo, A.; Belkacem, T.; Soroa, A.; Agirre, E.; Cieliebak, M. Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; Webber, B., Cohn, T., He, Y., Liu, Y., Eds.; Association for Computational Linguistics: Kerrville, TX, USA, 2020; pp. 3971–3984. [Google Scholar] [CrossRef]
- Yang, R.; Li, Z.; Tang, H.; Zhu, K. ChatMatch: Evaluating Chatbots by Autonomous Chat Tournaments. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; Muresan, S., Nakov, P., Villavicencio, A., Eds.; Association for Computational Linguistics: Kerrville, TX, USA, 2022; Volume 1, pp. 7579–7590. [Google Scholar] [CrossRef]
- Abeysinghe, B.; Circi, R. The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches. arXiv 2024, arXiv:2406.03339. [Google Scholar]
- Chiang, C.H.; Lee, H.y. Can Large Language Models Be an Alternative to Human Evaluations? In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 9–14 July 202; Long Papers; Rogers, A., Boyd-Graber, J., Okazaki, N., Eds.; Association for Computational Linguistics: Kerrville, TX, USA, 2023; Volume 1, pp. 15607–15631. [Google Scholar] [CrossRef]
- Svikhnushina, E.; Pu, P. Approximating Online Human Evaluation of Social Chatbots with Prompting. In Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Prague, Czech Republic, 11–15 September 2023; Stoyanchev, S., Joty, S., Schlangen, D., Dusek, O., Kennington, C., Alikhani, M., Eds.; Association for Computational Linguistics: Kerrville, TX, USA, 2023; pp. 268–281. [Google Scholar] [CrossRef]
- Park, J.S.; O’Brien, J.D.C.; Cai, C.J.; Morris, M.R.; Liang, P.; Bernstein, M.S. Generative Agents: Interactive Simulacra of Human Behavior. arXiv 2023, arXiv:2304.03442. [Google Scholar]
- Zhou, X.; Zhu, H.; Mathur, L.; Zhang, R.; Yu, H.; Qi, Z.; Morency, L.P.; Bisk, Y.; Fried, D.; Neubig, G.; et al. SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents. arXiv 2024, arXiv:2310.11667. [Google Scholar]
- Finch, J.D.; Choi, J.D. Emora STDM: A Versatile Framework for Innovative Dialogue System Development. In Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Virtual Meeting, 1–3 July 2020; Pietquin, O., Muresan, S., Chen, V., Kennington, C., Vandyke, D., Dethlefs, N., Inoue, K., Ekstedt, E., Ultes, S., Eds.; Association for Computational Linguistics: Kerrville, TX, USA, 2023; pp. 261–264. [Google Scholar] [CrossRef]
- Madotto, A.; Liu, Z.; Lin, Z.; Fung, P. Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems. arXiv 2020, arXiv:2008.06239. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–12 December 2020; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 1877–1901. [Google Scholar]
- Yuan, C.; Xie, Q.; Ananiadou, S. Zero-shot Temporal Relation Extraction with ChatGPT. In Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, Toronto, ON, Canada, 13 July 2023; Demner-Fushman, D., Ananiadou, S., Cohen, K., Eds.; Association for Computational Linguistics: Kerrville, TX, USA, 2023; pp. 92–102. [Google Scholar] [CrossRef]
- Wei, X.; Cui, X.; Cheng, N.; Wang, X.; Zhang, X.; Huang, S.; Xie, P.; Xu, J.; Chen, Y.; Zhang, M.; et al. Zero-Shot Information Extraction via Chatting with ChatGPT. arXiv 2023, arXiv:2302.10205. [Google Scholar]
- Salewski, L.; Alaniz, S.; Rio-Torto, I.; Schulz, E.; Akata, Z. In-Context Impersonation Reveals Large Language Models’ Strengths and Biases. In Proceedings of the 37th Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023. [Google Scholar]
- Kim, H.; Hessel, J.; Jiang, L.; West, P.; Lu, X.; Yu, Y.; Zhou, P.; Bras, R.; Alikhani, M.; Kim, G.; et al. SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; Bouamor, H., Pino, J., Bali, K., Eds.; Association for Computational Linguistics: Kerrville, TX, USA, 2023; pp. 12930–12949. [Google Scholar] [CrossRef]

Type | Task | Description | Figures |
---|---|---|---|
N&H | Set name | Establishes the names that the user wishes to be called by | Figure A1 |
N&H | List roster | Lists the users who are registered in the database | |
APP | Make appointment | Sends an appointment request to a different user | Figure A2 |
APP | Confirm appointment | Confirms an appointment request between two users and adds to schedules | |
APP | Edit appointment | Edits an appointment request’s details | |
APP | Reschedule appointment | Reschedules appointment request to different date | |
APP | Cancel appointment | Cancels appointment request | |
MSG | Send message | Sends message to recipient | Figure A3 |
MSG | Check inbox | Checks inbox for messages and appointment requests | Figure A4 |
GRP | Make group | Makes group of multiple people | Figure A5 |
GRP | Delete group | Deletes a group that has been created (must be leader of the group to proceed) | |
GRP | Leave group | Leaves a group that user has been added to (must be a group member to proceed) | |
GRP | Check groups | Checks groups that users are in | Figure A6 |
GRP | Send announcement | Sends an identical message to group members (must be leader of the group to proceed) | |
GRP | Request feedback | Sends feedback questions to group members (must be leader of the group to proceed) | Figure A7 |
GRP | Send feedback | Records responses to feedback question (must be a group member to proceed) | Figure A8 |
GRP | Check feedback | Checks statistics of feedback question (must be leader of the group to proceed) | Figure A9 |

Scenario | Task Completion: NLU | Task Completion: STD | Task Completion: SUC | Task Completion: SWS | Natural Language Understanding: INT | Natural Language Understanding: DST | Natural Language Understanding: STD | Natural Language Understanding: SUC | Natural Language Understanding: SWS | Simulator: APP |
---|---|---|---|---|---|---|---|---|---|---|
(10, 1) | 9.3 | 36.7 | 54.0 | 85.3 | 7.1 | 1.8 | 22.4 | 68.7 | 88.7 | 69.8 |
(20, 1) | 7.8 | 34.6 | 57.6 | 87.8 | 4.6 | 3.0 | 23.9 | 68.5 | 90.0 | 81.8 |
(20, 2) | 6.8 | 38.3 | 54.9 | 88.7 | 4.5 | 2.9 | 22.5 | 70.1 | 90.4 | 79.6 |
(20, 3) | 6.2 | 37.7 | 56.1 | 90.0 | 3.6 | 5.3 | 20.9 | 70.2 | 88.7 | 83.6 |
(30, 1) | 9.7 | 41.0 | 49.3 | 83.5 | 5.5 | 4.6 | 20.4 | 69.4 | 87.2 | 77.0 |
Vir-Avg | 8.0 | 37.6 | 54.4 | 87.1 | 5.1 | 3.5 | 22.0 | 69.4 | 89.0 | 80.3 |
Hum-Avg | 10.5 | 43.1 | 46.4 | 81.6 | 5.2 | 4.2 | 26.0 | 64.6 | 87.3 | - |
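
The SUC and SWS columns are derived from the failure rates using the definitions in Section 5.3: SUC = 100 − (sum of failure rates) and SWS = 100 × SUC/(100 − STD). As a minimal sketch, assuming only those two formulas, the values for the (10, 1) scenario can be recomputed from the table entries. The function names `suc` and `sws` are illustrative and are not taken from the authors' code.

```python
def suc(*failure_rates: float) -> float:
    """Overall success rate: 100 minus the sum of the failure rates (all in %)."""
    return 100.0 - sum(failure_rates)


def sws(suc_rate: float, std: float) -> float:
    """Success rate with STDM-related failures excluded: 100 * SUC / (100 - STD)."""
    return 100.0 * suc_rate / (100.0 - std)


# Task-completion entries for scenario (10, 1): NLU = 9.3, STD = 36.7.
tc_suc = suc(9.3, 36.7)
tc_sws = sws(tc_suc, 36.7)

# NLU-side entries for the same scenario: INT = 7.1, DST = 1.8, STD = 22.4.
nlu_suc = suc(7.1, 1.8, 22.4)

print(round(tc_suc, 1), round(tc_sws, 1), round(nlu_suc, 1))
```

Recomputing SWS from the rounded table entries can differ from the reported value by a few tenths for some rows (for example, the task-completion SWS for scenario (20, 1) comes out near 88.1 against the reported 87.8). This may reflect rounding or averaging choices that the table does not state.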

Category | Error Type | Proportion (within category) |
---|---|---|
STDM | State Lock | 0.68 |
STDM | NATEX Issues | 0.20 |
STDM | System Bugs | 0.11 |
STDM | Multiple Task Requests | 0.01 |
NLU-DST | Multiple State Extraction | 0.84 |
NLU-DST | Information Mis-extraction | 0.16 |
NLU-IC | Context Mis-interpretation | 0.62 |
NLU-IC | Task Mis-classification | 0.38 |

Error Type | Proportion |
---|---|
Ignoring Context | 0.39 |
Forgetting Conversation History | 0.30 |
Forgetting User Information | 0.19 |
Confusing Conversational Partner | 0.06 |
Role Reverse | 0.06 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Paek, E.S.; Fan, T.; Finch, J.D.; Choi, J.D. Enhancing Task-Oriented Dialogue Systems through Synchronous Multi-Party Interaction and Multi-Group Virtual Simulation. Information 2024, 15, 580. https://doi.org/10.3390/info15090580