Article
Peer-Review Record

STOD: Towards Scalable Task-Oriented Dialogue System on MultiWOZ-API

Appl. Sci. 2024, 14(12), 5303; https://doi.org/10.3390/app14125303
by Hengtong Lu, Caixia Yuan and Xiaojie Wang *
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 8 May 2024 / Revised: 15 June 2024 / Accepted: 17 June 2024 / Published: 19 June 2024
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications—2nd Edition)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Here is my detailed review of the paper:


1. Paper summary

In this paper, the authors propose a Scalable Task-Oriented Dialogue modeling framework (STOD). Instead of labeling multiple dialogue components, as previous work has done, the authors only predict structured API queries to interact with the DB and generate responses based on the complete DB results. The paper then performs extensive qualitative experiments to verify the effectiveness of the proposed framework. The authors find that (1) Scalability across multiple domains: MSTOD achieves 2% improvements over the previous state-of-the-art in multi-domain TOD. (2) Scalability to new domains: the framework generalizes well to new domains, outperforming existing baselines by a significant margin of 10%.

 

2. Strengths

  • The experiment results reflect the effectiveness of the proposed approach.
  • The system is proposed to solve an important problem of promoting the performance of TOD.
  • The paper is generally well-written and easy to follow.

 

3. Weaknesses

  • The structure of the paper can be improved.
  • There is no statistical significance test for the quantitative experiments.
  • The improvement of 2% is relatively small.

 

4. Comments for authors

a. Significance

i. The paper does not discuss the statistical significance of the experimental outcomes.

b. Soundness

i. The comparison between the proposed tool and other approaches enhances the soundness of the paper, but the authors may want to discuss the replication process of the other approaches in more detail.

ii. The improvement of 2% is relatively small; the authors may want to discuss future approaches to improve the technique.

c. Novelty

i. I think the novelty of the paper is good overall, as it takes advantage of generating API queries and responses.

d. Presentation

i. The writing of the paper is generally good and easy to follow.

ii. The limitations should be discussed in a dedicated section rather than in the appendix, because the experimental results are influenced by the specific context.

e. Verifiability

i. The paper does not discuss the verifiability of the approach.
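The significance check requested under item a could, for instance, be a paired bootstrap over per-dialogue scores. The sketch below is only an illustration of that kind of test, not the authors' evaluation protocol: the function name, the score lists, and the choice of a paired bootstrap are all assumptions introduced here.

```python
import random


def paired_bootstrap_not_better(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Fraction of bootstrap resamples in which system A does NOT beat system B.

    scores_a and scores_b are hypothetical per-dialogue scores (for example
    1.0 for a successful dialogue, 0.0 otherwise) for the same test dialogues.
    A small return value suggests that A's advantage is unlikely to be noise.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired test needs one score per dialogue for each system")
    rng = random.Random(seed)
    n = len(scores_a)
    # Per-dialogue differences keep the pairing between the two systems.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    not_better = 0
    for _ in range(n_resamples):
        # Resample dialogues with replacement and compare the summed difference.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) <= 0:
            not_better += 1
    return not_better / n_resamples


# Hypothetical scores, for illustration only (not results from the paper).
proposed = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
baseline = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]
print(paired_bootstrap_not_better(proposed, baseline))
```

Reporting such a value next to the 2% improvement would show whether the gain is distinguishable from sampling noise on the test set.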

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

(optional) In many recent papers, sharing of experimental code is suggested for reproducibility of experiments. I suggest sharing the code for re-annotation and STOD training methods of this paper.

 

1. Introduction

 

1.1 

To explain the limitations of the existing research (work) on TOD, it is necessary to describe that research in more detail.

The three limitations of existing research claimed in the introduction (lines 29~37) require a lot of prior knowledge to understand, and the lack of explanation of existing work in the manuscript may make them difficult for readers to follow.

 

1.2

The introduction (line 39) claims that the similarity of APIs across domains can lead to increased extensibility. However, according to Appendix A, the APIs have different API names and API parameters in each domain.

At this point, the reader may have questions about scalability, because the types of APIs also differ between domains. Therefore, the authors should clearly explain why the use of APIs increases scalability.

 

 

3. Construction of MultiWOZ-API

 

3.1

According to Table 1, the API_num ratio indicates how many APIs a turn includes, and the API_turn ratio is the proportion of turns, among all turns in the dataset, that have at least one API of each type (find, book, get_attr).

In this case, my guess is that for the API_turn ratio, find+book+get_attr > all_API should hold, because, as the API_num ratio shows, a single turn can call multiple APIs.

(e.g., if turn1 = (1 find, 1 get_attr) and turn2 = (1 book), then all_API = 2, find = 1, book = 1, get_attr = 1)

However, according to the table, find+book+get_attr = all_API. The reason for this result should be explained.
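The counting argument in 3.1 can be checked with a small sketch. The two-turn dialogue is the reviewer's example; the data layout and the function name `turn_counts` are assumptions made here for illustration and do not come from the paper.

```python
# The reviewer's two-turn example: each turn is the set of API types it calls.
turns = [
    {"find", "get_attr"},  # turn 1 calls one find and one get_attr API
    {"book"},              # turn 2 calls one book API
]

API_TYPES = ("find", "book", "get_attr")


def turn_counts(turns):
    """Count turns with at least one API, overall and per API type."""
    all_api = sum(1 for turn in turns if turn)
    per_type = {name: sum(1 for turn in turns if name in turn) for name in API_TYPES}
    return all_api, per_type


all_api, per_type = turn_counts(turns)
# all_API = 2, but find + book + get_attr = 3, so the sum exceeds all_API.
print(all_api, per_type, sum(per_type.values()))
```

The per-type counts sum to all_API only when no turn calls more than one type of API, which is the condition the authors are asked to explain.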

 

 

4. Methodology

 

4.1

In line 192, it is unclear whether \{ U_1, Q_1, D_1, \dots, Q^{n1}_{t-1}, D^{n1}_{t-1} \} means \{ U_1, Q_1, D_1, \dots, Q^{n1}_{t-2}, D^{n1}_{t-2}, Q^{n1}_{t-1}, D^{n1}_{t-1} \} or \{ U_1, Q_1, D_1, \dots, U_{t-1}, Q^{n1}_{t-1}, D^{n1}_{t-1} \}.

In line 192, the subscript t-1 in Q^{n1}_{t-1} indicates the turn number, but it is unclear what n1 means.

 

4.2

Eq. 3 and Eq. 4 are losses for different functions, but the formulas are identical. This needs to be examined.

Eq. 7 and Eq. 8 are losses for different functions, but the formulas are identical. This needs to be examined.

 

4.3

The difference between cross-domain TOD and multi-domain TOD needs to be explained in the methodology.

 

4.4

In S_d in 4.3, it should be specified which domain d is. (I guess it is the target domain for cross-domain TOD).

 

 

5. Experiment and Analysis

 

5.1

Line 258: Section 3 does not contain the human evaluation results.

 

5.2

For 5.2.3, the process and experimental setup of the paper's few-shot experiment need to be clarified.

 

5.3

The description of Table 7 needs to distinguish between "few" and "full".

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx
