Article
Peer-Review Record

Enhance Text-to-Text Transfer Transformer with Generated Questions for Thai Question Answering

Appl. Sci. 2021, 11(21), 10267; https://doi.org/10.3390/app112110267
by Puri Phakmongkol and Peerapon Vateekul *
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 20 September 2021 / Revised: 22 October 2021 / Accepted: 27 October 2021 / Published: 1 November 2021
(This article belongs to the Special Issue Current Approaches and Applications in Natural Language Processing)

Round 1

Reviewer 1 Report

In this manuscript, the authors apply Raul Puri et al.'s method to augment training data for Thai question answering. The augmented data, along with human-labeled data, are used to fine-tune WangchanBERTa and mT5 models. A syllable-level F1 is proposed to evaluate the work. There are some observations and comments on this paper.

 

  1. Compared with Raul Puri et al.'s method, are there any major differences in your question-answer pair generation method?
  2. In Figure 5, should it be p(q | â, c)? (See the note after this list.)
  3. In the manuscript, Thai words should also be translated into English, e.g., in Table 1.
  4. Is syllable-level comparison a reasonable metric? Does any related work use a similar idea for evaluation? You may need to explain the relation between Thai words and syllables.
  5. In Section 3.2.3, "To achieve this, we trained a Question Answering model with labeled training data." You need to discuss this model. Would different models substantially affect which data are filtered?
  6. Section 5.2 is missing.
  7. The limitations of the work are unclear and insufficiently discussed. The authors need to discuss the importance and limitations of the work.
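
A note on point 2: in answer-conditioned question generation of this kind (as in Puri et al.'s pipeline), an answer candidate is first extracted from the context, and the question is then generated conditioned on both. A minimal sketch of the usual autoregressive factorization, with notation assumed here rather than taken from the manuscript:

$$p(q \mid \hat{a}, c) = \prod_{t=1}^{|q|} p\left(q_t \mid q_{<t}, \hat{a}, c\right)$$

where $c$ is the context passage, $\hat{a}$ is the answer candidate extracted from $c$, and $q_t$ is the $t$-th token of the generated question $q$; this is why the conditional would be written $p(q \mid \hat{a}, c)$ rather than the other way around.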

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Question answering is a natural language processing task that enables a machine to understand a given context and answer a given question. Experts and scholars have already carried out a large number of research experiments on question answering. Thai is one of the languages with low availability of labeled corpora for QA research. The paper uses the multilingual Text-to-Text Transfer Transformer (mT5) and a Thai data preprocessing method to generate more question-answer pairs and improve the performance of the Thai QA model. The paper is clear, but I think this article needs some changes:

1. Although the paper is clearly organized, the motivation is not clear enough. It is recommended that the authors elaborate on the research motivation of this article: what kind of problem does the proposed method aim to solve?
2. Why do the authors think that syllable-level F1 is a more suitable evaluation standard? What is the difference between word-level F1 and syllable-level F1? The evaluation standard is a very important indicator for validating the strategy of the paper, so I suggest that the authors elaborate on this content (a sketch of such a metric follows this report).
3. The paper seems to innovate only in data preprocessing and evaluation standards, without improving the main model. Therefore, the innovative part of this article may be a bit weak. Can the authors modify the original model to strengthen the core strategy of this article?
4. It is recommended that the authors add an overall framework diagram describing the content of this article, to make the overall structure of the paper clearer.
5. The article is readable, but there are some irregularities: for example, the figures are not clear enough, and some sentences do not read smoothly. I suggest that the authors check the paper again.
6. Several related papers are worth citing, such as "A Survey of CRF Algorithm Based Knowledge Extraction of Elementary Mathematics in Chinese," Mobile Networks & Applications (2021), https://doi.org/10.1007/s11036-020-01725-x.
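
To make point 2 above concrete, here is a minimal sketch of what a syllable-level F1 might look like, assuming it follows the standard SQuAD-style token-overlap F1 but computed over syllables instead of words; the function name and the pre-tokenized inputs are illustrative, not the authors' implementation:

```python
from collections import Counter

def syllable_f1(pred_syllables, gold_syllables):
    """SQuAD-style token-overlap F1, computed over syllable tokens."""
    # Multiset intersection: how many syllables the prediction and the
    # gold answer share, counting duplicates.
    common = Counter(pred_syllables) & Counter(gold_syllables)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_syllables)
    recall = num_same / len(gold_syllables)
    return 2 * precision * recall / (precision + recall)
```

In practice the syllable lists could come from a Thai syllable tokenizer such as PyThaiNLP's syllable_tokenize. Because Thai is written without spaces and a single Thai word often spans several syllables, syllable-level overlap is less sensitive to word-segmentation disagreements than word-level F1, which is presumably the motivation for the metric.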

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The paper describes a question-answering workflow for the Thai language. The authors first present a data preprocessing method. Then, they fine-tune two transformer-based models (WangchanBERTa and mT5) for the QA task. They compare the use of synthesized data and real human-labeled data. Finally, they also propose a new metric, syllable-level F1, better suited to the Thai language.
It is shown that results improve when data augmentation is used rather than human-labeled data only.
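
For readers unfamiliar with the text-to-text QA setup under review, a minimal inference sketch using the Hugging Face transformers library is given below; the checkpoint name and prompt format are assumptions for illustration, not the authors' exact configuration:

```python
# Minimal generative QA inference with mT5 (illustrative configuration).
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# In the Thai QA setting, both question and context would be Thai text.
prompt = "question: ... context: ..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```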

The paper is well written and the structure is clear. The introduction is clear with a nice description of the proposal and context. The proposal is very interesting but the conclusion is a bit short.

In Section 3.4, I would have expected an evaluation of the performance of the Thai tokenizer used, to get an idea of the quality of the input data.
It would also be nice to have more details about how the datasets are built (see Section 4.1).
I wonder what the impact of the batch size is, as you use 12 for WangchanBERTa and only 4 for mT5. We understand that this choice was made for technical reasons, but it would be interesting to add some explanation of its impact, for instance in the discussion section.

Another comment concerns the discussion section: Tables 8 and 9 should include the names of the best models.
Alternatively, as in Tables 3 and 4, results obtained with filtered generated pairs and with all generated pairs should appear for comparison in Tables 8 and 9.
Additionally, Tables 6 and 7 should include translations of the samples (even if the tokenization is not the same in English).

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I think my comments for authors are acknowledged in the revised version.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

I still do not understand what results they have. For example, I understood that they propose a data generation/augmentation method and that it outperforms a classical transformer when transferring from English to Thai. However, the experimental results are very confusing: RoBERTa and the corresponding WangchanBERTa are both from arXiv and are not properly cited, and the same applies to mT5 and Raul Puri's method. The authors must validate their results against models published with peer review; otherwise, one could first submit a low-quality arXiv paper and then write a paper comparing against it.

Second, the results are still not clear enough. In my view, Table 1 shows the structure of the two methods, which is clear; Tables 2, 4, 5, and 7 are result comparisons of some existing methods; and Tables 3 and 6 describe the datasets. So where are the paper's own results? Perhaps a deeper and more thorough analysis is needed.

If the "best model" is the result of the paper, it is too simple: it first appears on page 13 (of 17 in total). Does that mean the later, the better? This is a scientific paper, and the authors must keep everything closely tied to the key point.

This paper is not organized well enough, though I still think it has merit. In fact, it is not as strong as a typical paper from a typical PhD student. In conclusion, I think the paper needs major changes, especially in the experimental results and discussion parts.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 3

Reviewer 2 Report

No other comments.
