Peer-Review Record

Improving the Performance of Vietnamese–Korean Neural Machine Translation with Contextual Embedding

Appl. Sci. 2021, 11(23), 11119; https://doi.org/10.3390/app112311119
by Van-Hai Vu 1, Quang-Phuoc Nguyen 2,*, Ebipatei Victoria Tunyan 3 and Cheol-Young Ock 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 22 September 2021 / Revised: 30 October 2021 / Accepted: 17 November 2021 / Published: 23 November 2021

Round 1

Reviewer 1 Report

This is a relevant and well-presented paper on the performance of Vietnamese-Korean Neural Machine Translation with Contextual Embedding. It contains a well-described introduction to understanding how neural machine translation works and an updated state of the art. The methodology is correct. A final improvement of 1.41 BLEU points and 1.65 TER points might not be considered "significant" by some, although contextual embedding is definitely the way to go, and this contribution is important in order to highlight this.

As suggestions for improvement, I would say that the data in Table 1 and Table 4, although clearly presented, might create confusion, as they seem to have been obtained from different, non-comparable studies. Could the authors briefly clarify the comparability (or non-comparability) of these percentages?

 

Author Response

Dear Reviewer,

Thank you very much for your interest in our article.

We have responded to your comments in detail. We hope our feedback will satisfy you.

Please see attached file.

Thank you very much.

Best regards. 

 

Author Response File: Author Response.pdf

Reviewer 2 Report

I suggest adding a brief summary of the results to the abstract and highlights to the introduction. In my opinion, the reported improvement is not trustworthy. Similar research has been done on other language pairs with success, but without so large a difference in BLEU, especially since this BLEU improvement is not reflected in TER. It would be good if the authors could add some examples, perform a manual evaluation on at least 100 segments, and add METEOR. It would also be good to add significance tests.
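[For context, the significance testing the reviewer asks for is commonly done in MT evaluation via paired bootstrap resampling (Koehn, 2004). The sketch below is a minimal, hypothetical illustration of that technique over sentence-level metric scores; it is not the authors' evaluation code, and the function name and inputs are assumptions.]

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Paired bootstrap resampling: estimate how often system A beats
    system B on resampled test sets of per-sentence metric scores."""
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Draw a resampled test set (sample sentence indices with replacement).
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    # Fraction of resamples where A wins; values near 1.0 (e.g. >= 0.95)
    # indicate the improvement is unlikely to be a test-set artifact.
    return wins / n_resamples
```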

Author Response

Dear Reviewer,

Thank you very much for your interest in our article.

We have responded to your comments in detail. We hope our feedback will satisfy you.

Please see attached file.

Thank you very much.

Best regards. 

 

Author Response File: Author Response.pdf

Reviewer 3 Report

In the context of machine translation (MT), this paper applies a BERT model to tag parts of speech (POS) in Vietnamese sentences and to apply morphological analysis (MA) and word sense disambiguation (WSD) tags to Korean sentences in a Vietnamese-Korean bilingual corpus. MT results are assessed through BLEU and TER.

The work is professionally done, and the basic English is fine.  However, the paper lacks any insight into methodology and algorithmic choices.  The methods are never explained with motivation.

Specific points:
.. machine translation (MT) models are being steadily improved upon; the quality of the MT systems has also been improved.. - what is your distinction here between MT models and MT systems?

..tasks such as part-of-speech (POS), .. - POS is not a task; it is a syntactic label

NMT is used before being defined

Like (too) many highly technical articles in this field, the text mentions several relevant terms without explanation or description, even in the introduction, which should instead be giving background information about the field and the research problem.  All these occur in the first paragraph:
masked language modelling (MLM), next sentence prediction (NSP), cross-layer parameter sharing, factorized embedding layer parameterization techniques, byte-level byte pair encoding (BBPE), dynamic masking; none are explained.

One could possibly excuse a lack of explanations in an abstract (or introduction) if the terms were explained later on, but this is not the case. Later, we see other terms, such as multi-head attention, query, key, and value, positional encoding, and add-and-norm, all mentioned without any explanation. Thus, this article is directed solely at experts in the field. All others will be completely lost. The details that are given are mostly simple algorithmic detail, e.g., feed this into that, etc.

.. TER [28] and BLUE [29] evaluation.. - use BLEU, not BLUE


..model that has regularly substituted RNN … - “substituted” is not the right word here

.. suppose n is the number of single attention matrix ? .. - by “number”, do you mean “order”?
Also, use italics for math variables (such as n)

.. query, key, and value matrix ..
.. query, key, and value matrices ..

... the parameters are different from the encoder blocks, but this is the same in the different positions of the sentence. - I do not understand this comment; what is the significance of sentence position here?
..The add and norm component is added into the encoder to make the training process  faster by avoiding heavy changes in value going from layer to layer. - this sentence is a good example of the major weakness in this paper; what are these “heavy changes in value going from layer to layer”? why is that a problem? indeed, what are these layers? how would add-and-norm help?


Why repeat eq. 2 as eq. 3? If there is a distinction to be made of masking, the text needs to be far clearer.  

As for the description of attention, there is no explanation of what keys and values are.
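[For readers unfamiliar with the terminology the reviewer flags: in the standard Transformer formulation, the queries, keys, and values are matrices, and attention weights each value by the softmax-normalized similarity between a query and the keys. A minimal sketch of scaled dot-product attention, for illustration only (not the paper's code):]

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V,
    with Q, K, V given as lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```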

..The weight matrix of the encoder representation (RIs multiplied by the weight matrices of the query and the key, respectively. - this makes little sense; I assume that some words are missing.

..The demonstration of an encoder and a decoder is described in Figure 5. - not really; the figure merely links these in a trivial sequence.

..by combining linguistic annotation with input sentences. - what does this entail? What is
"combining"?

The % results in tables 1 and 4 show high values for all systems; the slightly higher boldfaced values are nice, but indicate only small improvements. 

.. extracted 2000 sentences pairs.. ->
.. extracted 2000 sentence pairs..
(Make this change more than once...)

..uses a multi-head attention layer to understand the context of each word.. - easy to state this objective, but not easy to actually do.


..with the dropout ratio is set to 0.3; ..
..with the dropout ratio set to 0.3; ..

.. number of rare words.. - what does rare mean here?

The paper gives several translation examples, but they are in the two languages involved, not English, and readers of this article may be perplexed, as the explanations are insufficient for English readers to follow. There is only minimal translation into English.
This is important because several of the detailed points involve word-parts, and are incomprehensible unless one knows some rudimentary Korean syntax, which is not explained at all.  For example, the text says: "Our WSD tool converts “sseu-da” into “sseu-da_01”, “sseu-da_02”, or “sseu-da_03” depending the context."  Unless one knows what is involved for these three expressions, the idea is lost.

..To overcome this problem, annotation POS is used to add information to Vietnamese sentences. - however, it is not explained how the system determines whether "hoc" is noun or verb.

.. experiment with machines related to Korean or Vietnamese.. - what does this mean?

..to encourage the quality .. - how does this work?

 

Author Response

Dear Reviewer,

Thank you for your interest in our article.

We have responded to your comments in detail and we hope it will satisfy you.

Please see attached file.

Thank you so much.

Best regards. 

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors ignored my comments and recommendations and did not improve the evaluation, whose results do not seem to be correct. In such a case, I recommend rejection.

Author Response

Dear Reviewers,

Thank you for your support. We are very grateful to you for this.

We have evaluated the accuracy of the MT systems with METEOR scores on 2000 pairs of candidate and reference sentences. This score also increases in line with the BLEU score. We have added the METEOR scores to our manuscript.

Thank you very much.

Best regards.

Author Response File: Author Response.pdf

Round 3

Reviewer 2 Report

The authors made the necessary corrections.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

Summary

This paper presents an NMT engine for the translation of Vietnamese into Korean; the Vietnamese source side is enriched with POS labels assigned by the BERT model for Vietnamese. Although not stated in either the Abstract or the Introduction, the Korean side is also processed (by applying morphological analysis and word sense disambiguation). Experiments were conducted on a parallel corpus released by the authors in 2020; the MT results reward the use of linguistically annotated text.

Comments

The paper is interesting for the NLP/MT community especially because benefits are proved when the linguistic knowledge of pre-trained models is injected into MT models for low-resource language pairs.

On the other hand, the paper has weaknesses both in terms of presentation and of contents. 

Presentation

Most of the paper is devoted to the description of the Transformer model (Section 2, for a total of about 3 pages) and BERT (Section 3, another 3 pages). Considering that Sections 4 and 5, where the scientific contribution of the work and its empirical assessment are provided, overall consist of less than 3 pages, the unbalanced organization of the manuscript is evident.

It should be emphasized that the description of the two models is much less clear here than in the original papers, therefore it is superfluous and unnecessarily steals room that could be devoted to the innovativeness of the work.

Moreover, as pointed out more precisely in the "Detailed comments/suggestions" section, there are many annoying inaccuracies and missing elements, regarding bibliographic references, data, and experiments.

Contents

First of all, there is an important aspect of the work that is definitely unclear to me, which regards the processing, in the reported experiments, of the Korean sentences, that is, the target side of the bilingual corpus. Curiously, it is stated neither in the Abstract nor in the Introduction that UTagger is used for morphological analysis (MA) and word sense disambiguation (WSD) of the Korean text, while, according to the results in Table 6, it yields the largest boost in performance.

Unfortunately, the authors do not provide any insight into this, but I would have liked to read a clearer description of the experimental setup and a deeper discussion of the results, possibly enriched with ablation experiments. Concerning the setup, in fact, I do not understand how the experiments are run: I assume that the training side of the Korean text is processed by UTagger; how can this help in translation, where the input consists of only the Vietnamese text to be translated? Could you provide a detailed description? Other experiments, in addition to those reported in Table 6, could help the reader understand the contribution of each component/processing stage.

Secondly, the amount of originality is rather limited for a journal paper; actually, I would have appreciated seeing a list of innovative contributions with respect to the state-of-the-art. Anyway, to the best of my knowledge, the only novelty of the work is the BERT-based POS tagging of the source Vietnamese text; hence, as a whole, what is presented in this paper is only a minor extension of existing work.

Finally, there is no chance to reproduce the results since (i) the partitioning of the Korean-Vietnamese corpus into training/development/evaluation sets is not given, (ii) it is unclear how the experiments were run (see above), and (iii) the setups of the MT engine (OpenNMT) and of the BERT-fused model are definitely underspecified (in lines 277-279, just the values of four hyperparameters are provided).

In conclusion, I think that the work, with the descriptions of the already-known models removed, is suitable to be presented at an MT workshop/conference.


Detailed comments/suggestions

line 32: "BETO [8] for Spanish" -> the cited paper does not present any BERT version for Spanish named "BETO". Please read the paper before citing it in an inappropriate manner

line 32: "[8]" -> the reference is given to the version of the paper uploaded to arXiv, but it was published at IJCNLP 2019; please, replace ALL the arXiv references to the corresponding journal/conference version

line 44: "published toolkits" -> publicly available toolkits

lines 45-46: "Transformer [3]" -> "Transformer" is NOT a toolkit; moreover, reference [3] is underspecified (where is the paper published?) and it is hard to find a connection between it and the "Transformer toolkit" (?)

line 50: "Chen et al. used Recent Advances in NMT (RNMT) [24]" -> apart from the fact that "recent" is unsuitable for a work that is 3 years old, you should report what they did (RNMT+ and a hybrid model) in a more appropriate way

line 53: [[27] -> [27] 

line 55: Through the -> In terms of

line 56: evaluation methods -> evaluation metrics

lines 52-61: in this paragraph it is stated twice that sentence pairs are fed to MT systems; actually, pairs are provided only in the training stage, while only source sentences are given in translation. Please be more precise, because any processing of the target side at translation time means that you are changing the "reference", which in general should be avoided

line 67: In addition, we... -> We...

line 71-72: contains -> adopts

line 72: encoder-decoder architecture. -> encoder-decoder architecture, like earlier models. [the encoder-decoder architecture was not introduced with transformers!]

The two subsections 2.1 and 2.2 [lines 76-147] are useless: if the reader does not know the Transformer, she cannot understand it from what is written there; if she knows it, she does not learn anything more. I suggest replacing them with a very short description of the Transformer and inviting the reader to consult the original paper for details

The same as above for subsection 3.1 and BERT [lines 149-195].

Subsection 3.2 [lines 196-225] is a copy of what is presented in Section 4.1 "BERT-FUSED MODEL" of paper [3], hence again useless.

line 235: "figure 9" -> Figure 9

lines 244-245: "VLSP 2013 POS tagging corpus." -> add a proper reference

Table 2, last row: "ulsan|Ny" -> "Ny"? Would you mean "N"?

lines 251-252: "VV,... respectively." -> what does this sentence refer to? Is it related somehow to Table 2? Please, fix.

Table 4 presents the results of definitely underspecified experiments: what are the test sets? What is their size? How were they annotated? What are the evaluation metrics? The reader has to go through at least one of the papers referred to inside the table to get this information, but such an aspect should be covered in a journal paper.

Inconsistency: in lines 52-53 it is stated that the Korean-Vietnamese corpus used in the MT experiments reported in this paper consists of 412K sentence pairs, while in line 271 the same corpus is said to include 454K sentence pairs. Actually, looking at Table 5, the right approximated value is 455K.

Tables 5 and 6: what is the test set? Could you specify the partitioning of the whole corpus into training, development, and evaluation sets?

line 329, on "Author Contributions:" -> some of the entries start with "and" without specifying any first person; for example: "Funding acquisition, and Cheol-Young Ock;". Please, fix all of them.

 

Reviewer 2 Report

The manuscript presented introduces an interesting solution for automatic translation based on BERT. The entire development and application is for Korean-Vietnamese translation.

The introduction adequately covers existing BERT algorithms and Natural Language Processing (NLP). The article proposes the combination of BERT and Neural Machine Translation (NMT), including a POS preprocessing technique, to improve accuracy as measured using BLEU and TER scores.

The manuscript correctly introduces the theoretical approach in Sections 2 and 3. Section 4 is dedicated to commenting on the specifics of Vietnamese and Korean annotation and treatment.

The results are briefly presented. The authors mention manuscript [26] as the origin of the data for processing and of the BLEU and TER scores. For greater clarity, some details on the context of those 450,000 analyzed sentences would be welcome; even an example of a translation might be useful. Additionally, a detailed account of how the BLEU and TER scores are obtained would be great for those who are starting out with this technology and these metrics. Finally, a few details on the rare words, why they are considered rare, and how POS resolves polysemy would be welcome.
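[For readers new to the metrics mentioned here: corpus-level BLEU (Papineni et al., 2002) is the geometric mean of modified n-gram precisions multiplied by a brevity penalty. The sketch below is a minimal, unsmoothed illustration of that standard formulation on tokenized sentences; it is not the authors' evaluation code, and the function names are assumptions.]

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU: geometric mean of modified 1..max_n-gram precisions
    times a brevity penalty. Unsmoothed: any zero n-gram match gives 0.0."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clipped counts: each hypothesis n-gram credits at most as
            # many matches as it occurs in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += sum(h.values())
    if any(m == 0 for m in matches):
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty discourages overly short hypotheses.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)
```

TER, by contrast, counts the minimum number of edits (insertions, deletions, substitutions, and shifts) needed to turn the hypothesis into the reference, normalized by reference length, so lower is better.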

The conclusions are correctly supported. The interest for readers will be high, especially for those interested in Asian language translation.

 

Reviewer 3 Report

The paper presents the application of BERT to machine translation between Korean and Vietnamese. The background information is well presented.

Since the novelty of the paper is only the application of known techniques to a new language pair, I believe the paper is not suited for a journal.

The language in the paper could use some minor improvements. The mathematical notation is inconsistent: the italic form is often missing, and some equations are oddly formatted. There are some other technical issues with the paper, e.g., the caption of Table 4 is on one page, while the table is on the next.
