Article
Peer-Review Record

Semantic Connections in the Complex Sentences for Post-Editing Machine Translation in the Kazakh Language

Information 2022, 13(9), 411; https://doi.org/10.3390/info13090411
by Aliya Turganbayeva 1,*, Diana Rakhimova 1, Vladislav Karyukin 2, Aidana Karibayeva 2 and Asem Turarbek 2
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 2 July 2022 / Revised: 12 August 2022 / Accepted: 25 August 2022 / Published: 30 August 2022

Round 1

Reviewer 1 Report

The paper presents a new and interesting methodology for a very specific type of problem.  It was not discussed, but the methodology is likely extensible to other translation pairs.

This review includes an attachment containing scanned hand-written comments with both stylistic and content suggestions.  The major feedback is summarised below, with supporting details throughout the attachment.

The specific context of the research is unclear, since the problem was motivated by referring to MT systems that are not Google or Yandex (which should each be footnoted with a URL), with no specific reference to which systems were being described, and no review of those systems.  This is a major issue that should be addressed: motivate the work by quantifying or qualifying the output of existing MT systems (including the ones not relevant to this work), and tell which ones could benefit from the presented approach, and maybe summarise some thoughts on how to integrate the approach into them.  Some example sentences run through the existing systems would also be helpful.

The literature review should not run through each source saying what it contributes, but rather could be organised by relevant themes.  For example, relevant methods could be covered in one (or two) paragraph(s), conclusions of work on MT of compound/complex sentences could be covered in another paragraph(s), etc.  Also, relevant grammatical sources on Kazakh compound+complex sentences should be cited.  Аманжолов, Балақаев, and Кеңесбаев (and colleagues) all have published good grammars of Kazakh, any one of which could probably serve as a citation for your break-down of Kazakh compound+complex sentences.  Some assertions need to be cited throughout (I've added comments where relevant in the attachment), and the Apertium resources used in the work need to be cited (for example, apertium-kaz: http://www.lrec-conf.org/proceedings/lrec2014/summaries/1207.html).

The methods are not reproducible as they are presented.  Reproducibility is a core requirement of scientific literature.  To be reproducible, the code used in the research and the corpus assembled for the project should be available online somewhere under an open license, linked to from the paper.  Please read Pedersen 2008 (https://aclanthology.org/J08-3010.pdf) for further explanation.

The methodology and evaluation metrics are not clear.  These need to be explained in more detail and with more clarity for the results to be interpretable.  As it stands, the impact of the paper is not yet fully clear.  Please also clarify in the methodology and evaluation sections (and throughout) what is done manually and what is done in an automated way, especially when you use words like "edit", "annotate", "score", "perform", "use(d)", etc.

The introduction section needs an introductory paragraph at the beginning, and at the end a paragraph or two that summarises the methodology and results and then outlines the rest of the paper, saying what's in each section.

You should primarily use English terminology for compound (салалас) and complex (сабақтас) sentences and their types.  You can (and probably should) keep the Kazakh terminology, but only use it the first time you mention each term.  It would help to explain each term too.  Cite Kazakh grammars for this, and the terminology (otherwise it sounds like you made up the terms!) and use Kazakh orthography.  You should probably also find a way to refer to the two together, since "complex sentence" (as in the title) refers only to сабақтас (although perhaps you could be explicit in the introduction that when you use the term on its own it refers to both types of sentences?).

In general, try not to use terminology unless you need to use it in the paper to support the write-up of your work.  When you do use terminology that the reader may not know, explain it.  As above, use English-language terminology, adding the Kazakh term in parentheses on first mention for those who wish to reference it, and explain your terms.  (Relatedly, in the first paragraph, doesn't жалғаулықты refer to compound+complex sentences that are connected using verbal morphology (етістік жалғаулары) and not "connecting words" (like conjunctions)?  The way you explain it, it sounds like the reverse.)

Comments for author File: Comments.pdf

Author Response

Hello, and thank you for your detailed notes. We have addressed them as far as possible. I have highlighted the corrections made in response to your remarks in yellow and pink in the attached document. After receiving the copyright certificate for our tools, we will publish them publicly on our GitHub page (https://github.com/NLP-KazNU).

Author Response File: Author Response.pdf

Reviewer 2 Report

The topic introduced in this paper is very interesting and is valuable for low-resource languages. However, the structure and writing of this article should be significantly improved. Some comments are:

1) I don't understand why the Abstract says "except Google and Yandex". I think the authors should give their reasons.

2) In the Abstract, the paper says "The paper analyzes the works on complex sentences in the Turkic, English, and Russian languages." Does "Turkic" mean the Kazakh language?

3) The presentation of the proposed method is not very clear. I suggest the authors provide a framework to describe the method.

4) I didn't find any description of the datasets and settings in the experiment section.

Author Response

Hello, and thank you for your comments. We have addressed them as far as possible. I have highlighted the corrections made in response to your remarks in blue and pink in the attached document. After receiving the copyright certificate for our tools, we will publish them publicly on our GitHub page (https://github.com/NLP-KazNU).

Author Response File: Author Response.pdf

Reviewer 3 Report

The paper focuses on the semantic connection problem in automatic translation. Although the topic is interesting for the community and the current version is well-written, the paper still suffers from some issues. The authors must carefully address my concerns for the revised version to qualify for publication.

1) The authors need to provide details about the problem statement(s) in the Introduction section.

2) Since the community generally considers Kazakh a low-resource language, the authors need to provide more information about the low-resource MT scenario, and they should at least cover some important related work by adding new references in the Related Work section. Some essential references that the authors need to add to their work are:

2-1) "Persian-Spanish Low-Resource Statistical Machine Translation Through English as Pivot Language", (Ahmadnia et al., 2017), In Proceedings of Recent Advances in Natural Language Processing (RANLP 2017), pp. 24-30.

2-2) "Dual Learning for Machine Translation", (He et al., 2016), In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016).

2-3) "Augmenting Neural Machine Translation through Round-Trip Training Approach", (Ahmadnia and Dorr 2019), Open Computer Science (De Gruyter), 2019, 9(1), 268-278.

2-4) "Strengthening Low-Resource Neural Machine Translation through Joint Learning: The case of Farsi-Spanish", (Ahmadnia et al., 2021), In Proceedings of the 13th International Conference on Agents and Artificial Intelligence (ICAART 2021), pp. 475-481.

3) The proposed algorithm in Section 3 is not clear enough to follow. The authors need to elaborate on it and provide detailed information.

4) A comparison of the results with the state of the art is highly recommended.

5) Proofreading of the revised version before submission is recommended.

6) Typo: 2. Related Works ---> 2. Related Work

I'm looking forward to reviewing the revised version.

Author Response

Hello, and thank you for your notes. We have addressed them as far as possible. I have highlighted the corrections made in response to your remarks in green and pink in the attached document. After receiving the copyright certificate for our tools, we will publish them publicly on our GitHub page (https://github.com/NLP-KazNU).

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The revised version addresses all my previous comments and issues.

 
