Peer-Review Record

Chinese-Uyghur Bilingual Lexicon Extraction Based on Weak Supervision

Information 2022, 13(4), 175; https://doi.org/10.3390/info13040175
by Anwar Aysa, Mijit Ablimit *, Hankiz Yilahun and Askar Hamdulla
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 20 February 2022 / Revised: 18 March 2022 / Accepted: 29 March 2022 / Published: 31 March 2022

Round 1

Reviewer 1 Report

An interesting manuscript to read that addresses a real topic, namely Bilingual Lexicon Extraction based on weak supervision.

The authors propose a method for building a Chinese-Uyghur bilingual dictionary. The two languages differ greatly in both spelling and pronunciation, and the work aims to improve machine translation, automatic knowledge extraction, and multilingual translation.

After an introductory section, the authors present a rather oddly organized Section 2, "Basic assumptions and Related research", which opens with a very short subsection based on a single bibliographic source [9], followed by the related-research part. For this section, I recommend rearranging or renaming it, or relocating Subsection 2.1, which seems awkwardly placed before the related research.

Subsection 3.1 presents the cross-language word-embedding text representation along with Equations (1) and (2), which unfortunately appear without a bibliographic source (they are flagged by the anti-plagiarism software). I recommend that the authors cite, immediately before or after these equations, the sources from which they were reproduced.

An anti-plagiarism check of the article (see the attached PDF file) reports a similarity of 17%. I advise the authors to review the text and, wherever whole sentences appear underlined, either rephrase them or place them in quotation marks next to the appropriate bibliographic source (e.g., rows 82-84, 157-158, 202-206, 226-228, 271-278, etc.). There are no serious problems and the manuscript looks original, but in my opinion it deserves these improvements for its final form.

I would also advise making Figures 1 and 2 a little larger and separating them more clearly; their millimetric proximity can confuse readers, especially those unfamiliar with the methods used.

Also, the sub-subsections 4.3.1, 4.3.2, etc., should be delimited more clearly in the text; their titles appear to be part of the preceding paragraph.

The conclusions are particularly short, which is a pity for such a large-scale study. I would have liked to read more about the interpretation of the results obtained, possible improvements to the method, and especially the authors' plans for further research (more than the single sentence in the only paragraph of the conclusions).

The manuscript appears to be the result of a serious, well-executed experiment; with these improvements it is worth publishing.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The paper proposes a method for bilingual lexicon extraction for Chinese and Uyghur. The method uses word embedding alignment with the help of a seed dictionary. The authors have constructed a dataset on which the method is evaluated. Overall, the paper is well written (maybe slightly too long).

However, the experimental evaluation is not sufficient as it does not include any of the prior methods, even those mentioned in the background section. 

Additionally, the method proposed in the paper bears significant similarity to Mikolov's algorithm for mono- and cross-lingual alignment, which has been used successfully for the same purpose; please see the references below:

Joulin, Armand, Piotr Bojanowski, Tomas Mikolov, Hervé Jégou, and Edouard Grave. "Loss in translation: Learning bilingual word mapping with a retrieval criterion." arXiv preprint arXiv:1804.07745 (2018).

Zhang, Mozhi, Keyulu Xu, Ken-ichi Kawarabayashi, Stefanie Jegelka, and Jordan Boyd-Graber. "Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization." arXiv preprint arXiv:1906.01622 (2019).

Schuster, Tal, Ori Ram, Regina Barzilay, and Amir Globerson. "Cross-lingual alignment of contextual word embeddings, with applications to zero-shot dependency parsing." arXiv preprint arXiv:1902.09492 (2019).

The proposed method must be compared against at least one of these.
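For context, the alignment family cited above learns a linear map from source-language embeddings to target-language embeddings using a seed dictionary. As a rough illustration only (not the authors' actual method, and with purely hypothetical toy data and dimensions), here is a minimal orthogonal-Procrustes sketch of that idea:

```python
import numpy as np

# Toy "embeddings" for a seed dictionary of 5 word pairs in a
# 3-dimensional space (hypothetical data for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                        # source-language vectors
R_true = np.linalg.qr(rng.normal(size=(3, 3)))[0]  # a hidden rotation
Y = X @ R_true                                     # target-language vectors

# Orthogonal Procrustes: W = argmin ||XW - Y||_F subject to W^T W = I,
# solved in closed form via the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# With noiseless toy data the learned map recovers the hidden rotation.
print(np.allclose(X @ W, Y))  # prints True
```

In practice the map is fit on a seed dictionary and then used to translate unseen words by nearest-neighbor search in the target embedding space; the cited works refine exactly this step (retrieval criteria, normalization, contextual embeddings).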

Detailed comments:

l.45 NMT no longer uses dictionaries; it uses parallel corpora.

l.180-182 'The system can be applied...' - how? It is unclear and the statement seems too general.

l.251-252 What about the other direction? Does it change the results?

l.329-331 More information is needed: the average number of words per article, the number of sentences, the average sentence length, the minimal and maximal article sizes, etc., as well as the word distribution.

l.364 Why 0.025? Is it a default value? Did you try other values?


Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

All of my comments were adequately addressed and the paper was modified accordingly. 
