Peer-Review Record

Neural Network-Based Bilingual Lexicon Induction for Indonesian Ethnic Languages

Appl. Sci. 2023, 13(15), 8666; https://doi.org/10.3390/app13158666
by Kartika Resiandi 1, Yohei Murakami 1 and Arbi Haza Nasution 2,*
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 12 May 2023 / Revised: 16 July 2023 / Accepted: 25 July 2023 / Published: 27 July 2023
(This article belongs to the Special Issue Recent Trends in Natural Language Processing and Its Applications)

Round 1

Reviewer 1 Report

Please clearly indicate the benefit of your work and explain every step leading from your model to the expected increase in the use of Indonesian ethnic languages. It should also be clearly stated why this paper is interesting to the reader.

Please have the English improved by a native speaker (e.g., avoid "because" at the beginning of a sentence: "Because the Indonesian ethnic language is a low resource language, and it has a limited amount of data, we chose Minangkabau, Malay, Palembang, Javanese, and Sundanese as the languages to implement the proposed method in this study where the bilingual dictionaries are obtained from the result of our previous study [4].").

Author Response

Thank you so much for your valuable comments. We have added the responses to your comments at the end of the manuscript. Please see the attachment for the following part: Reviewer 1 comments and authors’ response.

Author Response File: Author Response.pdf

Reviewer 2 Report

This publication presents an interesting application of recurrent neural networks. Unfortunately, in its current form, the manuscript does not make it possible to use the results obtained or to determine their validity. In my opinion, this paper should undergo major revision before being considered for publication.

Detailed comments:

1. The main achievement of this paper is "a bilingual dictionary between ethnic languages using a neural network approach to extract transformation rules using character level embedding and the Bi-LSTM method in a sequence-to-sequence model". Please make this dictionary available for download, as it is currently not possible to use it.

2. At present, the experiments performed in this work are not reproducible, as neither the source code nor the trained models have been published. Please make it possible to use the models for "Malay, Palembang, Javanese, and Sundanese".

3. Have the authors used any novel neural network architecture? The models discussed in Section 3 such as LSTM, Bi-LSTM, Seq2seq, Byte Pair Encoding are already known and widely used. If the authors have proposed something new, please indicate it in detail.

4. Section 3.6 - how exactly were the Bi-LSTM and LSTM models implemented? What libraries were used for this? What were the network architectures? There is no way to reproduce the architectures of these solutions without those details.

5. Tables 4, 5, 9, 10, 11, 12 - what metrics are used in the K-Fold Cross-Validation?

6. "Figure 5. Comparison between SentencePiece with BPE and character level method" - the text states that "Figure 5 illustrates why the character-based method is superior to the BPE-based method", but it is unclear to me how this figure shows that. Please elaborate on this.

7. Table 8 - why have so many exemplars been misinterpreted by the Rule-Based Approach? Please elaborate on this.

Author Response

Thank you so much for your valuable comments. We have added the responses to your comments at the end of the manuscript. Please see the attachment for the following part: Reviewer 2 comments and authors’ response.

Author Response File: Author Response.pdf

Reviewer 3 Report

The paper presents the use of Bi-LSTM and LSTM models for Indonesian ethnic languages. The hyperparameters of the two models are given. The experiments were conducted, and closely related languages were also investigated.

My suggestion is to add the paper's contribution to the Introduction and the limitations of the work to the Conclusions.

Author Response

Thank you so much for your valuable comments. We have added the responses to your comments at the end of the manuscript. Please see the attachment for the following part: Reviewer 3 comments and authors’ response.

Author Response File: Author Response.pdf

Reviewer 4 Report

This paper discusses the importance of preserving Indonesian ethnic languages, which belong to the Austronesian language family and share many similarities. However, these languages are considered endangered based on previous research. To address this issue, a proposal is made to develop a bilingual dictionary using a neural network approach. The proposed model utilizes character level embedding and the Bi-LSTM method in a sequence-to-sequence model to extract transformation rules. The model consists of an encoder and a decoder, where the encoder reads the input sequence character by character, generating context and extracting a summary. In contrast, the decoder produces an output sequence influenced by the previous characters.
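As a rough illustration of what "reading the input sequence character by character" involves, the sketch below turns words into padded sequences of character IDs, which is the form a character-level encoder typically consumes. It is not the authors' implementation: the function names `build_vocab` and `encode`, the special symbols, and the two word pairs are assumptions made for this example and are not taken from the manuscript or its dataset.

```python
# Minimal sketch of character-level input preparation for a seq2seq model.
# The special symbols below are hypothetical; the manuscript's actual
# start/end markers are not given in this record.
PAD, SOS, EOS = "<pad>", "<s>", "</s>"


def build_vocab(words):
    """Map every character seen in `words` (plus special symbols) to an ID."""
    chars = sorted({ch for word in words for ch in word})
    return {symbol: idx for idx, symbol in enumerate([PAD, SOS, EOS] + chars)}


def encode(word, vocab, max_len):
    """Convert a word to character IDs, wrapped in SOS/EOS and right-padded."""
    ids = [vocab[SOS]] + [vocab[ch] for ch in word] + [vocab[EOS]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Illustrative source/target pairs only, not drawn from the paper's dictionary.
pairs = [("apa", "apo"), ("mata", "mato")]
vocab = build_vocab([word for pair in pairs for word in pair])
max_len = max(len(word) for pair in pairs for word in pair) + 2  # room for SOS/EOS

source_ids = [encode(src, vocab, max_len) for src, _ in pairs]
target_ids = [encode(tgt, vocab, max_len) for _, tgt in pairs]
```

In a setup like this, `source_ids` would feed the encoder and `target_ids` would be used to train the decoder, which predicts each character from the ones before it.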

The first experiment focuses on the Indonesian and Minangkabau languages, utilizing a dataset of 10,277 word pairs. The model's performance is evaluated using 5-Fold Cross-Validation. The character level seq2seq method, employing Bi-LSTM as an encoder and LSTM as a decoder, achieves an average precision of 83.92%, outperforming the SentencePiece byte pair encoding (with a vocabulary size of 33), which achieves an average precision of 79.56%. A rule-based approach is also used as a baseline to evaluate the neural network model's performance in pattern recognition. The neural network approach surpasses the baseline by providing 542 more correct translations.
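The evaluation protocol summarized above can be sketched as follows, under stated assumptions. This record does not say how precision is computed, so the sketch assumes it is the fraction of test words whose predicted translation exactly matches the reference. The helper names (`k_fold_splits`, `exact_match_precision`, `cross_validate`) and the `train_fn` callback are hypothetical, and the folds are contiguous without shuffling, which may differ from what the authors did.

```python
# Minimal sketch of k-fold cross-validation with averaged precision.
# Assumption: precision = share of test words translated exactly right.


def k_fold_splits(n_items, k=5):
    """Yield (train_idx, test_idx) lists for k contiguous, near-equal folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_items))
        yield train_idx, test_idx
        start += size


def exact_match_precision(predictions, references):
    """Fraction of predictions identical to their reference translation."""
    if not references:
        return 0.0
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)


def cross_validate(pairs, train_fn, k=5):
    """Train on k-1 folds, score the held-out fold, and average the k scores.

    `train_fn` is a hypothetical callback: it receives the training pairs and
    returns a function that maps a source word to a predicted target word.
    """
    scores = []
    for train_idx, test_idx in k_fold_splits(len(pairs), k):
        predict = train_fn([pairs[i] for i in train_idx])
        predictions = [predict(pairs[i][0]) for i in test_idx]
        references = [pairs[i][1] for i in test_idx]
        scores.append(exact_match_precision(predictions, references))
    return sum(scores) / len(scores)
```

With 10,277 pairs and k = 5, this split holds out either 2,055 or 2,056 pairs per fold and trains on the rest, so every pair is used for testing exactly once.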

Furthermore, the proposed model is implemented for four other Indonesian ethnic languages, Malay, Palembang, Javanese, and Sundanese, with smaller input dictionaries. The average precision obtained for each language is 65.08%, 62.52%, 59.69%, and 58.46%, respectively. This demonstrates that the neural network approach can effectively identify transformation patterns between closely related languages such as Indonesian, Malay, and Palembang, compared to distantly related languages like Javanese and Sundanese. The work is valid. However, I advise the authors to consider epistemic knowledge in these language models, considering works such as 'Knowing Knowledge: Epistemological Study of Knowledge in Transformers'.

Overall, the proposed bilingual dictionary creation approach using neural networks showcases promising results in preserving and documenting Indonesian ethnic languages, contributing to their revitalization and preventing further endangerment. 

Author Response

Thank you so much for your valuable comments. We have added the responses to your comments at the end of the manuscript. Please see the attachment for the following part: Reviewer 4 comments and authors’ response.

Author Response File: Author Response.pdf

Reviewer 5 Report

This paper proposes a deep learning model for bilingual lexicon induction for Indonesian ethnic languages. The idea is clear; however, the authors are required to consider the following corrections.

1- List the paper contributions

2- What are the research questions that are answered by this paper?

3- Show the novelty of your work clearly.

4- Draw the figure of the research methodology and explain the phases in detail.

5- Give a clear description of the dataset as a table.

6- The baseline models are not mentioned. Do you mean the benchmark data?

7- How do you verify the results? Please show the critical analysis in detail.

Minor English proofreading is needed.

Author Response

Thank you so much for your valuable comments. We have added the responses to your comments at the end of the manuscript. Please see the attachment for the following part: Reviewer 5 comments and authors’ response.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors have addressed my remarks. In my opinion, the paper can be accepted.

Reviewer 5 Report

Well done

O.K
