Article
Peer-Review Record

A Lite Romanian BERT: ALR-BERT

by Dragoş Constantin Nicolae 1,*, Rohan Kumar Yadav 2 and Dan Tufiş 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 3 February 2022 / Revised: 10 April 2022 / Accepted: 11 April 2022 / Published: 15 April 2022

Round 1

Reviewer 1 Report

1) The paper introduces a new BERT model for the Romanian language based on ALBERT. The authors claim that this ALR-BERT outperforms the original Romanian BERT.

2) However, it is not clear what advantages ALR-BERT has over Rom-BERT. The only difference I noted is that ALR-BERT uses a token size of 1.4 rather than the 2 tokens used in Rom-BERT.

3) It is a little confusing that you discuss ALR-BERT but suddenly use M-BERT in Table 2 and Section 3.3.

4) Where does M-BERT come from?

5) The differences between ALR-BERT and Rom-BERT need to be explained and highlighted in more detail.

6) Why is your model lighter than Rom-BERT? Do you use fewer features, parameters, etc.?

7) It is not clear which system you are talking about here, M-BERT or ALR-BERT.

8) Table 5 shows that ALR-BERT outperforms neither Rom-BERT nor M-BERT.

Some comments for authors:

1) Line 152: the "1" appears to be a footnote marker: "... in character coverage1 was".

2) The paper needs to be better organized and should provide more details on the methods and approach.

3) The paper should be proofread.

4) References need to be given in full, including the venue, e.g., NIPS 2014: where was it held? etc.

Author Response

    1. The paper introduces a new BERT model for the Romanian language based on ALBERT. The authors claim that this ALR-BERT outperforms the original Romanian BERT.

Answer: The main aim of this paper is to introduce A Lite version of Romanian BERT. It does not outperform Romanian BERT in terms of accuracy. The main concept is to obtain a light version of the language-specific Romanian BERT that is efficient, with fewer parameters, and that supports downstream tasks in real-life applications. Even though the model is less performant, its light weight makes it very attractive for real-world tasks, as less computational power is required to run it.

    2. However, it is not clear what advantages ALR-BERT has over Rom-BERT. The only difference I noted is that ALR-BERT uses a token size of 1.4 rather than the 2 tokens used in Rom-BERT.

Answer: ALR-BERT is more efficient than the larger Rom-BERT because the ALBERT architecture is, in general, a lighter version of BERT with fewer parameters. Our aim is to design a language-specific Romanian ALBERT so that it can scale to downstream tasks where full BERT models are too large to use.
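To make the parameter difference concrete, the following minimal sketch (not the authors' training code) builds a BERT-base-style and an ALBERT-base-style model from Hugging Face Transformers configurations and counts their parameters; the vocabulary size is an illustrative assumption rather than ALR-BERT's actual value.

    # Minimal sketch (illustrative only): parameter counts of BERT-base vs ALBERT-base.
    # Requires: pip install torch transformers
    from transformers import BertConfig, BertModel, AlbertConfig, AlbertModel

    VOCAB = 50_000  # illustrative vocabulary size, not ALR-BERT's actual vocabulary

    bert = BertModel(BertConfig(vocab_size=VOCAB))  # defaults: 12 layers, hidden size 768
    albert = AlbertModel(AlbertConfig(
        vocab_size=VOCAB,
        hidden_size=768,          # ALBERT-base settings; AlbertConfig's own defaults
        num_attention_heads=12,   # correspond to the much larger ALBERT-xxlarge
        intermediate_size=3072,
        embedding_size=128,
    ))

    count_params = lambda m: sum(p.numel() for p in m.parameters())
    print(f"BERT-base:   {count_params(bert) / 1e6:.1f}M parameters")
    print(f"ALBERT-base: {count_params(albert) / 1e6:.1f}M parameters")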

    3. It is a little confusing that you discuss ALR-BERT but suddenly use M-BERT in Table 2 and Section 3.3.

Answer: M-BERT (multilingual BERT) denotes the multilingual BERT model, whose tokenizer has been used in our work. M-BERT is the baseline for any language-specific implementation of the BERT architecture, so it is imperative that we use it for comparison.
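As an illustration of how the M-BERT tokenizer behaves on Romanian text (and of the token-size figures raised in point 2, presumably average subword tokens per word), here is a small sketch; the sentence is an arbitrary example, and the resulting ratio is not the corpus-level figure reported in the paper.

    # Minimal sketch (illustrative only): subword fertility of the M-BERT tokenizer
    # on one Romanian sentence. Requires: pip install transformers
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

    text = "Limba română este o limbă romanică vorbită de aproximativ 24 de milioane de oameni."
    words = text.split()
    subwords = tokenizer.tokenize(text)

    print(subwords)
    print(f"subword tokens per word: {len(subwords) / len(words):.2f}")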

    4. Where does M-BERT come from?

Answer: M-BERT is the multilingual BERT, which is the main baseline for Rom-BERT. We use this term frequently to compare the full BERT with the lite BERT.

    5. The differences between ALR-BERT and Rom-BERT need to be explained and highlighted in more detail.

Answer: ALR-BERT is a lite version of BERT, an architecture popularly known as ALBERT. Our model is the Romanian version of ALBERT, named ALR-BERT.

    6. Why is your model lighter than Rom-BERT? Do you use fewer features, parameters, etc.?

Answer: It has already been established, experimentally and statistically, that A Lite BERT (ALBERT) is a lighter model than BERT, as it uses a lighter architecture and thus a smaller number of parameters. Our proposed model, A Lite Romanian BERT (ALR-BERT), is an implementation of the ALBERT architecture adapted for the Romanian language. Hence, it is a lighter version of Rom-BERT.
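For reference, the two ALBERT techniques behind this reduction are factorized embedding parameterization and cross-layer parameter sharing. The back-of-the-envelope sketch below uses illustrative sizes (not ALR-BERT's exact configuration) to show the effect on the embedding matrix.

    # Back-of-the-envelope sketch with illustrative sizes (not ALR-BERT's exact config).
    V, H, E, L = 50_000, 768, 128, 12  # vocab size, hidden size, embedding size, layers

    # 1) Factorized embedding parameterization: BERT's V x H embedding matrix
    #    becomes V x E plus E x H in ALBERT.
    bert_embed = V * H             # 38,400,000 parameters
    albert_embed = V * E + E * H   # 6,498,304 parameters

    # 2) Cross-layer parameter sharing: one transformer block is reused across all
    #    L layers, so the encoder stores roughly 1/L of BERT's encoder parameters.
    print(f"embedding parameters: BERT {bert_embed:,} vs ALBERT {albert_embed:,}")
    print(f"encoder parameter reduction from sharing: roughly 1/{L}")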

    7. It is not clear which system you are talking about here, M-BERT or ALR-BERT.

Answer: We are mostly referring to ALR-BERT, but since M-BERT is the baseline for comparison, we use it quite frequently throughout the manuscript. We have clarified this usage in the manuscript.

    8. Table 5 shows that ALR-BERT outperforms neither Rom-BERT nor M-BERT.

Answer: Yes, the main aim of the paper is not to outperform BERT but to build a lite version of BERT with fewer parameters for scaling to downstream tasks, which makes real-world applications much more feasible.

We have also added an ablation study of ALBERT compared to BERT (Section 4.2.1).

We have proofread our article and corrected the flow of English.

Reviewer 2 Report

In this paper, a transformer-based approach is presented for the Romanian language. The paper should be improved in terms of the compared state-of-the-art models and baselines. An ablation study should be carried out to emphasize the merit of the proposed scheme. The statistical validity of the results should be discussed by adding a statistical test and related plots for the empirical results.

Author Response

The paper proposes a new architecture for the Romanian language using a transformer-based model. There is already a Romanian version of BERT; we propose the ALBERT version for the Romanian language, named ALR-BERT. The main aim of the model is to provide a lite version of Rom-BERT that is scalable to many downstream tasks. We have used the standard evaluation datasets and metrics used in the Rom-BERT paper, "The birth of Romanian BERT", published at EMNLP. All the datasets and metrics used in the paper are statistically validated.

We have also added an ablation study of ALBERT compared to BERT (Section 4.2.1).

We have proofread our article and corrected the flow of English.

Reviewer 3 Report

The paper presents a Romanian language model based on the ALBERT architecture. The authors provided most of the essential aspects of model training: a description of the text data used to train the model, the configuration of the NN architecture, the parameters used in the training process, an evaluation of the model on a downstream task, and a comparison with other "large" models for Romanian. The only missing element is a comparison of the presented model with other models regarding model size and processing time. This comparison is crucial for this paper, as the primary motivation was to build a light model for Romanian. The authors reported the downside of the model, i.e., a loss of 8 pp. on the downstream task, but the paper is missing information about the advantages of the smaller model.

 

Minor errors:

  1. Section 2 is duplicated (lines 87-118 are exact copies of 55-86).
  2. Lines 123-132 could be changed to a list to improve the readability.
  3. Table 1: It is unclear what the "Lines" refers to.
  4. Line 152: "coverage1" ?
  5. Line 160: "128-step" does it mean "128 subwords"?
  6. Line 164: "1e4": should this not be "1e-4"?
  7. Line 194: "8%": should this not be "8 percentage points / 8 pp"?

Author Response

We have stated in the paper the advantages of ALR-BERT having fewer parameters than Rom-BERT and M-BERT for scaling to downstream tasks.

We have also added an ablation study of ALBERT compared to BERT (Section 4.2.1).

We have proofread our article and corrected the flow of English.

Reviewer 4 Report

One of my major concerns with this paper is that the authors fall short of conducting a thorough evaluation of their new design. The description of the proposed method is too brief, in particular the proposed algorithm. The authors need to describe the proposed algorithm.

The title is very short. Another major concern is that the evaluation and contents are not sufficiently explained. This article needs to clarify its actual realization, ascertain that the research fits the aims and scope, and demonstrate a better command and flow of English writing throughout the paper.

Author Response

The content of the paper is short for the following reasons:

  • The proposed model is a lite version of Rom-BERT, i.e., ALR-BERT.
  • The Lite version of BERT (ALBERT) is a well-established transformer model that reduces the number of parameters and makes downstream tasks more feasible (we refer to it in the article).
  • The selected datasets and metrics have already been statistically validated in "The birth of Romanian BERT".

The title is short because we chose the simplest formulation that clearly conveys that the paper proposes a lite version of Rom-BERT, namely ALR-BERT.

We have also added an ablation study of ALBERT compared to BERT (Section 4.2.1).

We have proofread our article and corrected the flow of English.


