Submit to this Journal Review for this Journal Propose a Special Issue

Article Menu

Share Help Cite Discuss in SciProfiles

Open AccessArticle

Peer-Review Record

Split-Based Algorithm for Weighted Context-Free Grammar Induction

Appl. Sci. 2021, 11(3), 1030; https://doi.org/10.3390/app11031030

by Mateusz Gabor¹

, Wojciech Wieczorek²

and Olgierd Unold^3,*

Reviewer 1: Anonymous

Reviewer 2:

Arnau Mir Torres

Reviewer 3: Anonymous

Appl. Sci. 2021, 11(3), 1030; https://doi.org/10.3390/app11031030

Submission received: 16 November 2020 / Revised: 18 January 2021 / Accepted: 20 January 2021 / Published: 24 January 2021

(This article belongs to the Special Issue Applied Artificial Intelligence (AI))

Round 1

Reviewer 1 Report

Despite the fact that I'm not very familiar with this topic and some parts were not clear to me, I found this paper and synthesis quite interesting.

In my view, the literature review is sufficient to support the chosen title. Also the results are clearly presented.

Moreover, I would suggest the following changes:
1) The addition of some research questions or research hypotheses would be helpful in terms of better understanding the scope and the outcomes of this research
2) A small discussion in the end regarding mainly, the limitations of this work together with future extension (further explained)
3) Finally, I would suggest a final proofreading of the manuscript.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors have to explain what is their relevant contribution in relation with their work:

Unold Olgierd, Gabor Mateusz, Wieczorek Wojciech. (2020).Unsupervised Statistical Learning of Context-free Grammar. W:Ana Rocha, Luc Steels, Jaap van den Herik (red.), ICAART 2020: Proceedings of the 12th International Conference on Agents and Artificial : Natural Language Processing in Artificial Intelligence, Vol. 1, (S. 431-438). Setúbal : SciTePress, DOI: 10.5220/0009383604310438

Moreover, the paper needs major revission.

The algorithm 1 of page 3 needs explanation. More concretely,

- Line 5, what is the set of rule weights wCKY? In line 11, you say "write(phi(A-->w_i) w wCKY[k][1][A] but A is not a rule, A is an element of N, isn't it?
- The same as before in line 22.
My point is that you reference the third component of wCKY[][][*] by an element of N but you say that the third components are elements of R, I don't understant!

- Moreover, when you say "sentence = w_1...w_n", I supose that w_n is w_L.

- Line 80 when you say "...X_a --> X_b,X_c..", I think it is "...X_a-->X_bX_c..."

With respect to the experimental results,

- It is better to perform a t.test instead of a wilcoxon test because the power is greater.
- Would it be possible to have experimental results with a non-generated data set?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Split algorithm for weighted context-free grammar induction

M. Gabor, W. Wieczorek and O. Unold

Review

The authors modify a previously presented method for grammatical inference of context-free languages. The proposed method considers positive and negative data and a "new approach" to infer the structure of the productions of the grammar. The paper is interesting although I have some concerns on it.

In the abstract, it is not clear what is the contribution of this paper. In fact, it seems that there is a contradiction between the title (that suggest the split method as the contribution of the paper) and the abstract which states that “split method in … WCFG was presented and verified”. It is not until Section 2 when the reader understand the main contribution of the paper.

I strongly suggest the authors to consult a native English Professor to review the manuscript, specially the Introduction section. Please, note that the cites can be better placed in order to ease the reading (i.e. “… in 1969 Horning proved [4] that…” can be substituted by “… in [4], Horning proved that…”, or even “… in 1969, Horning proved that… induction no negative evidence is mandatory [4].”).

According the (brief) description of the modification of WGCS, main contribution of the manuscript, (Section 2.2), the method generates “all possible combinations of all non-terminal symbols”. Although it may be no inconvenient for this time complexity, it is important to state the exponential nature of the method. Furthermore, because of this time complexity, the experimentation must include measures to test the evolution of the inference for each dataset. I suggest to include the number of productions generated during the inference process, the number of productions of the inferred grammar, and the time needed to complete the inference process.

In my opinion, Section 2.4 can be improved by including a short example to illustrate the split method.

In Section 2.5.1, I think that equation (2) does not relate well with Figure 3 and should state that \beta_{ij}= P(Y\Rightarrow w_i\hdots w_j) or alternatively Figure 3 should label the root of the shaded tree with X… I thing that the same applies to the outside probability… Could the authors clarify whether I am wrong or not? Furthermore, Figures 3 and 4 can be size reduced.

Please review cite [3] (Corporation, T.R (??))

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors have corrected all my concerns.

So, the work is ready for publication.

Author Response

The paper has been proofread by a native speaker.

Reviewer 3 Report

The authors have modified their manuscript. I still think the paper is interesting, and the results are excellent, but, I still have some concerns on the manuscript. In my opinion, the authors have an interesting piece of work, but they are more interested in rapid publication. I summarize below my main concerns.

I think it is not appropriate the authors not to address the correction of the language and defer all responsibility to the MDPI English Editing Services.

The authors’ revision now make me understand the meaning of the “all possible combinations of all non-terminal symbols” sentence. Anyway, the authors’ claim that this operation does not affect to the complexity of the method should be coupled with a proper study of the complexity (which is not included).

In my opinion, Section 2.4 deserves a short example to illustrate the split method. The authors do not consider my comment. A pseudocode algorithm is not an example.

In my revision I included that the experimentation must include measures to test the evolution of the inference for each dataset. I suggested to include the number of productions generated during the inference process, the number of productions of the inferred grammar, and the time needed to complete the inference process. It seems the authors lack of this data and do need to repeat the experimentation in order to have it. They provide the size of the inferred grammar only for their method, which is not enough to compare with other methods. I ask for the number of productions considered during the inference process. The authors claim that the number of “all productions is bounded by O(n^3), where n stands for “the length of the input sentence”. According Algorithm 1, line 2, I understand that the process is run for each sentence in the dataset, is it so? Could that number be considered as a tight bound? Can a similar bound be obtained for the other methods considered? This should be considered as a minimum to allow the reader to compare the methods.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 3

Reviewer 3 Report

No comments.

Article Menu

Split-Based Algorithm for Weighted Context-Free Grammar Induction

Further Information

Guidelines

MDPI Initiatives

Follow MDPI