Article
Peer-Review Record

Few-Shot Relation Extraction on Ancient Chinese Documents

Appl. Sci. 2021, 11(24), 12060; https://doi.org/10.3390/app112412060
by Bo Li, Jiyu Wei, Yang Liu, Yuze Chen, Xi Fang and Bin Jiang *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 5 July 2021 / Revised: 26 November 2021 / Accepted: 14 December 2021 / Published: 17 December 2021
(This article belongs to the Section Mechanical Engineering)

Round 1

Reviewer 1 Report

This paper proposes a novel few-shot relation classification model, as well as a small labelled dataset, for the analysis of ancient Chinese documents.

The proposed few-shot relation classification model achieves good performance on the benchmark dataset and shows promise for analyzing ancient Chinese documents with sparse labelled data.

The experiments are solid and the introduction is good.

-My biggest complaint is the language.

-I believe it requires a significant improvement on the language and the style. 

Author Response

Point 1: -My biggest complaint is the language.

-I believe it requires a significant improvement on the language and the style. 

Response 1: As you requested, we have proofread the manuscript for English language and style. All changes can be seen in the revised paper.

Thank you very much for your comments and suggestions.

Reviewer 2 Report

Unfortunately, I cannot assess the full strength of this paper, as my competence lies in the social-scientific assessment of the results of extraction and topic models, not in the mathematical part. All in all, the logic of the paper seems to be right, with the necessary testing in place. The results of the proposed extraction model are good if the reported rise in quality is verified.

However, there are shortcomings in the paper that need to be addressed seriously.

First, the style of the paper resembles an automatic translation into English from a language written in a non-Latin script (I suspect Chinese, as the paper is on Chinese documents). See the first paragraph: ‘For traditional humanity researches, as [1] has mentioned, are currently facing a crisis that humanity and humanistic meaning needs to compete with the social, economic, popular culture, and other prevailing entertainment. This crisis also influences society on pedagogy and job seeking tendency.’ Further on, phrases like ‘limited propagation on mass media’, ‘using a digital way’, and ‘process and demonstrate numerously disorganized data’ indicate a weak command of English, to an extent that obscures the meaning of the paper in places. The paper needs to be corrected by a professional proofreader.

Another major question is that manually labeled datasets are used for both training and testing. In several recent works, researchers (see, e.g., Bodrunova et al., 2019, of SNAMS workshop) have shown that manual labeling of data needs to be re-evaluated, as it cannot always be taken as baseline / ground-truth coding. The authors need to discuss this at least briefly.

Another shortcoming is that the differences between Latin and hieroglyphic languages have to be briefly explained, and any peculiarities or additional complications of Chinese, if they exist, need to be addressed, preferably in a comparative perspective with English.

In the results, when the authors assess Figure 9, they say: ‘in attribute we can find that the Zhou has a connection with his clothes and we believe this connection should be a action.’ This is unclear, as they have previously put clothes into attributes (where they logically belong).

Other smaller issues relate to Tables and Figures.

Figure 1 is not mentioned anywhere in the text. This figure needs an explanation and a reference in the paper. Perhaps SOFTMAX and FFN are common knowledge in the world of text classification, but I am sure that the journal would demand that abbreviations and terms be explained.

Also, Table 1 is unclear. It needs to be clearer why the highlighted instances belong to one class or the other (A or B), as the text gives no explanation. How does ‘geometric’ resemble ‘Wilton Bridge’? Is it the ‘bridge’ that matters as a cross-word (generalization), and is ‘geometric’ a generalization of ‘Euclidean’? Please make Table 1 clearer for the general reader.

Figure 3 is completely unclear for non-Chinese speakers and, thus, cannot be assessed.

In 5.4, ‘Table 4’ needs to be changed to ‘Table 5’.

Figure 9 has to be presented correctly; at present, only about 2.5 of the 7 images are visible because the figure runs beyond the page margins.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

The work presents an approach for relation extraction on Chinese documents, focusing specifically on Shih Chi. The approach takes into consideration a setting in which no large training corpus is available.

There are two main contributions: 1) the few-shot relation extraction approach and 2) a benchmark dataset (TinyACD-RC) built upon Shih Chi and curated by historians.

The paper is scientifically sound, but the content is neither well organized nor well written.

This makes it really difficult for the reviewer to assess the content. There are also various remaining open questions. I recommend that the paper be accepted provided that the authors answer all the open questions and revise the relevant sections to address them.

#Major drawbacks

  1. The authors did not provide a link to a repository containing their approach, nor to the TinyACD-RC benchmark dataset. That is a big issue for reproducible research.
  2. It is not clear whether the authors evaluate both entities and relations; the example provided in Figure 3 deals only with entities.
  3. The paper needs a profound revision. Some sentences need to be rewritten; others need better support with citations.

#Introduction

-> Comment: “As mentioned in [1], nowadays traditional humanity” -> Rephrase this introduction, e.g. “Nowadays [1] …” or “Traditional humanity … [1]”

-> Comment: “which effectively impaired the difficulty of massive data organization and usage” -> this claim is too strong; I suggest rephrasing it to “which CAN” and adding a citation to back up your statement

-> Comment: “in demand from a mess of documents” -> I recommend replacing “mess” with another term, e.g. “a large quantity”

-> Comment: “According to the above reasons, researches and analysis in ancient Chinese documents with limited data are significant, and due to the limitation of real-world scenario that a standard training dataset will consume numerous resource (time, human, and finance) to build up, finding a feasible measure to extract and analyze these limited data is imperative.” -> really hard to understand, please rephrase it

-> Comment: “The common solutions for information” -> not really; common solutions are built on top of ad hoc IR. Perhaps you mean “common ML solutions”; please rephrase it

-> Comment: “All these factors make pretraining infeasible” -> I would not say infeasible, but rather a difficult task; please rephrase it

-> Comment: “However, this framework only works well for a small number of instances (under 700) and very simple models, for complex models such as” -> I do not follow your argument: you also want to use the framework on a small set of instances in TinyACD-RC. Could you elaborate on that?

-> Comment: “this framework can hardly have an obvious improvement.” -> Why is that?

-> Comment: “RNN-based model always suffers from its sequential structure” -> not true; the correct statement is that it can suffer. Please also provide supporting evidence

#Structure

-> Please end all figure captions with a full stop

E.g. “Figure 2. MASCOT encoder structure of our model” -> “Figure 2. MASCOT encoder structure of our model.”

#Experiments

-> Please provide an evaluation using ad hoc IR, e.g. tagme (gammaliu/tagme: Entity Linking system by A3 lab, github.com). Ad hoc systems have the advantage of not needing any training data, which suits your scenario

-> Rephrase “The none-of-the-above task” -> “The none-of-the-above (NOTA) task”

-> Please add the numbers in Figure 5

-> Please provide the test/training dataset ratio for all experiments

-> Rephrase: “the FewRel1.0 [5] task” -> “FewRel 1.0”

-> Figure 5, Figure 6, Table 6, etc. depict results that I believe are BLEU accuracy, but you do not discuss this anywhere in the section; please add it with a reference.

-> Comment: “the mean of all class c support instances. As a result, benefited from this rectification, model performance increased 5.91% on mini-ImageNet.” -> did you actually evaluate the model on mini-ImageNet? Why?

#Conclusion

I would like to see a discussion of the lessons learned, if any, that could be used to improve ad hoc IR systems.

#General Remarks

-> Please provide, for all tables and figures, a legend describing each field and axis

E.g. in Table 6, what is “Model”, what is “NO.”, etc.

-> Please provide an accessible link for the TinyACD-RC and all approach results, for reproducibility.

-> Please replace Figure 3 with an example containing a relation, if your approach deals with relations. If it deals only with entities, please reformulate your paper, as the title and benchmark are misleading.

Author Response

Based on these comments and suggestions, we have made careful modifications to the original manuscript, and carefully proofread the manuscript to minimize typographical and grammatical errors. Please see the attachment.

Author Response File: Author Response.docx
