Article
Peer-Review Record

Traditional Chinese Medicine Knowledge Graph Construction Based on Large Language Models

Electronics 2024, 13(7), 1395; https://doi.org/10.3390/electronics13071395
by Yichong Zhang and Yongtao Hao *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 17 February 2024 / Revised: 29 March 2024 / Accepted: 3 April 2024 / Published: 7 April 2024
(This article belongs to the Special Issue Applications of Big Data and AI)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript describes a method to build Traditional Chinese Medicine Knowledge graphs using a specific LLM, the iFLYTEK Spark Cognitive Large Model. With this tool, the authors perform those tasks necessary to construct a KG.

About references:

- Missing reference: "A Method for Traditional Chinese Medicine Knowledge Graph Dynamic Construction".
- I would suggest refraining from including references in Chinese unless accompanied by English translations, as most readers may not comprehend them.
- The reference for line 374 is missing.
- Update references so they do not point solely to arXiv; for instance, "Language models are few-shot learners."
- Line 193, there is a reference to Ni et al., but I believe it is not listed in the reference list, as Reference 17 is a website.

Comments/suggestions:

I would emphasize and explain in more detail why iFLYTEK was preferred over ChatGPT. Specifically, I would rephrase the passage starting from line 217, as the sentence gives the impression that the choice was mainly due to cost considerations. In particular, I would explain how the choice reduces the cost of the tasks.

- I am unclear on how the data cleaning process was conducted. What is meant by "removal of outliers"?

- Is there any repository of the collected data?

- Regarding the data collection, Figure 2 is not informative at all. I would recommend providing a toy example with an English translation. What is meant by "infobox"? I would suggest revisiting section 3.1.

- I suggest dividing Figures 3, 4, and 5 using different colors for the various steps. Additionally, I would recommend providing a schema adaptable to Figures 4 and 5 as well to emphasize how the same approach (task description + few-shot demonstrations + output) was used in different phases.

I believe the description of Figure 6 is incorrect: there is no image on the right or on the left, but rather one below and one above. Additionally, once again, images containing text in Chinese are likely to be incomprehensible to those who do not understand Chinese.

- The experimental section is the one that convinces me the least. Are the 200 records you mention the test set? I don't understand how the experiments were conducted. How much data did you collect? And what percentage of these did you test? I believe that section 4 needs to be completely rewritten, and actual examples of errors must be provided. Furthermore, I would add another KG, not just the one in Figure 7.


Overall comment: The method used by the authors for tasks such as Named Entity Recognition or Entity Relationship Extraction via iFLYTEK is very simple but seems to work. However, the level of novelty is very low, and much of the method's effectiveness is indeed due to iFLYTEK.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

In this work, "Traditional Chinese Medicine Knowledge Graph Construction Based on Large Language Models", the authors propose and evaluate a method that uses a trained LLM to structure data for knowledge graph construction in Traditional Chinese Medicine. The authors demonstrate how to tackle several tasks using an LLM plus prompts.

I recommend publishing this work after a minor revision to address this point:

In Section 4, Experimental Results and Analysis, the authors only present the results of the NER task (Section 3.2.2, Named Entity Recognition). The results from Sections 3.2.3-3.2.5 should also be presented.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The authors propose a method for extracting words and meanings (knowledge) from Chinese sentences, with the aim of creating a knowledge graph.

The authors should consider the following changes to this manuscript.

TECHNICAL AND WRITING STYLE CORRECTIONS

1. Referencing of literature in text - Currently, the numbers that refer to items in the ordered literature list appear in superscript format; they are usually given in square brackets as regular text, without superscripting. The authors should carefully check the guidelines for manuscript preparation, particularly on this detail. It is particularly odd when superscripted numbers run together as if they were knotted (line 87).

2. Literature list text - some items in the literature list are given only in Chinese characters. Since Electronics is an international journal, the literature list - authors, titles of referenced works, publishers, etc. - should be provided in English (with the Chinese original in brackets, if needed).

3. Referencing sources in text - there are large portions of text, particularly in the introduction (for example, lines 38 to 86), with no references to the literature pointing to the sources of these statements. It is not common practice to make statements, particularly in the introduction, without the support of the literature.

4. Text alignment - it is usual to have justified text, so this correction should be made (for example, in lines 330-331).

5. Abbreviations - it is common for each abbreviation to be accompanied by its full name at the first occurrence. For example, API is a well-known abbreviation, but it still requires explanation at its first appearance in the text.

CONTENT CORRECTIONS 

1. Knowledge graphs intended for automated use are related to ontologies, mostly stored in RDF or OWL formats. Ontology is mentioned only in the introduction (lines 32-35). In a manuscript with this title, much more attention should be paid to comparing the proposed method of knowledge graph creation with ontology graphs and ontology languages such as RDF or OWL. This is particularly important since the introduction (lines 41-42) mentions Google's use of knowledge graphs in search engines. It is also well known that, for better SEO (search engine optimization) of websites and support for web crawlers, the semantics of websites is embedded in website files stored in RDF format, which is an ontology language. Therefore, this manuscript should be enhanced with background on ontologies and ontology triplets (entities, relations, attributes, ...), a related-work analysis of automated support for ontologies, ontology languages and graphs, and a comparison with the proposed method of creating knowledge graphs.

2. This manuscript relies on the help of web crawlers: "We acquire the necessary TCM-related textual data by utilizing web crawlers and save it in txt format." (lines 242-243). It is not clear which particular web-crawling tools were used in this manuscript. It is also not clear which rules were set for the web crawlers and how they were applied (lines 254-256).

3. It is not clear what InfoBox stands for in this manuscript (line 246) - this term should be explained in more detail at its first appearance.

4. The key process in this manuscript is knowledge extraction, i.e., knowledge formulation based on the extracted data (obtained by web-crawling tools). Knowledge in this manuscript is formulated by relating entities and attributes. This relating is performed by certain tools, but they are not even named; they are described only generally as "cutting-edge technology". Such overly general statements are not appropriate in a manuscript that presents experimental results. Therefore, Section 3.2 (lines 263-275) needs further explanation, with more details on the particular tools and techniques for extracting and relating entities and attributes. Currently, Section 3.2 is too general.

5. What is the "few-shot prompts" method? It has been briefly explained as a method for guiding the LLM, as used in this manuscript (lines 204-206, in Section 2.3, Large Language Models). This method is emphasized as an essential contribution in the introduction: "The main contributions of this paper include: (1) Adopting LLMs for named entity recognition, utilizing few-shot learning techniques..." Therefore, this method needs to be explained separately. Some elements of the method were presented in Section 3.2.2, Named Entity Recognition. It would be very beneficial if the few-shot learning method were explained by an algorithm - a diagram expressing the key steps of using the method.

6. Why is the demonstration of the method in the same section as its brief explanation? It would be better to explain the few-shot method in more detail in a separate subsection of Section 3, entitled Algorithm Implementation. The structure of the whole manuscript should be refined. It should be: Introduction; Background and Related Work; Proposed Method; Method Application Demonstration; Experiment (with subsections: experimental setup, including all details of the tools used; experiment sample, i.e., which data sources/websites were used for data collection, and the size of the sample; results; discussion); Conclusion.

7. It is very important to use precise vocabulary in the manuscript. We have triplets as elements of knowledge graphs, such as those connecting entities with relations, or entities with attribute types and attribute values. But do we also have "property triples" (line 253) and a "triple dataset" (line 462)? What is the meaning of these terms? In general, this manuscript should offer more explanation of some of its terms.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I am quite satisfied with the authors' responses to my questions/issues raised in my initial review.

Minor:

Is this text correct?

"The appendix is an optional section that can contain details and data supplemental to the main text—for example, explanations of experimental details that would disrupt the flow of the main text but nonetheless remain crucial to understanding and reproducing the research shown; figures of replicates for experiments of which representative data is shown in the main text can be added here if brief, or as Supplementary data. Mathematical proofs of results not central to the paper can be added as an appendix."

Author Response

Comments 1: Is this text correct?

Response 1: We apologize for the oversight on our part. We forgot to remove the corresponding explanatory template text when adding the appendix. It has now been deleted, and we appreciate your careful attention once again.
