
Open Access Article

Peer-Review Record

Document Retrieval System for Biomedical Question Answering

Appl. Sci. 2024, 14(6), 2613; https://doi.org/10.3390/app14062613

by Harun Bolat and Baha Şen *

Reviewer 1: Versavia Maria Ancușa

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 24 January 2024 / Revised: 9 March 2024 / Accepted: 18 March 2024 / Published: 20 March 2024

(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications—2nd Edition)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

First, let me congratulate you on the amount of work it took to run all the variations; it speaks volumes about your commitment. I also like how you summarized your findings in the conclusion.

On the minus side, the methods that were run, although scientifically sound, tend to be quite old. Have you considered using neural networks / transformers? Also, one big minus for me was the lack of comparison with other document retrieval systems, even considering some from other domains.

One interesting aspect that is not touched upon is time performance, especially since you query the database multiple times.

Since the document retrieval system described is part of a larger biomedical question answering system, you should mention where that system is available.

The references are quite old and with problems in their formatting. For example, reference [3] is clearly a fragment.

Side note, Scispacy is not new, it's from 2019. Yes, it's been updated since, but the starting point is 5 years ago.

Comments on the Quality of English Language

There are many problems with the usage or omission of articles and prepositions, which detract considerably from readability. E.g.:

To build a system Indri

re-ranking according to the presence

If the document contains all

Some paragraphs feel disjointed and sound forced.

The whole document would benefit from being run through a tool such as Grammarly or from being reviewed by a native English speaker.

Author Response

On the minus side, the methods that were run, although scientifically sound, tend to be quite old. Have you considered using neural networks / transformers? Also, one big minus for me was the lack of comparison with other document retrieval systems, even considering some from other domains.

We tried to offset the disadvantages of the older methods by combining information retrieval techniques with semantic similarity as well as textual similarity. At the end of the study, we observed that the performance of the older methods, as we used them, did not surpass a certain level. These results showed us that transformer-based models need to be used. We did not change the system during the revision stage because there was not enough time to adopt transformer-based models. However, in future work we will attempt to improve the results by using transformer-based (BERT) or embedding-based models for both the document retrieval and answer extraction systems.

A comparison with the performances of other systems developed within the scope of the BioASQ challenge has been added.

One interesting aspect that is not touched upon is time performance, especially since you query the database multiple times.

Making multiple database queries reduces time performance. This method was nevertheless used because it improves the accuracy of the system. Transformer-based models will be employed to improve both time and accuracy scores.

Since the document retrieval system described is part of a larger biomedical question answering system, you should mention where that system is available.

The document retrieval system described in the paper is the part of the question-answering system that has been developed so far. It has not yet been released publicly. It is planned to be released as an application after further improvements.

The references are quite old and with problems in their formatting. For example, reference [3] is clearly a fragment.

More recent references have been added, and existing references have been corrected.

Side note, Scispacy is not new, it's from 2019. Yes, it's been updated since, but the starting point is 5 years ago.

The use of more up-to-date libraries is planned alongside transformer-based models.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have chosen an interesting problem. But, there are some concerns as follows:

1. What are the main contributions by the authors? It's hard to interpret the novelty.

2. The organization of the paper is not understandable. The literature review and proposed methodology should be clearly separated.

3. References are pretty old, which further reduces the significance of the work.

4. The results are not compared with the state of the art and the latest published work.

Comments on the Quality of English Language

English is Ok but check for minor corrections.

Author Response

What are the main contributions by the authors? It's hard to interpret the novelty.

Ranking algorithms have been tested under different scenarios, and a ranking algorithm that performs well under those scenarios has been identified. The impact of query expansion techniques (POS tagger, UMLS services) on document retrieval performance has been examined in various scenarios. For answer extraction, NER, UMLS CUI, Semantic Type, and Semantic Group features have been used, resulting in a performance improvement. However, these techniques did not achieve the desired performance, and the use of transformer-based models has been recommended.
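The idea of combining several similarity features with weights and keeping the highest-scoring sentences can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the feature names (`sentence`, `ner`, `cui`), the normalization by the weight sum, and the example scores are all assumptions made for illustration, and the per-feature similarities are assumed to be already computed and to lie in [0, 1].

```python
from typing import Dict, List, Tuple


def combined_score(component_scores: Dict[str, float],
                   weights: Dict[str, float]) -> float:
    """Weighted average of per-feature similarity scores (hypothetical scheme)."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("weights must not all be zero")
    # A feature missing from a candidate contributes a similarity of 0.0.
    weighted_sum = sum(weights[name] * component_scores.get(name, 0.0)
                       for name in weights)
    return weighted_sum / total_weight


def top_k_sentences(candidates: List[Tuple[str, Dict[str, float]]],
                    weights: Dict[str, float],
                    k: int = 10) -> List[Tuple[str, float]]:
    """Score every candidate sentence and return the k best, highest first."""
    scored = [(sentence, combined_score(scores, weights))
              for sentence, scores in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Illustrative data only: three candidate sentences with made-up scores.
example_candidates = [
    ("sentence A", {"sentence": 0.8, "ner": 0.2}),
    ("sentence B", {"sentence": 0.4, "ner": 0.4}),
    ("sentence C", {"sentence": 1.0, "ner": 1.0}),
]
example_weights = {"sentence": 0.5, "ner": 0.5}
best_two = top_k_sentences(example_candidates, example_weights, k=2)
```

Changing `example_weights` and re-running `top_k_sentences` is one simple way to compare weightings, which is the kind of experiment the reviewers ask the authors to interpret critically.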

The organization of the paper is not understandable. The literature review and proposed methodology should be clearly separated.

The literature review and methodology sections have been reorganized so that they are clearly separated from each other.

References are pretty old, which further reduces the significance of the work.

More up-to-date references have been added from the systems developed within the scope of the BioASQ challenge. Additionally, existing references have been corrected.

The results are not compared with the state of the art and the latest published work.

A comparison has been added with the current systems developed within the scope of the BioASQ challenge.

Reviewer 3 Report

Comments and Suggestions for Authors

Lines 62-64: this sentence needs an appropriate reference to provide the reader with a source for more information on the topic.

The text on lines 60-107 should actually be part of the introduction, not of the Materials and methods section.

Lines 111-123: because these concepts are not the creation of the paper's authors but part of current scientific knowledge, they need to be accompanied by appropriate references. The same holds true for lines 126-142. For all equations, please clarify the meaning of each term/symbol used (e.g., in equation 6, what do QT, QO, and QU stand for? In equation 8, what do Simsentence, Simner, etc. stand for?).

“nltk pos tagger package” – please cite this package appropriately. The same should be done for scispaCy and spaCy.

Lines 274-280: these seem to be results, therefore they should be placed in the Results section, this section should only cover methodology.

It was not clear to me what software/programming languages/libraries were used to make various computations, including performance metrics (accuracy, recall, precision etc).

The table captions should characterize the contents (and be self-explanatory): what is the difference between Table 6 and Table 8, for instance? (They have identical captions.)

The biggest issue with this paper is the performance, which seems abysmally low (none of the parameters is above 0.5). Apparently the authors selected the weights based on those results, despite being very poor. Moreover, the authors do not discuss at all the low performance, the potential causes and strategies of improvement. They only mention that they intend to improve the system using deep learning. As a scientist, I do believe in the relevance of negative results studies, but they need to be acknowledged as such and interpreted critically. In its current form, I do not think that this paper can be published, but with a critical discussion of its approach, limitations and potential strategies of improvement it could be publishable.

Comments on the Quality of English Language

There is a need to review the correctness of many sentences in English. Most often it is clear what the authors intend to say, but the phrasing is very often unnatural/clumsy.

Author Response

Lines 62-64: this sentence needs an appropriate reference to provide the reader with a source for more information on the topic

References have been added.

The text on lines 60-107 should actually be part of the introduction, not of the Materials and methods section.

The specified lines have been moved to the introduction section.

Lines 111-123: because these concepts are not the creation of the paper's authors but part of current scientific knowledge, they need to be accompanied by appropriate references. The same holds true for lines 126-142. For all equations, please clarify the meaning of each term/symbol used (e.g., in equation 6, what do QT, QO, and QU stand for? In equation 8, what do Simsentence, Simner, etc. stand for?).

References have been added for ranking algorithms. The meanings of terms and symbols used in the formulas have been specified.

“nltk pos tagger package” – please cite this package appropriately. The same should be done for scispaCy and spaCy.

References have been added for the specified packages.

Lines 274-280: these seem to be results, therefore they should be placed in the Results section, this section should only cover methodology.

No changes were made because this comment was not fully understood.

It was not clear to me what software/programming languages/libraries were used to make various computations, including performance metrics (accuracy, recall, precision etc).

The statement that all modules of the application were developed using the Python programming language has been added.
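For readers wondering how such metrics are typically computed in Python, the following is a minimal sketch of set-based precision, recall, and F1 for one question's retrieved documents. It is an illustration under stated assumptions, not the authors' evaluation code: the function name and the document identifiers are hypothetical, and BioASQ's official evaluation may use additional or different measures.

```python
from typing import Iterable, Tuple


def precision_recall_f1(retrieved: Iterable[str],
                        relevant: Iterable[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 for a single question."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents that were retrieved
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical identifiers: four retrieved documents, three gold-standard ones.
example_metrics = precision_recall_f1(["d1", "d2", "d3", "d4"],
                                      ["d2", "d4", "d5"])
```

Here two of the four retrieved documents are relevant and two of the three relevant documents are found, so precision is 1/2, recall is 2/3, and F1 is 4/7.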

The table captions should characterize the contents (and be self-explanatory): what is the difference between Table 6 and Table 8, for instance? (They have identical captions.)

The table headings have been reviewed. The heading for Table 8 has been corrected.

The biggest issue with this paper is the performance, which seems abysmally low (none of the parameters is above 0.5). Apparently the authors selected the weights based on those results, despite being very poor. Moreover, the authors do not discuss at all the low performance, the potential causes and strategies of improvement. They only mention that they intend to improve the system using deep learning. As a scientist, I do believe in the relevance of negative results studies, but they need to be acknowledged as such and interpreted critically. In its current form, I do not think that this paper can be published, but with a critical discussion of its approach, limitations and potential strategies of improvement it could be publishable.

Comparison with the performances of other systems developed within the scope of the BioASQ challenge has been added. Generally, the performance of non-transformer-based systems for document retrieval developed within the BioASQ challenge is below 0.5. Our proposed system yields better results than some other systems using similar methods. However, it is below the expected level.

Discussion of the performance issues and of possible improvement methods has been added to the conclusion section.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The dedication evident in the development of this paper is commendable; however, its significance is diminished by the absence of contemporary methodologies, such as transformer-based models, which are pivotal for achieving state-of-the-art outcomes. Given the evolving landscape of research methodologies and the growing reliance on advanced technologies, there arises a pressing need for a thorough reassessment of the approach adopted, aligning it more closely with current paradigms and technological advancements.

Comments on the Quality of English Language

The English has improved but still needs some work. There are some minor mistakes; for example, the phrase "The BioASQ challenge include two tasks which includes" should be corrected to "The BioASQ challenge includes two tasks, which are."

Some phrases still feel forced, like "Within this organization, various benchmarks have been provided to assess researchers of QA systems" which could be slightly modified for clarity: "Within this organization, various benchmarks have been established to evaluate researchers' QA systems."

Author Response

First of all, we would like to thank you for your valuable evaluations. Your assessments have given us useful guidance and the opportunity to refine the article further. Our responses to your evaluations are given below.
You are absolutely right in your comments and recommendations. In the continuation of this study, we will first re-implement the same processes using transformer-based models, as you suggested. Because of the limited time for the revision, we were only able to make corrections based on your feedback and that of the other reviewers.

Reviewer 3 Report

Comments and Suggestions for Authors

The minor issues raised in the first round of peer review have been resolved. With respect to performance assessment and context, the authors have added performance figures from other studies and from the BioASQ challenges held in 2022 and 2023, which is an improvement. However, their performance still seems to be inferior to some of the results in those competitions, and even in their absence, it should be acknowledged that the current performance is rather low and that improvement is needed. This is not mentioned in the discussion, leaving the reader with the impression that this work is a great improvement. It seems to be just "another attempt with low results, as often happens in the field".

Comments on the Quality of English Language

There is a need to review the correctness of a number of sentences in English. Most often it is clear what the authors intend to say, but the phrasing is very often unnatural/clumsy.

Author Response

Because the study used algorithms based on the bag-of-words method, the system's performance is lower than that of embedding-based or transformer-based models. However, within the scope of the study, different ranking algorithms were tested for biomedical document retrieval. Different query expansion models were also tested, and a query expansion model that gave good results was identified. For comparison, other biomedical document retrieval systems developed with classical methods and with transformer-based methods have been added to the results section. The tables show that transformer-based models give better results.

Using NER, CUI, Semantic Type, and Semantic Group services significantly improved answer extraction performance compared with using only text similarity.
In the continuation of this study, we will improve the system by using more up-to-date methods and algorithms to get better results.
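The limitation of bag-of-words ranking mentioned in this response can be shown with a small, generic sketch. It ranks documents by cosine similarity over raw term counts; this is a textbook illustration and not the ranking algorithm used in the paper. The function names, the whitespace tokenization, and the example texts are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lowercase, whitespace-tokenized term counts (deliberately simplistic)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank(query: str, documents: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs sorted by descending similarity to the query."""
    query_vector = bag_of_words(query)
    scored = [(doc_id, cosine(query_vector, bag_of_words(text)))
              for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up documents; "d" is about the same drug as the query but shares no words with it.
example_documents = {
    "a": "aspirin reduces fever",
    "b": "insulin regulates glucose",
    "c": "aspirin",
    "d": "acetylsalicylic acid lowers temperature",
}
example_ranking = rank("aspirin fever", example_documents)
```

Document `d` receives a score of zero because it has no surface terms in common with the query, even though it is topically relevant. Query expansion with resources such as UMLS is one classical way to narrow this gap, and embedding-based or transformer-based models are another.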

Round 3

Reviewer 1 Report

Comments and Suggestions for Authors

I really appreciate your hard work and how you took the suggestions previously given to improve your paper.
I understand the time pressure of getting this paper done, but I cannot get past the fact that the methodologies omit the one that revolutionized the field: transformers, especially as the new and improved state of the art includes ChatGPT-based approaches (ref. 21).

Comments on the Quality of English Language

You need to further check your English, as there are small errors. For example, lines 428-430:

101 questions are used to test the system performance. For each question, 10 and 20 sentences with the highest similarity score are taken as answers. The similarity score evaluation results calculated with different weights are given in the Table 18 and Table 19

You have to remove passive voice misuse + others:

To test the system's performance 101 questions were used. For each question were taken the answers with the highest similarity score, in samples of 10 and 20 sentences. Table 18 and Table 19 present the similarity score evaluation results calculated with different weights.

Author Response

First of all, thank you very much for sparing your valuable time and evaluating our article. We have the chance to improve our article with your comments. Your valuable comments guide us.

As you stated in your comment, transformer-based methods are more up-to-date and provide better results. After we received your evaluation report, we started working on using the transformer-based BERT model. Unfortunately, due to lack of time, we have not been able to add it to the application yet, but we continue to work on the subject. We hope to include a transformer-based method in the system soon.

We also double-checked the language of the entire article and made corrections, trying to minimize missing or incorrect wording.

We thank you again for your valuable comments and offer our respects.
