Article
Peer-Review Record

A Study on Ranking Fusion Approaches for the Retrieval of Medical Publications

Information 2020, 11(2), 103; https://doi.org/10.3390/info11020103
by Teofan Clipa 1 and Giorgio Maria Di Nunzio 1,2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 7 December 2019 / Revised: 22 January 2020 / Accepted: 7 February 2020 / Published: 14 February 2020
(This article belongs to the Special Issue Big Data Evaluation and Non-Relational Databases in eHealth)

Round 1

Reviewer 1 Report

The submission presents a large number of scientific, well-structured, and systematic experiments comparing different approaches to information retrieval.

Overall, as such it presents relevant work for the community and should be accepted. The paper is also elaborate and very detailed (allowing re-implementation), and it reports a great many experiments.

However, there are some issues that need to be addressed.

The issue of the collection being a medical collection is emphasized in the title as well as in the keywords. However, this aspect of the work is not dealt with in the introduction, the abstract, or the description of the collection.

The research questions and the example (tropical fish) are also unrelated to the domain.

Some of the specific challenges of the domain should be mentioned.

In the section Experiments, there needs to be a more elaborate introduction to the CLEF lab. Explain the background and scientific challenges, and give examples of the data.

 

The section Background gives a thorough overview of the models. It is very nice that, compared to many other IR papers, the authors explain exactly which variant of a model they use (e.g., TF without log).

 

2.2.2: "LM" should be spelled out in the section heading; the same applies to 2.2.3.

 

2.2.4: A general heading should be used, e.g., "Word Embeddings", instead of the name of one particular implementation.

 

3.2: The section heading should also be more elaborate; just mentioning a piece of software does not help a reader who is not fully familiar with the area (e.g., an interested medical expert).

 

BM25 does not outperform a simple TFIDF approach. Try to give some explanation. Could it have to do with the length of the documents, which might be rather similar?

 

The statistical analysis of the results is very solid.

 

The abstract needs to include a brief list of the approaches tested to give the reader a better idea. In particular, given the recent interest in word embeddings, it is good to mention that they were also considered.

The fact that a CLEF collection was used is also important to mention, to emphasize the strong empirical focus of the paper and the use of a re-usable collection.

 

The performance of the best systems at the CLEF lab should be mentioned to put the results into better context for the reader.

Word embeddings are currently discussed as a very promising direction for IR. The paper presents the experiments with them towards the end; it is unclear why this is the case.

 

 

"retrieve millions of relevant results": since a user cannot even consider a million documents, they cannot all be relevant. In fact, a hit list of a million documents is useless.

 

It is very positive that the data is available on a GitHub page. The authors should consider putting all experiments on GitHub as well.

 

Figure 4: it is widely agreed in the community that P@10 is an unreliable and unstable metric, and the conclusions we can draw from it are very limited. This needs to be clarified for the reader. Also, results based on P@10 are not a good start for the results section.

 

The boxplots seem to show the distribution of the performance for the topics. I could not find that information in the paper.

Font size in the figures is too small.

 

Details:

 

faster access to the collection -> faster access to collections

of the computers -> of computers

Author Response

We would like to thank the reviewer for the helpful comments.
We prepared a list of items concerning the issues raised in the review.

Best Regards

Giorgio and Teofan

=========================

Reviewer 1

The issue of the collection being a medical collection is emphasized in the title as well as in the keywords. However, this aspect of the work is not dealt with in the introduction, the abstract, or the description of the collection.

- We have described the problem of medical retrieval in the abstract and in the introduction (as well as in the experiments).


The research questions and the example (tropical fish) are also unrelated to the domain.

- We have fixed this issue.


Some of the specific challenges of the domain should be mentioned.

- We have fixed this issue in the introduction.


In the section Experiments, there needs to be a more elaborate introduction to the CLEF lab. Explain the background and scientific challenges, and give examples of the data.

- We have added a summary of the description of CLEF and CLEF eHealth both in the introduction and in the experiments.


2.2.2: "LM" should be spelled out in the section heading; the same applies to 2.2.3.

- We have fixed this issue.


2.2.4: A general heading should be used, e.g., "Word Embeddings", instead of the name of one particular implementation.
+
3.2: The section heading should also be more elaborate; just mentioning a piece of software does not help a reader who is not fully familiar with the area (e.g., an interested medical expert).

- We have fixed this issue (and modified also the other headings accordingly).


BM25 does not outperform a simple TFIDF approach. Try to give some explanation. Could it have to do with the length of the documents, which might be rather similar?


- We have included some thoughts in the conclusions. It is very hard to tell why some models are slightly better or worse than others, given that the text-processing and weighting pipeline contains many steps (see, for example, https://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002/asi.23910).
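A minimal sketch may help illustrate the length hypothesis raised above (the collection statistics N, df, avgdl and the BM25 parameters below are placeholder values, not the CLEF eHealth collection's actual settings, and the TF-IDF variant uses raw TF without log): when every document has roughly the average length, BM25's length-normalization factor is about 1, so its weight becomes a saturated TF times IDF and its rankings can end up very close to those of a plain TF-IDF run.

    import math

    # Illustrative collection statistics (hypothetical, not taken from the paper).
    N = 1000           # number of documents in the collection
    df = 50            # documents containing the query term
    avgdl = 120.0      # average document length in tokens
    k1, b = 1.2, 0.75  # common BM25 defaults

    def tfidf_weight(tf):
        # Raw term frequency times IDF (no log on TF).
        return tf * math.log(N / df)

    def bm25_weight(tf, doclen):
        # Okapi BM25 term weight with document-length normalization.
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
        norm = 1.0 - b + b * (doclen / avgdl)   # ~1 when doclen is close to avgdl
        return idf * tf * (k1 + 1.0) / (tf + k1 * norm)

    # With doclen == avgdl the two weights differ only by BM25's TF saturation.
    for tf in (1, 3, 10):
        print(tf, round(tfidf_weight(tf), 3), round(bm25_weight(tf, avgdl), 3))

With more varied document lengths the normalization factor spreads out and the two models diverge more clearly.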


The abstract needs to include a brief list of the approaches tested to give the reader a better idea. In particular, given the recent interest in word embeddings, it is good to mention that they were also considered.
The fact that a CLEF collection was used is also important to mention, to emphasize the strong empirical focus of the paper and the use of a re-usable collection.


- We have fixed this issue by extending the abstract.


The performance of the best systems at the CLEF lab should be mentioned to put the results into better context for the reader.
Word embeddings are currently discussed as a very promising direction for IR. The paper presents the experiments with them towards the end; it is unclear why this is the case.

- We have added some comments and insights in the conclusion section.


"retrieve millions of relevant results": since a user cannot even consider a million documents, they cannot all be relevant. In fact, a hit list of a million documents is useless.

- We have fixed this issue.


Figure 4: it is widely agreed in the community that P@10 is an unreliable and unstable metric, and the conclusions we can draw from it are very limited. This needs to be clarified for the reader. Also, results based on P@10 are not a good start for the results section.

- Despite being unstable, P@10 is widely used in the CLEF evaluation forum (as well as in other evaluation initiatives such as TREC and NTCIR) to compare high-precision models. We have justified the use of P@10 at the beginning of the Results section.
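For reference, P@10 only looks at the first ten results per topic. The short sketch below (with made-up document identifiers and a hypothetical precision_at_k helper) shows the computation and why the measure is coarse: each swap between a relevant and a non-relevant document in the top ten moves the score by exactly 0.1.

    def precision_at_k(ranked_docs, relevant_docs, k=10):
        # Fraction of the top-k retrieved documents that are judged relevant.
        top_k = ranked_docs[:k]
        return sum(1 for d in top_k if d in relevant_docs) / k

    # Toy run and relevance judgements (illustrative IDs only).
    run = ["d7", "d2", "d9", "d4", "d1", "d5", "d8", "d3", "d6", "d0", "d11"]
    qrels = {"d2", "d4", "d5", "d9"}
    print(precision_at_k(run, qrels))  # 0.4: four of the first ten documents are relevant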


The boxplots seem to show the distribution of the performance for the topics. I could not find that information in the paper.

- We have added this information at the beginning of the Results section.


Font size in the figures is too small.

- We have fixed this issue.


faster access to the collection -> faster access to collections

- Fixed.


of the computers -> of computers

- Fixed.

Reviewer 2 Report

The article presents the results of a study comparing different approaches to the information retrieval of records in a medical database. The study is well designed and conducted. Results are interesting and merit publication.

My main concern regards the presentation of the text. The manuscript is very long and could be shortened without losing any relevant information. I suggest the authors use more direct language. A few examples, just from the abstract:

“In this work we wanted to compare and analyze a variety…” (l. 1) > “We compare a variety…”

“We found that query expansion and relevance improve…” (l. 6) > “Query expansion and relevance improve…”

“We also conducted statistical analysis of the runs and found that by applying QE+RF…” (l.8) > “By applying QE+RF…”

The previous three examples are just from the abstract, but the whole manuscript is in need of similar improvements. Additionally, there is no need to repeat the same information several times throughout the article. For instance, RQ1 is literally repeated three times in lines 41, 607 and 813 (and the same happens with RQ2 and RQ3).

The background section is very extensive (pages 2 to 14). It resembles the literature review of a master's dissertation or a similar academic essay. It may be interesting for readers without a background in information retrieval, but again the writing needs to be improved using more direct language.

The methods are sound and the results are well presented.

Author Response

We would like to thank the reviewer for the helpful comments.
We prepared a list of items concerning the issues raised in the review.

Best

Giorgio and Teofan


==============================

Reviewer 2

My main concern regards the presentation of the text. The manuscript is very long and could be shortened without losing any relevant information. I suggest the authors use more direct language. A few examples, just from the abstract; [...].

- We have reviewed the whole paper, from abstract to conclusion, and shortened/simplified it when possible.

The background section is very extensive (page 2 to 14).

- We had the same doubts, but we decided to keep it given the suggestions of Reviewer 1, who appreciated the effort we put into the background in order to make the approaches clear for non-experts in IR. We also reorganized some subsections to make the presentation of the topics easier to follow.