
Printed Edition

A printed edition of this Special Issue is available at MDPI Books....


Open Access Article

Peer-Review Record

Deep Instance Segmentation of Laboratory Animals in Thermal Images

Appl. Sci. 2020, 10(17), 5979; https://doi.org/10.3390/app10175979

by Magdalena Mazur-Milecka, Tomasz Kocejko and Jacek Ruminski

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous


Submission received: 9 July 2020 / Revised: 6 August 2020 / Accepted: 25 August 2020 / Published: 28 August 2020

(This article belongs to the Special Issue Machine Learning for Biomedical Application)

Round 1

Reviewer 1 Report

The manuscript presents an approach of automatic segmentation of thermal images of rats for determination of different subjects. The motivation is observation of behavioral patterns of animals non-interrupted by human observers or bright light. Such information could provide better insight into the animal behavior.

Two approaches to image segmentation are considered: Mask R-CNN and TensorMask. The latter performs better when pre-training is used, while the former performs better when the networks are trained from scratch.

The manuscript is well written and the topic is presented in a clear way.

My comments are:

This study is a sequel of a previous study reported by the same authors (ref. [37]). The authors should point this out in the introduction and inform a reader about the past study results and the motivation for the new study.
In Subsection 2.1, the authors mention that PANet and HTC can perform better than Mask R-CNN. Also, in the previous publication [37] they mention that they would test these methods for segmentation. Why did they include only Mask R-CNN?
In 3. Methods, I am missing some information: room temperature, cage size, IR camera setup, state of the animals, etc.
Methods should also include a statement about official approval for the animal study issued by the responsible ethics commission.
1. Data preprocessing: the steps should be adequately explained, including the motivation for specific steps (e.g., reduction of the bit depth). I would also exclude “ch3” from the manuscript, since it does not add any valuable information.
4. Learning configurations: In the abstract, it was mentioned that the visible images were used to pre-train the networks. I am missing an explanation of the procedure. 3-fold cross-validation was used; why not 10-fold? Different algorithms were studied. The authors should provide more information about the actual software (algorithm versions, programming language, OS).
Results, lines 219-224: three different performance metrics were calculated (mAP, AP50, AP75). Although they are well known in the field, a brief explanation should be provided. A detailed explanation of the gold standard used to calculate the metrics should also be provided (i.e., the manual determination of the boundaries).
9, Fig. 2: Can you provide insets of the critical regions? It is difficult to observe the regions where the two animals are in close vicinity.
In 5. Discussion, the observed differences should be explained based on the differences between the algorithms. A comparison between this study and the previous study should also be included. The motivation, e.g., animal behavior observation, should be discussed based on the results. Can this approach be used for other applications in addition to rat observation?
I’d love to have some information about the time required for training and segmentation. Could it be done in real time?
How would the algorithms perform if more than two animals were present?
Some information is missing in the references, e.g., [36], [37].

Author Response

Dear Reviewer

Thank you for taking the time to read our paper. We are grateful for your constructive comments, all of which led to valuable improvements to our paper. We have carefully reviewed the comments and revised the manuscript accordingly. Our point-by-point responses are given below.

We hope the revised version is now suitable for publication.

Reviewer 1.

The manuscript is well written and the topic is presented in a clear way.

My comments are:

This study is a sequel of a previous study reported by the same authors (ref. [37]). The authors should point this out in the introduction and inform a reader about the past study results and the motivation for the new study.

We introduced the appropriate changes. Thank you.

Changes: Lines 65-70

In Subsection 2.1, the authors mention that PANet and HTC can perform better than Mask R-CNN. Also, in the previous publication [37] they mention that they would test these methods for segmentation. Why did they include only Mask R-CNN?

Response: In this paper we decided to compare two different approaches: single-stage and two-stage methods. That is why we performed experiments with two models: Mask R-CNN and TensorMask. The HTC architecture belongs to another group, multi-stage (cascade) object detectors [1]. PANet is strongly based on the Mask R-CNN architecture [2]. Because Mask R-CNN is still a state-of-the-art and very popular architecture, it seemed appropriate to us to investigate the results for this base algorithm first. Also, PANet was shown to outperform Mask R-CNN, but only against a Mask R-CNN re-implemented by the PANet authors (line 94). The Mask R-CNN implementation is still being improved, so new architectures should be compared with the latest version from its authors.

In total, we have tested 24 different model configurations for pre-trained learning and 22 for training.

We still plan to evaluate other methods in the future as we continue our research. We are also considering a study evaluating the possibility of detecting (via instance segmentation) smaller regions within a rat, such as a saliva trace resulting from a bite.

References:

1. Chen, K.; Ouyang, W.; Loy, C.C.; Lin, D.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Shi, J. Hybrid Task Cascade for Instance Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019; pp. 4969-4978. https://doi.org/10.1109/CVPR.2019.00511.

2. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018; pp. 8759-8768.

In 3. Methods, I am missing some information: room temperature, cage size, IR camera setup, state of the animals, etc.

We introduced the appropriate changes. Thank you.

Changes: Lines 133-139

Methods should also include a statement about official approval for the animal study issued by the responsible ethics commission.

We introduced the appropriate changes. Thank you.

Changes: Lines 141-144

1. Data preprocessing: the steps should be adequately explained, including the motivation for specific steps (e.g., reduction of the bit depth).

We introduced the appropriate changes. Thank you.

Changes: Lines 146-150 and 154-161

I would also exclude “ch3” from the manuscript, since it does not add any valuable information.

Response: The results showed that, for the ch3 images, training from scratch gave much worse results than training models pre-trained on different (visible) images. It can therefore be assumed that, for a similar database in which the image data are incomplete (blurred, obscured, overexposed, etc.), further training of existing pre-trained models will improve the prediction results.

Another conclusion is that the ch3 images showed a much larger difference between segmentation and bbox detection than the other images (Tables 2 and 4). The bbox detection results were lower than for the remaining images, but the drop was much smaller than for the segmentation results.

We believe these results may be useful even though they are not the best.

4. Learning configurations: In the abstract, it was mentioned that the visible images were used to pre-train the networks. I am missing an explanation of the procedure.

We introduced the appropriate changes. Thank you.

Changes: Lines 190-194

Response: This is also described in lines 173-175.

3-fold cross-validation was used; why not 10-fold?

Response: As the number of folds increases, the size of each subset decreases. For a database of 200 training images, 10-fold cross-validation means that the validation/test subsets may not be large enough to be statistically representative. We understand that more folds give a better evaluation with respect to the randomness of the data. However, in our experiments we repeated the cross-validation procedure with different samples in the train/validation/test datasets, and the obtained results were very similar.
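To make the fold-size argument concrete, here is a minimal Python sketch, assuming the 200 images are split as evenly as possible into k held-out folds. The record does not say how the validation and test subsets were carved out of each fold, so the sketch only shows the held-out fold size; `fold_sizes` and `repeated_kfold` are hypothetical helper names, and the reshuffle-per-repeat loop is one plausible reading of "repeated the cross-validation procedure".

```python
import random

def fold_sizes(n_samples, k):
    """Sizes of the k held-out folds when n_samples items are split as evenly
    as possible (the first n_samples % k folds receive one extra item)."""
    base, extra = divmod(n_samples, k)
    return [base + 1 if i < extra else base for i in range(k)]

def repeated_kfold(indices, k, repeats, seed=0):
    """Yield (repeat, fold, train, held_out) index lists for repeated k-fold
    cross-validation; the data are reshuffled before every repeat."""
    rng = random.Random(seed)
    for rep in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        start = 0
        for fold, size in enumerate(fold_sizes(len(shuffled), k)):
            held_out = shuffled[start:start + size]
            train = shuffled[:start] + shuffled[start + size:]
            start += size
            yield rep, fold, train, held_out

# With 200 images: 3 folds hold out about 67 images each, 10 folds only 20 each.
print(fold_sizes(200, 3))   # [67, 67, 66]
print(fold_sizes(200, 10))  # [20, 20, 20, 20, 20, 20, 20, 20, 20, 20]
```

Every image is held out exactly once per repeat, so reshuffling between repeats changes which images share a fold without shrinking the folds themselves.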

Different algorithms were studied. The authors should provide more information about the actual software (algorithm versions, programming language, OS).

We introduced the appropriate changes. Thank you.

Changes: Lines 175, 192-193, 200

Results, lines 219-224: three different performance metrics were calculated (mAP, AP50, AP75). Although they are well known in the field, a brief explanation should be provided.

We introduced the appropriate changes. Thank you.

An entirely new section was added: Lines 201-231.
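For readers of this record, the three metrics named in the comment can be summarized as follows. This assumes the COCO conventions that the names AP50, AP75, and mAP usually refer to; the exact definitions in the revised Section 2.4 are not reproduced here. P is a predicted region (box or mask) and G a ground-truth region. A prediction counts as a true positive (TP) when it is matched to a still-unmatched ground-truth instance with IoU at or above the threshold t; unmatched predictions are false positives (FP) and unmatched ground-truth instances are false negatives (FN). The COCO evaluation code approximates the integral by sampling the interpolated precision at 101 evenly spaced recall levels, and with several classes the result is additionally averaged over classes.

```latex
\[
\mathrm{IoU}(P, G) = \frac{|P \cap G|}{|P \cup G|}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}
\]
\[
p_{\mathrm{interp}}(r) = \max_{\tilde{r} \ge r} p(\tilde{r}), \qquad
\mathrm{AP}_{t} = \int_{0}^{1} p_{\mathrm{interp}}(r)\, dr
\]
\[
\mathrm{AP}_{50} = \mathrm{AP}_{0.50}, \qquad
\mathrm{AP}_{75} = \mathrm{AP}_{0.75}, \qquad
\mathrm{mAP} = \frac{1}{10} \sum_{t \in \{0.50,\, 0.55,\, \ldots,\, 0.95\}} \mathrm{AP}_{t}
\]
```

Because mAP averages over stricter thresholds as well, it rewards precise boundaries more than AP50 does, which is why AP75 and mAP are the more sensitive indicators when two animals touch.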

Also, a detailed explanation of the gold standard used to calculate the metrics should be provided (i.e., the manual determination of the boundaries).

We introduced the appropriate changes. Thank you.

Changes: Lines 186-189

9, Fig. 2: Can you provide insets of the critical regions? It is difficult to observe the regions where the two animals are in close vicinity.

We introduced the appropriate changes. Thank you.

Changes: Zoomed images added in Fig. 2.

In 5. Discussion, the observed differences should be explained based on the differences between the algorithms. A comparison between this study and the previous study should also be included. The motivation, e.g., animal behavior observation, should be discussed based on the results. Can this approach be used for other applications in addition to rat observation?

We introduced the appropriate changes. Thank you.

Changes: Lines 251-261, 315-320, 371-376

I’d love to have some information about the time required for training and segmentation. Could it be done in real time?

Response: It cannot be done in real time on the NVIDIA DGX-1 Station. The training time varied considerably depending on the learning parameters and the models used. The pre-trained models did not require as many epochs as the models trained from scratch: training the pre-trained model for 2,000 epochs took about 50 minutes, whereas training a model from scratch for 100,000 epochs took several hours. Inference took about one second per image. We believe that the model could be tuned in the future to work in near real time, but that was not the goal of this paper. However, this is an interesting idea for another study. Thank you.
There were no significant differences in training time between the Mask R-CNN and the TensorMask models.
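A back-of-the-envelope Python sketch of what "real time" would demand, using the roughly one second per image inference time reported above. The camera frame rates are hypothetical placeholders, since the acquisition frame rate is not stated in this record, and the helper names are illustrative only.

```python
def realtime_factor(seconds_per_frame, camera_fps):
    """Ratio of per-frame inference time to the interval between camera frames.
    A value above 1.0 means sequential per-frame inference falls behind the camera."""
    return seconds_per_frame * camera_fps

def max_processed_fps(seconds_per_frame):
    """Highest frame rate that sequential per-frame inference can sustain."""
    return 1.0 / seconds_per_frame

INFERENCE_S = 1.0  # approximate per-image inference time reported in the response

# Hypothetical camera frame rates (not taken from the paper).
for camera_fps in (1, 9, 30):
    print(camera_fps, "fps camera -> ratio", realtime_factor(INFERENCE_S, camera_fps))
print("sustainable rate:", max_processed_fps(INFERENCE_S), "fps")
```

Under these assumptions, keeping up with anything faster than about one frame per second would require either faster inference or analyzing only a subset of frames.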

How would the algorithms perform if more than two animals would be present?

Response: We suspect that increasing the number of objects would not affect the way the algorithms work. It could only affect the learning and inference time. However, this is an interesting idea for another study. Thank you.

There are some info missing in the references, e.g. [36], [37].

Response: Thank you so much for catching these confusing errors, which we have now corrected.

Reviewer 2 Report

This work looks very interesting. A detailed review of modern image segmentation techniques is provided. Segmentation methods have been successfully adapted to studies of laboratory animal behavior using thermal imaging. I recommend the paper for publishing in this journal.

I would recommend reviewing the introduction and moving the main contributions of this paper to Conclusion.

Author Response

Dear Reviewer

We hope the revised version is now suitable for publication.

Reviewer 2.

I would recommend reviewing the introduction and moving the main contributions of this paper to Conclusion.

We introduced the appropriate changes. Thank you.

Changes: Lines 124-125, 357-370

Reviewer 3 Report

The manuscript is not suitable in its current state for publication in a scientific journal due to various issues. Firstly, the structure is not adequate for a scientific work. It is halfway between a narrative review, a technical report, and an experimental work, which makes it unclear what it is about. As a result, some sections are very long, references are insufficient, and no statistical analysis has been carried out.

Introduction: Many sentences lack references. I would suggest that the authors add supporting references to all sentences (except those expressing the authors' own ideas, of which there are few).

Introduction, first paragraph: In my opinion, this paragraph addresses more than one idea. I recommend dividing it into several paragraphs, each devoted to a different idea, which would improve readability. In my opinion, the paragraph should be split in two at line 32.

Introduction, line 40: “[6], [7]” should be “[6,7]”.

Introduction, line 41, “body temperature”: Body temperature could be understood as core temperature. Therefore, it is recommended to use a more accurate term, such as surface or skin temperature. Please check the whole manuscript accordingly.

Lines 65-152: In my opinion, it is not clear whether the work is intended to be a narrative review or an experimental work. I think that it is closer to an experimental work, and for this reason, I think that the paper would be more attractive if it followed a conventional structure:

From line 85 to 152, synthesize all this information and include it in the main body of the introduction. If your work is experimental, it is not advisable to distract the reader so much. Try to keep the introduction short, so that the reader still has the energy to read the most important parts of the work: the methodology, results, and discussion sections.
Lines 65-84: I suggest removing all this information and including the objectives of the study at the end of the introduction.

Results: Statistical analyses should be performed for comparisons between methods. In my opinion, carrying out the appropriate statistical analyses would strengthen the ideas and conclusions of the work. Therefore, a statistical analysis section should be included at the end of the methodology section.

Discussion: Please start the section with a paragraph about the aim of your study and your main results.

Discussion: The discussion does not have references. Some sentences should be referenced to support them. In addition, the results obtained in the present work should be compared with previous studies that have carried out similar analyses.

Author Response

Dear Reviewer

We hope the revised version is now suitable for publication.

Reviewer 3.

We introduced the appropriate changes. Thank you.

Changes: Lines 17, 24, 35

We introduced the appropriate changes. Thank you.

Changes: Line 34

Introduction, line 40: “[6], [7]” should be “[6,7]”.

Response: Thank you so much for catching this confusing error, which we have now corrected.

We introduced the appropriate changes. Thank you.

Changes: “Body temperature” was changed to “surface temperature” throughout the document.

From line 85 to 152, synthesize all this information and include it in the main body of the introduction. If your work is experimental, it is not advisable to distract the reader so much. Try to keep the introduction short, so that the reader still has the energy to read the most important parts of the work: the methodology, results, and discussion sections.

We introduced the appropriate changes. Thank you.

Changes: Lines 34-126 changed as suggested.

Lines 65-84: I suggest removing all this information and including the objectives of the study at the end of the introduction.

We introduced the appropriate changes. Thank you.

Changes: Lines 357-370

We did not use statistical hypothesis testing; instead, we used the de facto standard quantitative evaluation metrics, which can be used directly for comparison with other and future studies.

To make this clearer, we introduced a new section (2.4) with a precise description of the metrics used in this study.

Changes: In lines 201-231, the metrics used are explained in more detail.

Comment: The metrics used in the paper (mAP, AP50, AP75 and, indirectly, IoU) are in fact a form of statistical analysis (based on precision-recall metrics).

mIoU gives the average intersection over union across all segmentation classes in a semantic segmentation problem, giving all classes equal weight.

mAP gives the average precision of predicted object locations across all object predictions matched to ground-truth objects, giving each object equal importance, with two caveats. First, it is normally computed separately per class, so one obtains an AP value for each class, and these are then averaged. Second, mAP is commonly computed on bounding boxes, with a set overlap threshold for a prediction to count as a match, although it can also be computed with pixel-wise IoU for a finer analysis.

In instance segmentation (pixel-based), we evaluate every pixel that should belong to each object (many pixels, many objects). That is why we use the state-of-the-art quantitative analysis based on mean average precision (mAP) or metrics based on intersection over union (IoU). As the first reviewer mentioned, these metrics are “well known in the field”; they are the current state of the art and are commonly used for quantitative analysis and comparison of results. They are currently (2019-2020) used in most high-quality papers focused on applications of instance segmentation in different domains (medical, biological, materials science, etc.), including papers from prestigious computer vision and image processing conferences such as CVPR, ICCV, ECCV, and NIPS/NeurIPS. Some examples:
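A minimal NumPy sketch of the two quantities just described, assuming instance masks are boolean arrays of equal shape and semantic label maps are integer arrays. The helper names are hypothetical, and this per-image `mean_iou` is a simplification: benchmark implementations usually accumulate intersections and unions over the whole dataset before dividing, and conventions for classes absent from both maps vary (here they are skipped).

```python
import numpy as np

def mask_iou(pred, gt):
    """Pixel-wise IoU between two boolean masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0  # convention chosen here: two empty masks score 0
    return float(np.logical_and(pred, gt).sum() / union)

def mean_iou(pred_labels, gt_labels, num_classes):
    """Per-image mIoU for semantic segmentation: IoU of each class label,
    averaged with equal weight over classes present in either label map."""
    pred_labels = np.asarray(pred_labels)
    gt_labels = np.asarray(gt_labels)
    ious = [
        mask_iou(pred_labels == c, gt_labels == c)
        for c in range(num_classes)
        if np.any((pred_labels == c) | (gt_labels == c))
    ]
    return float(np.mean(ious)) if ious else 0.0

pred = [[1, 1], [0, 0]]
gt = [[1, 0], [0, 0]]
print(mask_iou(pred, gt))  # 0.5: 1 overlapping pixel out of 2 in the union
```

The equal weighting in `mean_iou` is what makes a rare class count as much as a dominant one, as the paragraph above notes.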
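The matching procedure described above can be sketched for a single class in Python. This is a minimal illustration, assuming greedy matching in descending score order and all-point interpolation of the precision-recall curve (PASCAL VOC 2010+ style); the COCO evaluation code instead samples precision at 101 recall points and handles crowd regions, so its numbers can differ slightly. It is not the evaluation code used in the paper. `box_iou` works on (x1, y1, x2, y2) boxes; passing a pixel-wise mask IoU as `iou_fn` gives the finer, mask-based variant. All function names are hypothetical.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, iou_thr, iou_fn=box_iou):
    """Single-class AP at one IoU threshold.

    preds: list of (image_id, score, region); gts: dict image_id -> list of regions.
    Predictions are processed in descending score order; each one is matched to the
    unmatched ground truth in the same image with the highest IoU, if that IoU >= iou_thr.
    """
    n_gt = sum(len(regions) for regions in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(regions) for img, regions in gts.items()}
    is_tp = []
    for img, _score, region in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_region in enumerate(gts.get(img, [])):
            if not used[img][j]:
                iou = iou_fn(region, gt_region)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thr:
            used[img][best_j] = True
            is_tp.append(1.0)
        else:
            is_tp.append(0.0)  # false positive: no sufficiently overlapping free ground truth
    cum_tp = np.cumsum(is_tp)
    recall = cum_tp / n_gt
    precision = cum_tp / np.arange(1, len(is_tp) + 1)
    # All-point interpolation: replace each precision by the maximum precision
    # at any equal or higher recall, then sum the area under the step curve.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    steps = np.nonzero(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))

def coco_style_map(preds, gts, iou_fn=box_iou):
    """Single-class mAP: mean AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = np.linspace(0.50, 0.95, 10)
    return float(np.mean([average_precision(preds, gts, t, iou_fn) for t in thresholds]))

gts = {"frame_001": [(0, 0, 10, 10)]}
preds = [("frame_001", 0.9, (0, 0, 10, 5))]  # IoU = 0.5 with the ground truth
print(average_precision(preds, gts, 0.50))  # 1.0
print(average_precision(preds, gts, 0.75))  # 0.0
print(coco_style_map(preds, gts))           # 0.1 (AP is 1 only at the 0.50 threshold)
```

The example shows why AP50 and mAP can diverge sharply for the same detections: a loosely fitting region passes the 0.50 threshold but fails every stricter one.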

https://www.nature.com/articles/s42003-020-0905-5

https://www.nature.com/articles/s41699-020-0137-z

https://openaccess.thecvf.com/content_CVPR_2019/papers/Ahn_Weakly_Supervised_Learning_of_Instance_Segmentation_With_Inter-Pixel_Relations_CVPR_2019_paper.pdf

https://openaccess.thecvf.com/content_CVPR_2019/papers/Qi_Amodal_Instance_Segmentation_With_KINS_Dataset_CVPR_2019_paper.pdf

https://openaccess.thecvf.com/content_CVPR_2019/papers/Zhu_Learning_Instance_Activation_Maps_for_Weakly_Supervised_Instance_Segmentation_CVPR_2019_paper.pdf

https://openaccess.thecvf.com/content_CVPR_2019/papers/Yi_GSPN_Generative_Shape_Proposal_Network_for_3D_Instance_Segmentation_in_CVPR_2019_paper.pdf

https://openaccess.thecvf.com/content_CVPR_2019/papers/Hou_3D-SIS_3D_Semantic_Instance_Segmentation_of_RGB-D_Scans_CVPR_2019_paper.pdf

Modern datasets and instance segmentation benchmarks compute mAP using instance-to-instance (mask) overlap during matching, essentially as you suggest above.

Discussion: Please start the section with a paragraph about the aim of your study and your main results.

We introduced the appropriate changes. Thank you.

Changes: Lines 259-262

Response: The main results are listed in the Conclusions.

We introduced the appropriate changes. Thank you.

Changes: lines 318, 324, 331. The comparison with the previous work is included in the Discussion section (lines 315-320).

Round 2

Reviewer 1 Report

My comments were addressed. I'm looking forward to further reports about thermography imaging from the authors.
