A Multi-Annotator Survey of Sub-km Craters on Mars
Round 1
Reviewer 1 Report
The article describes a dataset that can be used to analyze small-crater statistics over MC-11 East and to better train and validate crater detection algorithms.
It is well written and the topic is of interest to the journal. The bibliography used is adequate.
However, some changes are necessary before it can be accepted:
a) The structure of the article should be changed. Although the article type is a "dataset description", a discussion and conclusions section should nevertheless be added. In these sections, the data could be compared with other similar datasets in aspects such as the variety of the data, the quality of the data, and so on.
b) The article would also be improved by adding a more detailed description of the dataset, with some statistics, a description of the information fields, and so on.
c) Another improvement would be to show an example of the data's application and use.
Author Response
We would like to thank the reviewer for their careful reading and helpful comments on our manuscript. Below, we address the points raised and discuss what has been done to the paper to resolve them.
a) The structure of the article should be changed. Although the article type is a "dataset description", a discussion and conclusions section should nevertheless be added. In these sections, the data could be compared with other similar datasets in aspects such as the variety of the data, the quality of the data, and so on.
- We have added a discussion section (lines 206-232), which provides context for how the dataset can be used by the community and how it compares to similar datasets. We also discuss general lessons that can be taken from our project.
b) The article would also be improved by adding a more detailed description of the dataset, with some statistics, a description of the information fields, and so on.
- Added Table 3 (positional accuracy of annotations) to give more statistical insight into the annotations.
- Expanded Table 2 with minimum/maximum diameters.
- Added a sentence (lines 109-111) about the other XML fields to better explain the data structure; see the parsing sketch after this list.
- Added more information about the lower threshold used on the diameter (lines 98-102).
- In line with comments from another reviewer, for 4 of the tiles we have also added annotations from one of our 'expert' authors, which have been used for comparison with the labels of our 'non-expert' annotators. Table 4 provides statistical information about this, and it is discussed in lines 186-195.
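- For illustration, below is a minimal Python sketch of reading the per-crater fields from one of the XML annotation files. The tag names ('object', 'x', 'y', 'diameter', 'votes') and the file name are illustrative assumptions, not the dataset's exact schema:

    import xml.etree.ElementTree as ET

    # Parse one tile's annotation file (hypothetical file name).
    tree = ET.parse("tile_annotations.xml")

    # Each <object> element is assumed to describe one clustered crater.
    for obj in tree.getroot().iter("object"):
        x = float(obj.findtext("x"))         # centre x position, pixels
        y = float(obj.findtext("y"))         # centre y position, pixels
        d = float(obj.findtext("diameter"))  # crater diameter, pixels
        votes = int(obj.findtext("votes"))   # annotations for this crater
        print(f"crater at ({x:.1f}, {y:.1f}), d = {d:.1f} px, {votes} annotations")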
c) Another improvement would be to show an example of the data's application and use.
- We absolutely agree that an application of the dataset is important; however, we would argue that this is beyond the scope of the data descriptor under review. We are currently working on a Deep Learning application of this dataset as a separate paper, and do not wish to publish premature results. We hope the reviewer understands our reasoning.
Reviewer 2 Report
The article and dataset under review are well done and, as far as I can judge as a biological oceanographer, an important contribution to research. However, there are a few things to improve to make this article more useful. I therefore recommend a major revision.
a) The data should be submitted to a recognised international archive. A personal GitHub repository is not useful for long-term storage. In my research field, https://www.seanoe.org/ would be a suitable example that accepts data and images. I found that GitHub repositories can be uploaded to Zenodo, a recognised archive. See https://guides.github.com/activities/citable-code/
The DOI of the project should then be cited in the article.
b) I think the authors should also provide a ground-truth dataset for some (at least 4) of the image tiles analysed and compare the ground truth to the results obtained with the students. It would be nice if the same method of multi-user annotation could be applied as well, but with experts as the annotators. The last sentences of the article should then be rewritten to provide information about the accuracy of the ground-truth data and the entire dataset.
c) The authors should provide further analysis of the accuracy of the annotations. In Fig. 4 (or another figure), the standard deviation, standard error, or another measure of the accuracy of the clustered annotations should be shown. Furthermore, a clear threshold should be provided at which the resolution of the images is too low to distinguish craters, or at which the craters are too large for the chosen tile size. At least the lower threshold should coincide with a decrease in the accuracy of the clustered annotations. Annotations of craters below that size should then be removed, clearly marked as ambiguous, or marked as craters below the detection threshold of the method.
Minor comments:
The header of Table 1 is not well defined. As far as I can see, the numbers 1, 2, 3, 4, 5 and 6 indicate the different annotators, and the table lists the number of annotations each annotator has made. However, one could also misinterpret this as a frequency distribution, where column one shows the number of craters with 1 annotation, column two the number of craters with two annotations, and so on. Please improve the header, e.g. by writing T1, T2, T3, T4, T5 and T6, and explain that T1 stands for TWINKLE student 1, etc.
In Table 2, should it not read "Average annotations per crater"?
In line 101, and I think also elsewhere, the authors use the term 'cluster' when they should rather use the term 'crater'. The sentence 'in each object instance denoting the number of annotations included in that cluster' would read better as 'in each object instance denoting the number of annotations for the respective crater'.
Author Response
We would like to thank the reviewer for their careful reading and helpful comments on our manuscript. Below, we address the points raised and discuss what has been done to the paper to resolve them.
a) The data should be submitted to a recognised international archive. A personal GitHub repository is not useful for long-term storage. In my research field, https://www.seanoe.org/ would be a suitable example that accepts data and images. I found that GitHub repositories can be uploaded to Zenodo, a recognised archive. See https://guides.github.com/activities/citable-code/
- We have released a version of our GitHub repository and linked it to Zenodo with a DOI (https://doi.org/10.5281/zenodo.3946647), and have replaced the link in the Usage Notes section. We initially thought this would be done after acceptance by a journal, but we thank the reviewer for pointing this out.
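- For readers who want to retrieve the archived release programmatically, a minimal Python sketch using Zenodo's public REST API follows. Only the record ID (3946647) comes from the DOI above; the API usage is generic and the file names are taken from the record itself:

    import requests

    # Fetch the record's metadata, then download each file it lists.
    record = requests.get("https://zenodo.org/api/records/3946647").json()
    for f in record["files"]:
        name = f["key"]           # file name within the record
        url = f["links"]["self"]  # direct download link
        print(f"downloading {name}")
        with open(name, "wb") as out:
            out.write(requests.get(url).content)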
b) I think the authors should also provide a ground-truth dataset for some (at least 4) of the image tiles analysed and compare the ground truth to the results obtained with the students. It would be nice if the same method of multi-user annotation could be applied as well, but with experts as the annotators. The last sentences of the article should then be rewritten to provide information about the accuracy of the ground-truth data and the entire dataset.
- Unfortunately, a ground truth is impossible in this setting, as higher-resolution imagery is not available. However, we have now obtained 'expert' annotations of 4 randomly selected tiles from one of our experienced authors. This is outlined in Table 4 and lines 186-195. The expert's annotations showed high agreement with the non-experts', which is promising. We think this addition to the paper is valuable in validating the data and the approach in general, so we appreciate the reviewer's suggestion.
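- To make the comparison concrete, one simple way to score agreement between an expert's crater list and the clustered non-expert labels is greedy one-to-one matching with position and diameter tolerances. The Python sketch below is illustrative only; the tolerances and the matching rule are assumptions, not the exact procedure reported in the paper:

    from math import hypot

    def is_match(a, b, pos_tol=0.5, diam_tol=0.25):
        """Match two (x, y, diameter) circles if their centres and
        diameters agree to within tolerances scaled by the mean diameter."""
        (xa, ya, da), (xb, yb, db) = a, b
        mean_d = (da + db) / 2.0
        return (hypot(xa - xb, ya - yb) <= pos_tol * mean_d
                and abs(da - db) <= diam_tol * mean_d)

    def recovery_rate(expert, crowd):
        """Fraction of expert craters recovered by the clustered labels,
        using greedy one-to-one matching."""
        unmatched = list(crowd)
        hits = 0
        for e in expert:
            for c in unmatched:
                if is_match(e, c):
                    unmatched.remove(c)
                    hits += 1
                    break
        return hits / len(expert) if expert else 1.0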
c) The authors should provide further analysis of the accuracy of the annotations. In Fig. 4 (or another figure), the standard deviation, standard error, or another measure of the accuracy of the clustered annotations should be shown. Furthermore, a clear threshold should be provided at which the resolution of the images is too low to distinguish craters, or at which the craters are too large for the chosen tile size. At least the lower threshold should coincide with a decrease in the accuracy of the clustered annotations. Annotations of craters below that size should then be removed, clearly marked as ambiguous, or marked as craters below the detection threshold of the method.
- We have calculated the standard deviation of diameter and position within the clustered labels. This is described in Table 3 and lines 182-185.
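- As a sketch of the computation: for each clustered crater, we take the standard deviation of the member annotations' positions and diameters. The (x, y, diameter) cluster representation below is an assumed intermediate form, not our script verbatim:

    import statistics as st

    def cluster_spread(members):
        """members: list of (x, y, diameter) annotations in one cluster.
        Returns the positional and diameter scatter within the cluster."""
        xs, ys, ds = zip(*members)
        return {
            "sd_x": st.pstdev(xs),         # positional scatter, pixels
            "sd_y": st.pstdev(ys),
            "sd_diameter": st.pstdev(ds),  # diameter scatter, pixels
        }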
- We had already applied a diameter threshold of 3 pixels during the clustering process, but the reviewer correctly noted that this was not explained in the manuscript. The original annotation files contain all markings, including those below the threshold, but our clustering script rejects any below 3 pixels before aggregating the rest; this explains the small discrepancy in total annotations between Tables 1 and 2. We have added an explanation in lines 98-102, and we have added minimum/maximum diameters to Table 2.
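- In code, this filtering step amounts to the following minimal sketch (the clustering call is a placeholder, not our actual script):

    MIN_DIAMETER_PX = 3.0  # lower diameter threshold applied before clustering

    def filter_and_cluster(annotations, cluster_fn):
        """annotations: list of (x, y, diameter) marks from all annotators.
        Marks below the threshold are dropped before aggregation, which is
        why Tables 1 and 2 report slightly different totals."""
        kept = [a for a in annotations if a[2] >= MIN_DIAMETER_PX]
        return cluster_fn(kept)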
Minor comment 1: The header of Table 1 is not well defined. As far as I can see, the numbers 1, 2, 3, 4, 5 and 6 indicate the different annotators, and the table lists the number of annotations each annotator has made. However, one could also misinterpret this as a frequency distribution, where column one shows the number of craters with 1 annotation, column two the number of craters with two annotations, and so on. Please improve the header, e.g. by writing T1, T2, T3, T4, T5 and T6, and explain that T1 stands for TWINKLE student 1, etc.
- We have changed the headers to Roman numerals and altered the caption to further clarify.
Minor comment 2: In Table 2, should it not read "Average annotations per crater"?
- Changed the column header accordingly.
Minor comment 3: In line 101, and I think also elsewhere, the authors use the term 'cluster' when they should rather use the term 'crater'. The sentence 'in each object instance denoting the number of annotations included in that cluster' would read better as 'in each object instance denoting the number of annotations for the respective crater'.
- Altered the wording to 'crater' (now line 115).
Round 2
Reviewer 1 Report
Dear Authors:
The changes made are sufficient. The paper can be published.
Best regards
Reviewer 2 Report
None