


Open Access Technical Note

Peer-Review Record

A Fast Method for the Selection of Samples in Populations with Available Genealogical Data

Diversity 2022, 14(2), 150; https://doi.org/10.3390/d14020150

by Dalibor Hršak¹, Ivan Katanić² and Strahil Ristov¹,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous


Submission received: 27 January 2022 / Revised: 16 February 2022 / Accepted: 17 February 2022 / Published: 20 February 2022

(This article belongs to the Section Animal Diversity)

Round 1

Reviewer 1 Report

This is a concise and pithy study presenting two faster algorithms (NK², more exhaustive, and NK, greedy) for the sample selection problem (until now, N³ complexity). The usable codebase accompanies the study.

There are two issues that need to be addressed by the authors:

Just how "optimal" is the NK² algorithm? How does it compare, performance-wise, with the established N³ algorithm? In general, there is no comparative (inter-algorithm) benchmarking in the manuscript.

"The two subtree problems can be solved independently as the choice of individuals from one subtree does not affect the values that will be assigned to edge in the other subtree" --- this is an intuitively appealing assumption, but is it really true? Could the authors explain? Also, the pedigree is considered as a forest of separate trees. Are they independent?

Provided these two issues are addressed (especially the comparative benchmarking), this promises to be a very useful study.

Author Response

Point 1: Just how "optimal" is the NK² algorithm? How does it compare, performance-wise, with the established N³ algorithm? In general, there is no comparative (inter-algorithm) benchmarking in the manuscript.

Response 1: The new heuristic (greedy) algorithm works somewhat differently from the previous one implemented in the Magellan software. Interestingly, while much faster, it also produces slightly better results, i.e., larger scores. However, the differences in scores are very small, both relative to the previous heuristic and relative to the optimal algorithm. We have added Table 2, which includes a few results illustrating this, together with some additional text to that effect.
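The response does not spell out the new greedy rule, so the following is only a minimal Python sketch of one common greedy strategy, not the authors' algorithm: repeatedly add the individual whose summed distance to the individuals already chosen is largest. It assumes a single tree given as a hypothetical `parent` list (index = individual, value = parent index, or None for the root), with distance measured as the number of edges between two individuals. Keeping a running gain per individual and refreshing it with one breadth-first search per added individual gives O(NK) work for N individuals and K selections.

```python
from collections import deque


def bfs_distances(adj, source):
    """Number of edges from `source` to every node of a connected tree."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def greedy_select(parent, k):
    """Greedily pick k individuals that are far apart (heuristic, not optimal).

    `parent` is a hypothetical input format: parent[v] is v's parent index,
    or None for the single root of the tree.
    """
    n = len(parent)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of individuals")
    # Undirected adjacency lists built from the single-parent links.
    adj = [[] for _ in range(n)]
    for child, p in enumerate(parent):
        if p is not None:
            adj[child].append(p)
            adj[p].append(child)
    # Seed with the node farthest from node 0: an endpoint of a longest path.
    d0 = bfs_distances(adj, 0)
    first = max(range(n), key=d0.__getitem__)
    selected = [first]
    chosen = [False] * n
    chosen[first] = True
    # gain[v] = sum of distances from v to every individual selected so far.
    gain = bfs_distances(adj, first)
    while len(selected) < k:
        # Adding `best` increases the total pairwise distance by gain[best].
        best = max((v for v in range(n) if not chosen[v]), key=gain.__getitem__)
        selected.append(best)
        chosen[best] = True
        # One O(N) BFS per step keeps every gain up to date: O(NK) overall.
        for v, d in enumerate(bfs_distances(adj, best)):
            gain[v] += d
    return selected
```

For example, on the five-individual tree `parent = [None, 0, 1, 2, 1]`, `greedy_select(parent, 3)` returns `[3, 0, 4]`, the three leaves. Ties are broken by the lowest index, since Python's `max` returns the first maximal item.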

Point 2: "The two subtree problems can be solved independently as the choice of individuals from one subtree does not affect the values that will be assigned to edge in the other subtree" --- this is an intuitively appealing assumption, but is it really true? Could the authors explain?

Response 2: This follows from the way we express the total distance to be optimized: as a sum of contributions over all the edges. The contribution of an edge is the number of pairs of individuals from the subset whose path traverses the edge. It is equal to the product of the counts of chosen individuals on the two sides of the edge, i.e., K_e × (K − K_e), where K_e is the number of chosen individuals on one side of the edge and K is the total number of chosen individuals. For each edge of a subtree, all the individuals outside the subtree are on the same side of the edge. Hence, we are able to compute the contribution of the subtree edges without knowing the exact choice of individuals outside the subtree, only their count.
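The decomposition described in this response can be written out as a worked equation. K and K_e are as defined in the response; S (the chosen subset), d(a, b) (the number of edges on the path between a and b) and T_w (the subtree below a node w) are our own illustrative notation, not taken from the paper.

```latex
\begin{aligned}
D(S) &= \sum_{\{a,b\} \subseteq S} d(a,b)
      = \sum_{e} \#\bigl\{\{a,b\} \subseteq S : e \text{ lies on the path from } a \text{ to } b\bigr\} \\
     &= \sum_{e} K_e\,(K - K_e), \qquad K = |S|.
\end{aligned}
```

For an edge e joining a node u of a subtree to its child w, the side of e containing w lies entirely inside that subtree, so K_e = |S ∩ T_w| is fixed by the choice made within the subtree; from outside, only the total K is needed.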

We agree that this should have been explained better in the paper, and we have included some additional sentences that we hope will clarify the concept.
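Because the contribution of the edges inside a subtree depends only on how many individuals are chosen there, an optimum can be assembled bottom-up. The following is a minimal Python sketch of such a tree dynamic program under our own assumptions (a single rooted tree given as a hypothetical `parent` list, every individual eligible for sampling, 1 ≤ K ≤ N). It is not taken from the authors' code, and it returns only the best score, not the chosen individuals. With each table capped at K + 1 entries, the merge loops cost at most O(NK²) in total.

```python
def optimal_score(parent, k):
    """Maximum of sum over edges e of K_e * (K - K_e) over all k-subsets of one tree.

    `parent` is a hypothetical input format: parent[v] is v's parent index,
    or None for the single root.
    """
    n = len(parent)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of individuals")
    roots = [v for v, p in enumerate(parent) if p is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root (a single tree)")
    root = roots[0]
    children = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p is not None:
            children[p].append(v)
    # Breadth-first order; reversed, it visits every child before its parent.
    order = [root]
    i = 0
    while i < len(order):
        order.extend(children[order[i]])
        i += 1
    table = [None] * n
    for v in reversed(order):
        # f[j] = best total contribution of edges inside the subtree of v
        # when exactly j individuals are chosen in that subtree.
        f = [0, 0]  # v alone: no internal edges, whether or not v is chosen
        for c in children[v]:
            g = table[c]
            merged = [float("-inf")] * min(len(f) + len(g) - 1, k + 1)
            for a, fa in enumerate(f):
                for b, gb in enumerate(g):
                    if a + b > k:
                        break
                    # Edge v-c has b chosen individuals below it and k - b elsewhere.
                    merged[a + b] = max(merged[a + b], fa + gb + b * (k - b))
            f = merged
            table[c] = None  # the child's table is no longer needed
        table[v] = f
    return table[root][k]
```

On the tree `[None, 0, 1, 2, 1]`, choosing K = 3 gives a best score of 8 (the three leaves), which matches a brute-force sum of pairwise path lengths.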

Point 3: Also, the pedigree is considered as a forest of separate trees. Are they independent?

Response 3: A pedigree is indeed a complex structure in which connections may exist among any subset of individuals. However, when we deal only with a single-gender lineage, where each individual has only one parent in that lineage, the pedigree reduces to a forest of independent trees. The trees are independent because the only connections between them run through the lineages of the other gender, which we do not take into account when dealing with mitochondria or the Y chromosome.
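The reduction described in this response can be illustrated with a short Python sketch under our own assumptions: each individual's parent in the chosen lineage (the mother for mitochondrial DNA, the father for the Y chromosome) is recorded in a hypothetical `parent_of` dictionary, with None, or no entry at all, marking a founder. Because every individual has at most one parent in that lineage, following parent links always ends at exactly one founder, so grouping individuals by founder yields the separate trees. The cycle check only guards against inconsistent records.

```python
def lineage_forest(parent_of):
    """Group individuals into trees, one per founder, from one-parent links.

    parent_of maps an individual ID to its parent's ID in the lineage, or to None
    for a founder; parents that never appear as keys are also treated as founders.
    Returns a dict mapping each founder to all members of its tree (itself included).
    """
    root_of = {}
    for start in parent_of:
        path, seen = [], set()
        v = start
        # Walk up the single-parent chain until a known or founding individual.
        while v not in root_of:
            if v in seen:
                raise ValueError(f"cycle in lineage records at {v!r}")
            seen.add(v)
            path.append(v)
            p = parent_of.get(v)
            if p is None:
                root_of[v] = v  # founder: root of its own tree
                break
            v = p
        founder = root_of[v]
        for u in path:
            root_of[u] = founder
    trees = {}
    for v, founder in root_of.items():
        trees.setdefault(founder, []).append(v)
    return trees
```

For example, `lineage_forest({"A": None, "B": "A", "C": "A", "D": "B", "E": None, "F": "E"})` yields two trees, `{"A": ["A", "B", "C", "D"], "E": ["E", "F"]}`, which can then be processed one at a time.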

Reviewer 2 Report

The authors present an interesting and potentially useful method for selection of samples in populations with genealogical data. The described method can be very useful if it works properly. Thus, this paper is worth publishing. However, some revisions are required.

Major points:

1. Is Magellan really the only other software available for such analyses? If not, please compare the new method to other tools.

2. Please provide links where the scripts you described can be seen and evaluated.

Minor points:

3. Line 94: What does the square mean (at the end of the sentence)?

4. Table 1. Please use "min" as the abbreviation for "minutes". In the SI system, which should be used, "m" stands for "meters", not "minutes".

Author Response

Point 1: Is Magellan really the only other software available for such analyses? If not, please compare the new method to other tools.

Response 1: To the best of our knowledge, no other software exists for this purpose. Magellan was confirmed as the first such tool by the reviewers at the time of its publication, and we are not aware of anything else having been published since.

Point 2: Please provide links where the scripts you described can be seen and evaluated.

Response 2: The link to the GitHub page has been provided in the abstract and again in the introduction (line 43). We presumed that this was an appropriate position in the paper. However, there may be another convention for link placement that we are not aware of. If that is the case, please let us know.

Point 3: Line 94: What does the square mean (at the end of the sentence)?

Response 3: We apologize for the informal use of that symbol. The intention was to separate descriptions of algorithms from the rest of the text. In the revised paper we have used vertical space instead.

Point 4: Table 1. Please use "min" as the abbreviation for "minutes". In the SI system, which should be used, "m" stands for "meters", not "minutes".

Response 4: We have corrected the text accordingly, thank you for the comment.

Round 2

Reviewer 1 Report

The revised manuscript is ready for publication. The two criticisms that I had have been addressed satisfactorily.

Reviewer 2 Report

The manuscript is acceptable now. Just a minor suggestion, which can be included at the page proof stage: it would be useful for readers if a link to the GitHub page were provided in the Methods section. It can be moved from the Introduction, where only previously known information should be mentioned, so that a reader does not expect a description of newly developed tools there. Keep the link in the Abstract, since the Abstract is an independent part of the paper, published separately in various databases; thus, it is important to have the link there.
