An Information-Theoretic Bound on p-Values for Detecting Communities Shared between Weighted Labeled Graphs
Round 1
Reviewer 1 Report
The main objective of the paper is a method for evaluating the statistical significance of connectedness in weighted labeled graphs. The proposed method extends the so-called “connect the dots” (CTD) method. The contributions are solid: the paper contains a new theoretical derivation that is comprehensively demonstrated on synthetic and real data. The written presentation is good, including the explanations and figures, although there is still room for improvement. The results and discussion are convincing, showing test cases with different parameters (cliques, node module contrasts). In summary, I consider the contents of the paper potentially publishable, but the following specific issues should be addressed in a revised version.
- Page 5, lines 181-182, it is stated that high computational power is required to run the experiments. Please state the computational complexity of the implemented methods and the running times measured for the experiments.
- Pages 4-5, Section 2.3, there are other related options for generating synthetic data, based on graph signal processing, that should be discussed. Recently, the Complex Graph Fourier Transform (CGFT) has been proposed for generating surrogate data. I suggest the following reference: https://doi.org/10.3390/e21080759.
- Page 7, lines 250-251, please comment on the applications where the graph parameters of Table 1 could be used or are valid.
- There is a character (a kind of box) after equation (18) that should be removed.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
The authors derive bounds on p-values associated with communities shared by weighted labeled graphs. To this end, they employ previously obtained results rooted in information theory, such as the Kraft-McMillan inequality and the Algorithmic Significance theorem.
The paper positions itself in the research field that could be called "statistical network theory", introducing methods for detecting network (sub)structures that enrich the currently available toolbox of algorithms explicitly carrying out hypothesis tests (in one of the first papers written by the authors, they claim that the method they propose "is conceptually similar to the statistical significance method in the Neyman-Pearson hypothesis testing framework").
Overall, I find the methods presented by the authors in this paper of some relevance, as they shed light on the relationships between information theory, network theory and statistics, and hence deserving of publication.
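For readers unfamiliar with the two results named above, the following is a minimal sketch of how they are commonly stated, not the authors' implementation. The Kraft-McMillan inequality says that the codeword lengths of any uniquely decodable binary code satisfy sum(2^-l) <= 1, and the Algorithmic Significance theorem bounds the probability that an alternative encoding saves at least d bits over the null encoding by 2^-d. The function names and the example bit counts are hypothetical.

```python
from typing import Iterable


def kraft_sum(code_lengths: Iterable[int]) -> float:
    """Return sum(2^-l) over binary codeword lengths.

    A uniquely decodable binary code must have a Kraft sum <= 1.
    """
    return sum(2.0 ** (-length) for length in code_lengths)


def algorithmic_significance_bound(null_bits: float, alt_bits: float) -> float:
    """Upper bound on the p-value under the null hypothesis.

    d is the number of bits saved by the alternative encoding relative to
    the null encoding; the bound is 2^-d, capped at 1 when nothing is saved.
    """
    d = null_bits - alt_bits
    return min(1.0, 2.0 ** (-d))


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(kraft_sum([1, 2, 3, 3]))
    print(algorithmic_significance_bound(null_bits=100, alt_bits=90))
```

The bound is only meaningful if the alternative encoding is itself decodable, which is what the Kraft sum check expresses.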
Minor observations:
- to fully understand the results presented in this paper, I had to read the papers "Discovering simple DNA sequences by the algorithmic significance method" and "CTD: An information-theoretic algorithm to interpret...". To make the paper fully self-contained, I suggest that the authors make the nature of the null and the alternative hypotheses, as well as their encodings, even more explicit (e.g. by taking part of the explanation provided in the two aforementioned papers and moving it here);
- the sentence "Our results are independent of the algorithm used to detect S and thus pave the way to many practical implementations" is not clear. Do the authors mean that their recipe can even be used to test the statistical significance of network substructures provided by some "external user", so to say?
- figure 2 is quite obscure: I suggest that the authors provide a graphical illustration of the entire workflow as well as some simple toy examples.
Author Response
Thank you for a thorough review. Please kindly see the attachment.
Author Response File: Author Response.docx
Reviewer 3 Report
The article deals with the problem of locating in two labeled weighted graphs a relatively small set of nodes that is highly connected in both. The authors extend a recently proposed CTD ("Connect the Dots") approach to establish information-theoretic upper bounds on the p-values and lower bounds on the size and connectedness of communities that are detectable. The core idea of this paper is to use one of the input graphs as a proposer graph, while the other graph takes the role of a tester with the p-value calculated for the tester being corrected by applying the weighted Bonferroni correction.
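As a rough illustration of the correction step mentioned above, the following sketch applies a generic weighted Bonferroni adjustment, in which each hypothesis receives a share of the error budget proportional to its weight and its p-value is divided by that share. This is the textbook form of the procedure under assumed weights; how the paper assigns weights to candidate node sets is not stated in this report, and the function name and example values are hypothetical.

```python
from typing import List, Sequence


def weighted_bonferroni(p_values: Sequence[float],
                        weights: Sequence[float]) -> List[float]:
    """Return weighted-Bonferroni-adjusted p-values.

    Weights are normalized to sum to 1; hypothesis i is adjusted to
    min(1, p_i / w_i). A zero weight means the hypothesis can never be
    rejected, so its adjusted p-value is 1.
    """
    if len(p_values) != len(weights):
        raise ValueError("p_values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    adjusted = []
    for p, w in zip(p_values, weights):
        share = w / total
        adjusted.append(min(1.0, p / share) if share > 0 else 1.0)
    return adjusted


if __name__ == "__main__":
    # Hypothetical tester p-values for two candidate node sets, equal weights.
    print(weighted_bonferroni([0.01, 0.02], [1, 1]))
```

With equal weights this reduces to the ordinary Bonferroni correction, that is, multiplying each p-value by the number of hypotheses.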
The article combines theoretical material with experiments and careful software planning. In particular, the authors present information-theoretic upper bounds on the p-values, extend the recently proposed CTD method, and provide a set of carefully performed experiments validating the various claims. The idea described may be applied to other problems too, and the concept may interest both theoreticians and practitioners. On the other hand, the idea is mainly an extension of previous work, and the authors do not make clear where the novelty lies or whether some constructions can be straightforwardly deduced from their previous paper. Therefore, the authors should revisit portions of their paper to show more clearly which parts are genuinely novel and which are straightforward extensions.
Overall, the article is very interesting, and the techniques are worth publishing and probably have further applications; hence I vote for acceptance subject to minor revision.
Author Response
Please kindly see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
The quality of the paper has improved significantly in the revised version. All my concerns have been adequately addressed, including the following: an improved discussion of the computational complexity of the proposed method; a discussion of alternative options for generating synthetic data based on graph signal processing; an extended account of how the proposed method relates to real applications; and, in general, an improved written presentation. Therefore, the paper should be ready for publication.
Reviewer 3 Report
The authors have replied satisfactorily to my remarks and I vote for acceptance.