Article
Peer-Review Record

An Information-Theoretic Bound on p-Values for Detecting Communities Shared between Weighted Labeled Graphs

Entropy 2022, 24(10), 1329; https://doi.org/10.3390/e24101329
by Predrag Obradovic 1,†, Vladimir Kovačević 1,†, Xiqi Li 2 and Aleksandar Milosavljevic 2,3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 31 July 2022 / Revised: 12 September 2022 / Accepted: 17 September 2022 / Published: 21 September 2022
(This article belongs to the Special Issue Information Theory in Computational Biology)

Round 1

Reviewer 1 Report

The main objective of the paper is a method to evaluate the statistical significance of connectedness in weighted labeled graphs. The proposed method extends the so-called “connect the dots” (CTD) method. The contributions of the paper are solid, as it contains a new theoretical derivation that is comprehensively demonstrated using synthetic and real data. However, there are some issues that should be resolved. The written presentation of the paper is good, including the explanations and figures, although there is still room for improvement in this regard. The results and discussion are convincing, showing test cases with different parameters (cliques, node module contrasts). In summary, I consider the contents of the paper potentially publishable, but the following specific issues should be addressed in a revised version.

- Page 5, lines 181-182: it is stated that high computational power is required to run the experiments. Please state the computational complexity of the implemented methods and the running times measured for the experiments.

- Pages 4-5, Section 2.3: there are other related options for generating synthetic data, based on graph signal processing, that should be discussed. Recently, the Complex Graph Fourier Transform (CGFT) has been proposed for generating surrogate data. I suggest the following reference: https://doi.org/10.3390/e21080759.

- Page 7, lines 250-251: please comment on the applications in which the graph parameters of Table 1 could be used or are valid.

- There is a character (a kind of box) after equation (18) that should be removed.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The authors derive bounds on p-values associated with communities shared by weighted labeled graphs. To this end, they employ previously obtained results rooted in information theory, such as the Kraft-McMillan inequality and the Algorithmic Significance theorem.
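
As an illustration of the two results named above (this sketch is not taken from the manuscript): the Kraft-McMillan inequality states that the codeword lengths l_i of any uniquely decodable binary code satisfy sum_i 2^(-l_i) <= 1, and the algorithmic significance theorem bounds by 2^(-d) the probability, under the null hypothesis, that an alternative encoding is at least d bits shorter than the null encoding. The function names and example numbers below are hypothetical.

```python
def kraft_sum(code_lengths):
    """Return sum(2^-l) over the given binary codeword lengths.

    By the Kraft-McMillan inequality, this sum is at most 1 for any
    uniquely decodable binary code.
    """
    return sum(2.0 ** -length for length in code_lengths)


def p_value_upper_bound(null_bits, alt_bits):
    """Upper bound on the p-value from the algorithmic significance theorem.

    null_bits: encoding length of the observation under the null model.
    alt_bits:  encoding length under the alternative (shorter) encoding.
    If the alternative saves d = null_bits - alt_bits bits, the null
    probability of a saving at least that large is at most 2^-d.
    """
    d = null_bits - alt_bits
    # A non-positive saving gives no evidence; the bound is capped at 1.
    return min(1.0, 2.0 ** -d)


# Hypothetical example: lengths 1, 2, 3, 3 form a complete prefix code.
print(kraft_sum([1, 2, 3, 3]))
# Hypothetical example: a 10-bit saving bounds the p-value by 2^-10.
print(p_value_upper_bound(null_bits=20, alt_bits=10))
```

The bound decays exponentially in the number of bits saved, so each additional bit of compression halves the bound on the p-value.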

The paper positions itself in the research field that could be called "statistical network theory", introducing methods for detecting network (sub)structures that enrich the currently available toolbox of algorithms that explicitly carry out hypothesis tests (in one of their first papers, the authors claim that the method they propose "is conceptually similar to the statistical significance method in the Neyman-Pearson hypothesis testing framework").

Overall, I find the methods presented by the authors in this paper to be of some relevance, as they shed light on the relationships between information theory, network theory and statistics, and hence deserving of publication.

Minor observations:

- to fully understand the results presented in this paper, I had to read the papers "Discovering simple DNA sequences by the algorithmic significance method" and "CTD: An information-theoretic algorithm to interpret...". In order to make the paper fully self-contained, I suggest that the authors make the nature of the null and alternative hypotheses, as well as their encodings, even more explicit (e.g. by taking part of the explanation provided in the two aforementioned papers and moving it here);

- the sentence "Our results are independent of the algorithm used to detect S and thus pave the way to many practical implementations" is not clear. Do the authors mean that their recipe can even be used to test the statistical significance of network substructures provided by some "external user", so to speak?

- figure 2 is quite obscure: I suggest that the authors provide a graphical illustration of the entire workflow as well as some simple toy examples.

Author Response

Thank you for a thorough review. Please kindly see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

The article deals with the problem of locating, in two labeled weighted graphs, a relatively small set of nodes that is highly connected in both. The authors extend the recently proposed CTD ("Connect the Dots") approach to establish information-theoretic upper bounds on the p-values and lower bounds on the size and connectedness of detectable communities. The core idea of the paper is to use one of the input graphs as a proposer graph, while the other graph takes the role of a tester; the p-value calculated for the tester is then corrected by applying the weighted Bonferroni correction.
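
For readers unfamiliar with the correction mentioned above, the following is a minimal sketch of a generic weighted Bonferroni adjustment. It is not the authors' implementation: how the weights are derived from the proposer graph is not described in this record, so the weights, p-values, and function name here are hypothetical. Each hypothesis i receives a positive weight w_i, the weights sum to at most 1, and the adjusted p-value is min(1, p_i / w_i).

```python
def weighted_bonferroni(p_values, weights):
    """Return weighted-Bonferroni-adjusted p-values, min(1, p_i / w_i).

    The weights must be positive and sum to at most 1; under that
    condition, rejecting hypothesis i when its adjusted p-value is
    at most alpha controls the family-wise error rate at level alpha.
    """
    if len(p_values) != len(weights):
        raise ValueError("p_values and weights must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    if sum(weights) > 1.0 + 1e-12:
        raise ValueError("weights must sum to at most 1")
    return [min(1.0, p / w) for p, w in zip(p_values, weights)]


# Hypothetical example: two candidate node sets with unequal weights.
adjusted = weighted_bonferroni([0.01, 0.2], [0.25, 0.75])
print(adjusted)
```

With equal weights 1/m over m hypotheses, this reduces to the ordinary Bonferroni correction, which multiplies each p-value by m.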

The article combines theoretical material with experiments and careful software planning. In particular, the authors present information-theoretic upper bounds on the p-values, extend the recently proposed CTD method, and provide a set of carefully performed experiments validating the various claims. The whole idea may be applied to other problems too, and the concept may interest both theoreticians and practitioners. On the other hand, the idea is mainly an extension of previous work, and the authors do not make sufficiently clear where the novelty lies or whether some constructions can be straightforwardly deduced from their previous paper. Therefore, the authors should revisit portions of their paper in order to show more clearly which parts are genuinely novel and which are straightforward extensions.

Overall, the article is very interesting and the techniques are worth publishing, probably with further applications; hence I vote for acceptance subject to minor revision.

Author Response

Please kindly see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The quality of the paper has improved significantly in the revised version. All my concerns have been adequately addressed, including the following: improvement of the discussion of the computational complexity of the proposed method; discussion of alternative options for generating synthetic data based on graph signal processing; extension of the relation of the proposed method to real applications; and, in general, improvement of the written presentation of the paper. Therefore, the paper should be ready for publication.

Reviewer 3 Report

The authors have replied satisfactorily to my remarks and I vote for acceptance.
