Peer-Review Record

Locating the Source of Diffusion in Complex Networks via Gaussian-Based Localization and Deduction

Appl. Sci. 2019, 9(18), 3758; https://doi.org/10.3390/app9183758

by Xiang Li¹, Xiaojie Wang¹,*, Chengli Zhao¹, Xue Zhang¹ and Dongyun Yi¹,²

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 24 July 2019 / Revised: 27 August 2019 / Accepted: 6 September 2019 / Published: 8 September 2019

(This article belongs to the Section Applied Physics General)

Round 1

Reviewer 1 Report

The paper is a nice contribution to the problem of locating the source of a diffusion process over a network (tree or graph). The method is similar to existing approaches, with only minor differences; these differences are nonetheless important for obtaining an algorithm with affordable complexity on large networks and, at the same time, similar or better accuracy. The performance analysis, conducted on both simulated and real data, is accurate and thorough.

An aspect that can be improved is the description of the covariance matrix. In particular, after line 122, where you introduce the path intersection matrix, you could give a numerical example and, there or later in the performance assessment, also visualize the matrix elements in pseudocolor or whatever you like. At the moment this important aspect, which is the main difference between the two variants of your algorithm, is treated superficially.

I also recommend a careful proofreading of the paper, since there are several typos and some other language issues. For instance:

line 28: “difficult” should be “difficulty”

lines 97 and 99: “observes” should be “observers”

lines 102-105: the sentence appears twice, with a slight modification

line 242: “source the” should be “the source”

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper describes and evaluates a method for estimating the source of a diffusion process in an undirected, unweighted, static graph. The method uses the arrival times of the diffusion at a subset of “observer” nodes to derive a Maximum Likelihood Estimator (MLE) for the source node (s), the starting diffusion time (t0) and the standard deviation of edge-to-edge delays (sigma). Diffusion times between edges are assumed Gaussian (with unknown deviation) and i.i.d.; this is an important hypothesis.

The proposed method is optimal for diffusion processes in trees, as the best MLE can be derived sequentially for each considered parameter. A variation is proposed for general (non-tree) graphs. The performance of both methods is evaluated in synthetic trees and general graphs, as well as in some real-world topologies. Proposed methods are compared to two state-of-the-art methods, i.e. the Gaussian heuristic (similar to the proposed mechanism, but with a given value for sigma), and the Time-Reversal Backward Spreading (TRBS) method.

Methods:

Line 122 introduces \Lambda_{ps} as “the path intersection matrix”, but this matrix is not formally defined. One can assume that it is a binary-valued matrix of dimensions |V \ {s}| x |V \ {s}|, where cell (i,j) is 1 if path P(s, oi) intersects P(s, oj), and 0 otherwise. However, other definitions of a “path intersection matrix” exist in the literature (see e.g. D. Y. Kho et al. “Path intersection matrices and applications to networks”), so this should be clarified. The definition has implications for the second proposed method, which relies on the diagonal of \Lambda_{ps}. Under the definition assumed above, (\Lambda_{ps})_{i,i} = 1, since the path P(s, oi) always intersects itself; D_s would therefore be the identity matrix, which is not stated in the paper.
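The ambiguity raised above can be made concrete with a minimal sketch. It is not the paper's definition: the toy tree, the node names, and both readings of "intersection" (binary: the two paths share at least one edge; count-based: the number of shared edges) are assumptions chosen only for illustration.

```python
from collections import deque

def bfs_parents(adj, source):
    """Return a parent map for a BFS from `source` over adjacency dict `adj`."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def path_edges(parent, target):
    """Set of undirected edges on the path from the BFS root to `target`."""
    edges = set()
    node = target
    while parent[node] is not None:
        edges.add(frozenset((node, parent[node])))
        node = parent[node]
    return edges

def intersection_matrices(adj, source, observers):
    """Binary and count-based path intersection matrices (both hypothetical readings)."""
    parent = bfs_parents(adj, source)
    paths = [path_edges(parent, o) for o in observers]
    counts = [[len(p & q) for q in paths] for p in paths]
    binary = [[1 if c > 0 else 0 for c in row] for row in counts]
    return binary, counts

# Hypothetical toy tree: s-a, a-o1, a-o2, s-o3
adj = {
    "s": ["a", "o3"],
    "a": ["s", "o1", "o2"],
    "o1": ["a"],
    "o2": ["a"],
    "o3": ["s"],
}
binary, counts = intersection_matrices(adj, "s", ["o1", "o2", "o3"])
binary_diagonal = [binary[i][i] for i in range(3)]
count_diagonal = [counts[i][i] for i in range(3)]
```

On this toy tree the binary reading gives an all-ones diagonal, so a matrix built from that diagonal is the identity, whereas the count-based reading puts the path lengths (2, 2, 1) on the diagonal. Which of the two the manuscript intends is exactly what needs to be stated.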

The described method for trees finds the node for which the values for sigma, t0 and s maximize/minimize the corresponding expressions.

Table I is unclear. A reference to Fig. 1 should be added. The table should also state explicitly that only nodes that are not observers are included in the list. In addition, it would be useful to emphasize that the sequential derivation of optimized values ensures that the optimal parameters lead to the same source candidate; the Table I example alone may be misleading.

In the 2nd method, the previous equations are modified to arrive at eq. (10). It should be justified more clearly why matrix \Lambda can be turned into matrix D without affecting the optimization performance or, if it does affect the optimization performance, as it seems, how the resulting suboptimality can be assessed or bounded. While the paragraph between lines 162 and 167 seems correct, the wording is somewhat vague; it is recommended to rephrase it and/or provide a clearer argument. Also, see above regarding the need to clarify matrix \Lambda and, consequently, the values of its diagonal.

The paragraph between lines 175-179 is unclear, in particular the distinction (or not) between GLAD-naive directly optimizing eq. (1) and GLAD-naive building a random BFS for each node and optimizing eq. (1). Please clarify whether these are the same approach (then, why mention it twice?) or different naive approaches. To my current understanding, they correspond to the same naive approach: when applied to a tree graph, the random BFS for each node turns out to be the same and only tree. If this is correct, please state it clearly; the current wording is difficult to understand.
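The observation that a random BFS on a tree always returns the tree itself can be checked with a short sketch. The graphs, the helper name `random_bfs_tree`, and the fixed seed are hypothetical and only serve the illustration; this is not the authors' GLAD-naive code.

```python
import random
from collections import deque

def random_bfs_tree(adj, root, rng):
    """Edge set of a BFS tree from `root`, visiting neighbors in random order."""
    seen = {root}
    edges = set()
    queue = deque([root])
    while queue:
        u = queue.popleft()
        neighbors = list(adj[u])
        rng.shuffle(neighbors)  # the only source of randomness
        for v in neighbors:
            if v not in seen:
                seen.add(v)
                edges.add(frozenset((u, v)))
                queue.append(v)
    return edges

rng = random.Random(0)

# Hypothetical tree: every BFS tree must equal the full edge set.
tree_adj = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
tree_edges = {frozenset((u, v)) for u in tree_adj for v in tree_adj[u]}
same_on_tree = all(
    random_bfs_tree(tree_adj, root, rng) == tree_edges for root in tree_adj
)

# Hypothetical 4-cycle: BFS trees depend on the root and on the random order.
cycle_adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
cycle_trees = {
    frozenset(random_bfs_tree(cycle_adj, root, rng))
    for root in cycle_adj
    for _ in range(20)
}
```

Here `same_on_tree` is true for every root, while the 4-cycle yields more than one distinct BFS tree, which is where the two descriptions of the naive approach could start to differ.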

Results:

In section 3.1, the definition of the average ranking (line 213), “The average ranking is the average ranking of the real source in the algorithm”, is not really a definition. Please provide a complete definition of the metric under consideration.

In section 3.2, it is not clear whether results are presented over trees or over non-tree (random) graphs, since the paragraph describes Erdős-Rényi and Barabási-Albert random graphs. Please clarify this. In particular, it would be useful to describe in the main body of the text (and not in footnote (2)) what ER/BA trees are, i.e. how “ER trees” and “BA trees” are generated from the (well-known, non-tree) Erdős-Rényi or Barabási-Albert random graphs.

The presented results show average values over “multiple times of numerical simulations”. The number of simulation runs should be specified. It would also be insightful to have some notion of the variance of the measures, e.g. a 95% confidence interval. Since many of the figures show relatively close performance among the examined alternatives, such an indication of variance is needed to assess the relevance and potential benefit of the proposed methods with respect to state-of-the-art mechanisms.
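As an indication of what is being requested, a minimal sketch of a 95% confidence interval for a mean over repeated simulation runs follows. It uses the normal approximation, which is reasonable for a large number of runs; for a handful of runs a Student-t quantile would be more appropriate. The function name and the sample values are hypothetical placeholders, not results from the paper.

```python
import math
import statistics

def mean_confidence_interval(samples, level=0.95):
    """Mean and normal-approximation confidence interval for the mean."""
    n = len(samples)
    mean = statistics.mean(samples)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    # Two-sided quantile of the standard normal, about 1.96 for level=0.95.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean, mean - z * sem, mean + z * sem

# Placeholder values standing in for a per-run metric (not real data).
runs = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, low, high = mean_confidence_interval(runs)
```

Plotting `low` and `high` as error bars around each mean would let readers judge whether two curves that look close are actually distinguishable.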

Finally, while it is perfectly fine (and helpful) to have results on both synthetic and real-world graph topologies, it is not clear what kind of topologies are being targeted when selecting real-world examples. What is the rationale behind the current selection of topologies? Is there a particular use case that the authors would consider particularly convenient for their method?

Conclusion:

It is recommended to rework the conclusion and discussion parts. The considered problem is interesting, the proposed MLE-based method is reasonable, and the evaluation procedure and metrics make sense. However, it is hard to identify from the proposed experiments, presented results, and approaches a strong, relevant contribution beyond some incremental improvement over existing mechanisms, which is due, if I understand correctly, to the estimation of additional statistical parameters of the model (e.g. sigma). In that sense, any further clarification of the qualitative differences with respect to other mechanisms (GAU, TRBS) would be helpful.

More generally, the proposed method relies on very strong assumptions about the nature of diffusion, e.g. Gaussian i.i.d. delays for each edge in the graph. It would be interesting to discuss the impact of other delay distributions, and perhaps to test the performance of the proposed methods on graphs with non-Gaussian delays.

Specific comments:

Line 78 states that the considered graphs are “simple”. Are they simple in the sense that no loops and no multiple edges between nodes are allowed? How would that affect the method, in particular the multiple-edge condition?

Figure 10, caption: gamma → \gamma

Table 2: Title of 5th column should be kmax, not <kmax>

Line 412: applys → applies

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Some clarifications have been included in the paper, following previous observations; as a result, the text now describes the proposed methods in a way that is easier to understand.

However, some of the comments should be addressed further. In particular:

Concerning point 6, while the current description is more detailed than in the previous version, the authors are encouraged to be more precise in specifying the method used to produce their ER- and BA-based trees, for reproducibility purposes. The description should allow readers to implement the exact procedure and obtain the same graphs for testing. Clarity on this aspect is also necessary to assess the interest of evaluating the proposed algorithms on these graphs.

Concerning point 7, the authors are strongly encouraged to add indications of 95% confidence intervals (and not only mean values) to the figures in the Results section. Otherwise, the statistical relevance of the differences between algorithms is difficult to assess.

In some cases (points 1, 8, 9, 10), the answers provided are interesting, and it may be worth including the information they bring (or making it more explicit) in the paper itself, as it adds value to the paper.

On a more general note, a general English style revision is still recommended throughout the whole paper.

Author Response

Please see the attachment!

Author Response File: Author Response.pdf
