Article

Simulating Weak Attacks in a New Duplication–Divergence Model with Node Loss †

1
Department of Statistics, University of Oxford, 24–29 St. Giles’, Oxford OX1 3LB, UK
2
The Alan Turing Institute, London NW1 2DB, UK
*
Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in The 12th International Conference on Complex Networks and Their Applications, Menton Riviera, France, 28–30 November 2023.
Entropy 2024, 26(10), 813; https://doi.org/10.3390/e26100813
Submission received: 20 June 2024 / Revised: 7 September 2024 / Accepted: 17 September 2024 / Published: 25 September 2024

Abstract

A better understanding of protein–protein interaction (PPI) networks representing physical interactions between proteins could be beneficial for evolutionary insights as well as for practical applications such as drug development. As a statistical model for PPI networks, duplication–divergence models have been proposed, but they suffer from resulting in either very sparse networks in which most of the proteins are isolated, or in networks which are much denser than what is usually observed, having almost no isolated proteins. Moreover, in real networks, where a gene codes a protein, gene loss may occur. The loss of nodes has not been captured in duplication–divergence models to date. Here, we introduce a new duplication–divergence model which includes node loss. This mechanism results in networks in which the proportion of isolated proteins can take on values which are strictly between 0 and 1. To understand this new model, we apply strong and weak attacks to networks from duplication–divergence models with and without node loss, and compare the results to those obtained when carrying out similar attacks on two real PPI networks of E. coli and of S. cerevisiae. We find that the new model more closely reflects the damage caused by strong and weak attacks found in the PPI networks.

1. Introduction

From virtual internet to practical traffic control systems, from small social networks to large biological systems, networks are ubiquitous, and so are attacks on networks. For example, an internet cyber attack can slow down information transmission or cause information leakage, and drugs can target a number of different proteins. Reference [1] shows that partial inactivation of multiple nodes simultaneously in a network can be more effective than the complete elimination of a node, by measuring the sum of the inverse of the shortest path between any two nodes of biological networks (the network efficiency).
This result motivates the study of weak attacks in pharmaceutical designs. For example, broader-specificity, lower-affinity compounds or multidrug therapies may cause larger damage in network efficiency than high-affinity, high-specificity compounds. The success of multitarget drugs, like non-steroidal anti-inflammatory drugs (NSAIDs) [2], metformin [3], and Gleevec [4], to treat diseases including AIDS, cancer, atherosclerosis, and Alzheimer’s disease, all suggest that attacking multiple targets may be a useful therapeutic strategy.
To anticipate the effect of an attack, a well-fitting parametric network model could help gain insights. For protein–protein interaction (PPI) networks, duplication–divergence (DD) models have been suggested, see for example [5,6,7]. This paper hence starts with practically simulating weak attacks in a duplication–divergence model. Simulations from [8] suggest that DD models can generate networks which resemble PPI networks more than a basic Bernoulli random graph model. However, ref. [9] found that while Monte Carlo tests based on network comparison statistics do not reject the DD model for some small-virus PPI networks, they do reject it (at the 5% level) for E. coli, worm, fly, S. cerevisiae, and human PPI networks. Indeed, DD models are known not to be very realistic; for example, ref. [10] proved that as the number of nodes tends to infinity, the proportion of isolated nodes in a standard DD model converges to either 0 or 1, neither of which is realistic.
To understand theoretically how weak attacks damage PPI networks, it is instructive to consider a simple Bernoulli G ( n , M ) random graph with n nodes and M edges. We derive a Poisson approximation for the number of isolated nodes in a G ( n , M ) via Stein’s method, which gives explicit bounds in total variation distance, and we prove similar bounds for the number of isolated nodes after different attack strategies. These results lead to a clear statistical rejection of the hypothesis that the real PPI networks in this paper follow a G ( n , M ) model.
To identify a more realistic model for PPI networks, we notice that the current DD models ignore gene losses, a biological function [11] which can potentially balance the proportion of isolated nodes. As genes code for proteins, it is plausible that a model with node loss may perform better than standard duplication–divergence models for PPI networks. This paper introduces a new DD model with node loss, where a node can be lost with probability q if it is isolated. We compare the simulation results of weak attacks in a standard DD model and the DD model with node loss, and conclude that the new model indeed generates a more realistic performance.
This paper is structured as follows. Section 2 describes the datasets and attack strategies that are employed, as well as the successive maximal damage strategy, the measures of damage, and the Bernoulli random graph baseline. Section 3 reviews standard duplication–divergence models, and Section 4 introduces the new DD model with node loss. Simulations of the various attack strategies on real PPI networks and on synthetic networks are provided in Section 5. The results are discussed in Section 6. Appendix A contains details of the Poisson approximation results and Appendix B contains additional figures. The code is available at https://github.com/rh-zhang/Entropy_CNC2023 (accessed on 24 August 2024).

2. Data and Methods

2.1. Datasets

We use PPI networks for E. coli and S. cerevisiae downloaded from STRING (version 12.0, accessed on 11 March 2024), restricted to physical interactions between proteins only. The resulting networks are unweighted, undirected physical subnetworks representing direct interactions between proteins only, excluding indirect functional associations. We remove interactions with a STRING score [12] less than 0.500 for the E. coli PPI network and less than 0.400 for the S. cerevisiae PPI network, taking all evidence channels into account. The 0.400 threshold is the default threshold in STRING; the 0.500 threshold for E. coli is chosen such that the number of isolated nodes is of a similar magnitude (around 1100) in both networks, see Table 1. As shown in Figure 1, the number of isolated nodes increases as the threshold of STRING scores increases. However, the overall trend regarding the impact of weak attacks on the networks remains consistent in our results, as shown in Figure A9.
We note that there is no claim that all possible protein–protein interactions have been detected, and hence the STRING database is unlikely to contain all true interactions; it may also contain some false positive interactions. Our study is conceptual and hence not severely affected by such false positives and false negatives, under the assumption that there is no strong systematic connection between errors in the data and isolated proteins.
We assign a uniform weight of 1 to all the remaining edges in the datasets, with the summary statistics shown in Table 1. The reason for ignoring weights is conceptual simplicity.

2.2. Attack Strategies

The attack strategies used in this paper follow those from [1]. While in [1], networks with weighted edges are allowed, in our investigative study we set all edge weights equal to 1 initially; some attacks lead to a reduction in some of the edge weights. The attack strategies are split into three categories.
Type A: Complete knockout: the attack of a single target by eliminating all interactions of a given node, as shown in Figure 2A.
Type B: Partial inactivation of a target, as shown in Figure 2B, which is modelled in two different ways:
B1: Partial knockout: half of the interactions of a given node are removed (the number of interactions removed is rounded down when the degree of the target is odd). If a node is attacked partially once, it will not be attacked again to ensure no node is completely knocked out. This is shown in Figure 2B1.
B2: Attenuation: all interactions of a given node are attenuated by halving their weight.
Type C: A distributed, system-wide attack, which can affect any interactions (i.e., edges) within a network. Again, such an attack is modelled in two different ways:
C1: Distributed knockout: edges are deleted independently at random, with the same deletion probability, as shown in Figure 2C1.
C2: Distributed attenuation: edges are chosen independently at random, with the same probability, and their weights are halved.
These attacks can be interpreted in pharmaceutical terms; a high-affinity drug completely eliminates an interaction while a low-affinity drug attenuates it, and a highly specific drug targets one single interaction only, while less specific drugs affect some or all interactions of a given node.
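The five attack types can be sketched on a small weighted graph as follows. This is a minimal illustration and not the authors' implementation: the graph representation (`{node: {neighbour: weight}}`), the function names, and the choice to pick the removed half of the edges uniformly at random in the partial knockout are all assumptions made for the example.

```python
import random

def make_graph(edges):
    """Undirected graph as {node: {neighbour: weight}}, all weights set to 1."""
    g = {}
    for u, v in edges:
        g.setdefault(u, {})[v] = 1.0
        g.setdefault(v, {})[u] = 1.0
    return g

def complete_knockout(g, node):
    """Type A: remove every interaction of `node` (the node stays, isolated)."""
    for nb in list(g[node]):
        del g[nb][node]
    g[node].clear()

def partial_knockout(g, node, rng):
    """Type B1: remove half of the interactions of `node`, rounding down.
    Which half is removed is chosen at random here (an assumption)."""
    victims = rng.sample(sorted(g[node]), len(g[node]) // 2)
    for nb in victims:
        del g[node][nb]
        del g[nb][node]

def attenuation(g, node):
    """Type B2: halve the weight of every interaction of `node`."""
    for nb in g[node]:
        g[node][nb] *= 0.5
        g[nb][node] = g[node][nb]

def _edges(g):
    """List each undirected edge once (nodes are assumed to be orderable)."""
    return [(u, v) for u in g for v in g[u] if u < v]

def distributed_knockout(g, prob, rng):
    """Type C1: delete each edge independently with probability `prob`."""
    for u, v in _edges(g):
        if rng.random() < prob:
            del g[u][v]
            del g[v][u]

def distributed_attenuation(g, prob, rng):
    """Type C2: halve the weight of each edge independently with probability `prob`."""
    for u, v in _edges(g):
        if rng.random() < prob:
            g[u][v] *= 0.5
            g[v][u] = g[u][v]
```

The rule that a partially attacked node is not attacked again would be enforced by the caller, for example by keeping a set of already attacked nodes.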

2.3. Successive Maximal Damage Strategy

As in [1], the nodes being attacked in the simulation of this paper are selected based on a successive maximal damage strategy. The search for maximal damage caused by multiple attacks is computationally very intensive. For instance, to determine which 5 of the 1000 edges of a given network need to be deleted in order to produce a maximal effect on the network efficiency, one would need to test $1000!/(5!\,995!) \approx 8.25 \times 10^{12}$ cases in a single-simulation experiment.
Instead, we use a greedy algorithm: for each type of attack, in each step we choose the action that produces the largest damage. The greedy algorithm is carried out by first determining the damage caused by the removal of each individual node or edge, depending on the strategy. The node or edge causing the maximum damage is selected for removal in the subsequent attack. We note that the damage calculated in this manner is only an estimate of the maximal damage, since there may be more efficient combinations.
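A minimal sketch of the greedy selection for complete knockouts is given below. It is illustrative only: the function names are hypothetical, the damage measure defaults to the number of isolated nodes (any other measure can be passed in), and ties are broken by iteration order, which the text does not specify.

```python
import math

def isolated_count(adj):
    """Damage measure used by default: the number of isolated nodes."""
    return sum(1 for nbrs in adj.values() if not nbrs)

def knockout(adj, node):
    """Return a copy of `adj` ({node: set of neighbours}) with all edges of `node` removed."""
    new = {u: set(nbrs) for u, nbrs in adj.items()}
    for nb in new[node]:
        new[nb].discard(node)
    new[node] = set()
    return new

def greedy_knockouts(adj, n_attacks, damage=isolated_count):
    """Choose targets one at a time, each time maximising `damage` after the attack."""
    chosen = []
    for _ in range(n_attacks):
        candidates = [u for u in adj if adj[u]]  # nodes that still have edges
        if not candidates:
            break
        best = max(candidates, key=lambda u: damage(knockout(adj, u)))
        adj = knockout(adj, best)
        chosen.append(best)
    return chosen, adj

# Size of the exhaustive search mentioned in the text (about 8.25e12 subsets).
n_subsets = math.comb(1000, 5)
```

Each greedy step evaluates one candidate attack per remaining node, so the cost grows linearly in the number of attacks rather than combinatorially.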

2.4. Measures of Damage

The damage induced by the attacks on the networks is measured by three metrics: the network efficiency, as used in [1] for transcriptional regulatory networks; the average number of edges in the 1-step ego network for the PPI networks, as proposed in [13] for assessing the robustness of network metrics; and the number of isolated nodes.
The network efficiency (NE) of an undirected, unweighted graph of $n$ nodes is $\sum_{i \neq j} \frac{1}{d_{ij}}$, where $d_{ij}$ is the length of a shortest path between nodes $i$ and $j$. If the network is weighted, $d_{ij}$ is the weight of a path between nodes $i$ and $j$ with a minimum weight. If any two nodes $i \neq j$ are disconnected, then $d_{ij} = \infty$, and their contribution to the calculation of network efficiency is 0. NE measures how efficiently a network exchanges information. The underlying idea is that the more distant two nodes are in a network, the less efficient their exchange of information will be.
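For the unweighted case, the network efficiency can be computed with one breadth-first search per node. The sketch below assumes an adjacency representation `{node: set of neighbours}` and sums over ordered pairs, matching the unnormalised sum in the definition; the function name is hypothetical.

```python
from collections import deque

def network_efficiency(adj):
    """Sum of 1/d_ij over ordered pairs i != j; unreachable pairs contribute 0."""
    total = 0.0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives shortest path lengths
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total
```

For a weighted network, the breadth-first search would be replaced by a shortest-path algorithm such as Dijkstra's.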
The second measure is the average number of edges in the 1-step ego network, where a 1-step ego network consists of a focal node (the ego), the nodes to which the ego is directly connected (the alters), and the edges, if any, among the alters.
The third measure is the number of isolated nodes. We add this measure because an ideal attack would isolate a deleterious node. Moreover, in a Bernoulli random graph model this measure can be analysed analytically and thus is useful for providing theoretical underpinning.
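The second and third measures are straightforward to compute from an adjacency structure. In the sketch below, the 1-step ego network is taken to include the ego-alter edges as well as the edges among alters; this reading of the definition, and the function names, are assumptions of the example.

```python
def ego_edge_count(adj, ego):
    """Edges in the 1-step ego network: ego-alter edges plus edges among alters."""
    alters = adj[ego]
    # Each alter-alter edge is seen from both endpoints, hence the halving.
    among_alters = sum(1 for u in alters for v in adj[u] if v in alters) // 2
    return len(alters) + among_alters

def average_ego_edges(adj):
    """Average of ego_edge_count over all nodes, isolated nodes included."""
    return sum(ego_edge_count(adj, u) for u in adj) / len(adj)

def isolated_nodes(adj):
    """Nodes without any neighbour."""
    return [u for u, nbrs in adj.items() if not nbrs]
```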

2.5. Bernoulli Random Graphs

Given the number $n$ of nodes and the number $M$ of edges in a simple network, in the absence of further information one may model the network as a $G(n, M)$ graph. This is a random graph that is chosen uniformly at random from the collection of all simple graphs which have $n$ nodes and $M$ edges, where $0 \le M \le \binom{n}{2}$.
The distribution of the degree of a node $v$, $D(v)$, in a $G(n, M)$ graph is hypergeometric; there are $n - 1$ potential edges incident to $v$, out of the $\binom{n}{2}$ potential edges, of which we choose $M$. Abbreviating the number of node pairs by $N = \binom{n}{2}$ we thus have
$$P(D(v) = k) = \frac{\binom{n-1}{k} \binom{N-(n-1)}{M-k}}{\binom{N}{M}}, \quad k = 0, 1, \ldots, M.$$
We can calculate the expected number of isolated nodes from this distribution, but not its variance, due to the dependence between edges. To clarify the dependence, for example, if we know that the first $n - 1$ nodes have degree 0, then node $n$ necessarily must have degree 0. As this dependence is usually weak, we derive a Poisson approximation for the number of isolated nodes in the total variation distance. The total variation distance $d_{TV}$ measures the largest absolute difference between the probabilities of the actual probability distribution and the Poisson approximation. For distributions $P$ and $Q$ taking values in $\mathbb{Z}_+ = \{0, 1, \ldots\}$, the total variation distance is defined as
$$d_{TV}(P, Q) = \sup_{A \subset \mathbb{Z}_+} |P(A) - Q(A)|.$$
For $M \le N - (n-1)$, the probability that node $i$ is isolated is
$$P(I_i = 1) = \frac{\binom{N-(n-1)}{M}}{\binom{N}{M}} =: \pi.$$
With $W$ denoting the number of isolated nodes, its expectation is $E(W) = n\pi =: \lambda$, and this is the parameter which we choose for the approximating Poisson distribution.
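The binomial coefficients in $\pi$ become very large for networks of realistic size, so a direct evaluation can be slow or can overflow when converted to floating point. One way around this, sketched below with hypothetical function names, is to evaluate $\pi$ in log-space through the standard identity $\binom{N-(n-1)}{M}/\binom{N}{M} = \prod_{j=0}^{n-2} \left(1 - \frac{M}{N-j}\right)$.

```python
import math

def isolation_probability(n, M):
    """pi = C(N-(n-1), M) / C(N, M) with N = C(n, 2), as a product of n-1 factors."""
    N = n * (n - 1) // 2
    if M > N - (n - 1):
        return 0.0  # too many edges for any node to be isolated
    log_pi = sum(math.log1p(-M / (N - j)) for j in range(n - 1))
    return math.exp(log_pi)

def expected_isolated(n, M):
    """lambda = E(W) = n * pi, the Poisson parameter used in the approximation."""
    return n * isolation_probability(n, M)
```

For small inputs the result can be checked against `math.comb` directly, as is done for $n = 6$, $M = 4$ in the accompanying test.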
Theorem 1. 
It holds that
d T V ( L ( W ) ; P o ( λ ) ) min ( 1 , λ 1 ) e n p ( 1 + n 2 N + 2 n ) + p ( 1 + n 2 N + 2 n ) + 2 3 n + n 2 N + 2 n = 1 + n p 1 + n + N p 2 N N p n + 2 .
This bound tends to zero as $p := M/N \to 1$. The proof and more details can be found in Appendix A.1. Appendix A.1 also gives Poisson approximations for the number of isolated nodes after an attack for the different attack strategies. These results may be of independent interest.

3. Duplication–Divergence Models

Simulations suggest that duplication–divergence (DD) models generate networks which provide a better fit to protein interaction networks than the standard models [8]. There are different variations of duplication–divergence models in the literature, see for example [6,7,14,15]. Here, we use a version from [15] with a single parameter $p$, the probability that an edge is retained in the divergence step, but we exclude the possibility of a parent–child edge.
A standard duplication–divergence model $DD(t_0; p)$ starts from a complete graph $G_{t_0}$ on $t_0$ nodes (labelled from 1 to $t_0$), and then repeats the following steps until a graph of the desired size is obtained:
  • Duplication: at time t, a node u is selected uniformly at random. A node labelled as t + 1 is added, as well as the edges between node t + 1 and the neighbours of node u in the graph.
  • Divergence: edges involving node t + 1 are randomly retained with probability p.
An illustration of a DD model is shown in Figure 3.
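The two steps above can be sketched in a few lines. This is an illustrative simulation under the description given here (complete initial graph, no parent–child edge, each copied edge retained independently with probability $p$); the function name and the adjacency representation are assumptions of the example.

```python
import random

def simulate_dd(t0, p, n_nodes, rng):
    """Grow a DD(t0; p) graph, without parent-child edges, up to `n_nodes` nodes.
    Returns {node: set of neighbours} with nodes labelled 1, ..., n_nodes."""
    # Start from the complete graph on nodes 1, ..., t0.
    adj = {u: {v for v in range(1, t0 + 1) if v != u} for u in range(1, t0 + 1)}
    for t in range(t0, n_nodes):
        parent = rng.randrange(1, t + 1)  # duplication: uniform choice among t nodes
        child = t + 1
        # Divergence: each edge copied from the parent is kept with probability p.
        kept = {v for v in adj[parent] if rng.random() < p}
        adj[child] = kept
        for v in kept:
            adj[v].add(child)
    return adj
```

A call such as `simulate_dd(3, 0.4, 1000, random.Random(0))` would produce one realisation started from a triangle.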
Reference [15] found that the degree distributions of the DD model described above are in reasonable agreement with the distributions observed in real protein networks, and tuning the parameter p reveals a rich behaviour of the model. When p is large, the network growth lacks self-averaging and results in a great diversity of networks grown out of the same initial condition. For p < 0.5 , the average degree increases very slowly or tends to a constant, and the degree distribution has a power-law tail. Several real protein–protein networks are estimated to have a p value of around 0.4 [15]. As shown in Figure A1, the choice of p does not affect the qualitative behaviour of the models against attacks.

4. A New Duplication–Divergence Model Which Allows for Node Loss

Although simulations have shown that the DD model described above is more realistic than a $G(n, M)$ model, ref. [10] proved that the proportion of isolated nodes in a DD model converges to either 0 or 1. This behaviour does not match biological intuition, and other network models do not exhibit it; for example, we prove in Appendix A that the proportion of isolated nodes in a $G(n, M)$ model does not have to converge to either 0 or 1.
The quality of a network model has to be judged by the research question to be addressed. In a series of Monte Carlo tests for E. coli, worm, fly, S. cerevisiae, and human PPI networks and some small-virus PPI networks [9], a DD model (allowing for a non-zero probability of a parent–child edge) is rejected as a model for the large PPI networks based on network comparison statistics including graphlet correlation distance, graphlet degree distribution agreement, Netal, and Netdis. In contrast, in the small-virus PPI networks investigated in [9], the DD model is not rejected by most of these network comparison statistics. These statistics do not include the number of isolated nodes, but Netdis is based on subgraph counts in ego networks, and is thus related to our outcome measure of the average number of edges in 1-step ego networks. Hence, these Monte Carlo results indicate that the DD model may not be a good fit for larger PPI networks when the interest is in modelling the effect of attacks.
From a biological viewpoint, genes and the proteins they code for can not only duplicate, but can also be lost. For example, gene loss can occur during natural mutations and frameshifts [16]. Furthermore, many examples support the idea that gene loss can be an adaptive evolutionary force that is especially common when organisms are faced with abrupt environmental challenges [11]. Adaptive gene loss, or gene loss in general, can be of potential interest in the study of both biomedicine and evolution.
Therefore, we modify the DD model to allow for both node addition and for the loss of nodes. In addition to the process that generates a DD model, a node loss step is added after every duplication-and-divergence step. In particular, we focus on the node loss mechanism that a node can be lost with probability q if it is isolated.
  • Duplication: at time t, a node u is selected uniformly at random. A node labelled as t + 1 is added, as well as the edges between node t + 1 and the neighbours of node u in the graph.
  • Divergence: edges involving node t + 1 are randomly retained with probability p.
  • Node loss: a node is randomly lost with probability q if it is isolated.
An illustration of our new model is presented in Figure 4.
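The three steps can be sketched as follows. The description leaves some details open, so the example makes explicit assumptions: after every duplication-and-divergence step, each currently isolated node is removed independently with probability $q$; the parent is chosen uniformly among the surviving nodes; and labels of lost nodes are not reused. The function name is hypothetical.

```python
import random

def simulate_dd_node_loss(t0, p, q, n_steps, rng):
    """Run `n_steps` rounds of duplication, divergence and node loss.
    Returns {node: set of neighbours} for the surviving nodes."""
    adj = {u: {v for v in range(1, t0 + 1) if v != u} for u in range(1, t0 + 1)}
    next_label = t0 + 1
    for _ in range(n_steps):
        # Duplication: choose a parent uniformly among surviving nodes.
        parent = rng.choice(list(adj))
        # Divergence: keep each copied edge with probability p (no parent-child edge).
        kept = {v for v in adj[parent] if rng.random() < p}
        adj[next_label] = kept
        for v in kept:
            adj[v].add(next_label)
        next_label += 1
        # Node loss (assumed rule): every isolated node is lost with probability q.
        for u in [u for u, nbrs in adj.items() if not nbrs]:
            if rng.random() < q:
                del adj[u]
    return adj
```

With $q = 0$ this reduces to the standard model run for a fixed number of steps, and with $q = 1$ no isolated node survives a step.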

5. Results

5.1. Simulation of Weak Attacks in Real PPI Networks

Here, we apply the various attack strategies to our PPI network datasets with 10 repeats. Figure 5 shows that, for the PPI networks of E. coli and S. cerevisiae, as the number of targets subject to weak attacks increases, the damage caused by weak attacks becomes larger and is significantly greater than the damage caused by complete knockout.
To understand the expected effects of attacks, a parametric model may be useful. Next, we investigate two such models.

5.2. The Number of Isolated Nodes in a Bernoulli Random Graph

As a baseline model for a PPI network, we use a $G(n, M)$ model. In Appendix A.1 we derive a Poisson approximation for the number of isolated nodes in a $G(n, M)$ graph, which we apply with $n$ and $M$ taken from the real PPI networks. The Poisson approximation comes with an explicit bound, which we abbreviate here as $\Delta$, on the total variation distance (1). If $W$ denotes the number of isolated nodes, $\lambda$ its expectation under the $G(n, M)$ model, and $Z$ a Poisson-distributed random variable with mean $\lambda$, then it follows that for all $k$,
$$P(Z \le k) - \Delta \le P(W \le k) = P(Z \le k) + (P(W \le k) - P(Z \le k)) \le P(Z \le k) + \Delta.$$
Thus, the Poisson approximation can be used to assess statistical significance.
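As a small illustration of how a total variation bound translates into bounds on tail probabilities, the sketch below brackets $P(W \le k)$ using a Poisson distribution function and a given $\Delta$. The function names are hypothetical, and the direct summation is only meant for moderate values of $\lambda$ and $k$.

```python
import math

def poisson_cdf(k, lam):
    """P(Z <= k) for Z ~ Po(lam), by direct summation of the probabilities."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(Z = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(Z = i) from P(Z = i - 1)
        total += term
    return min(total, 1.0)

def cdf_bounds(k, lam, delta):
    """Lower and upper bounds on P(W <= k) when d_TV(L(W), Po(lam)) <= delta."""
    z = poisson_cdf(k, lam)
    return max(0.0, z - delta), min(1.0, z + delta)
```

Because the total variation distance bounds the difference of probabilities for every event, the same bracketing applies to upper tails $P(W \ge k)$.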
For our E. coli and S. cerevisiae data, the estimated upper bound for the total variation distance is $3.73 \times 10^{-15}$ and $9.28 \times 10^{-19}$, respectively. With bounds this small, the p-values are 0 up to 6 significant digits under a two-sided test in which the null hypothesis of the $G(n, M)$ model is rejected for very small or very large numbers of isolated nodes, providing evidence that the $G(n, M)$ model does not explain the observed number of isolated nodes well. The observed number of isolated nodes in E. coli and S. cerevisiae is 833 and 1100, respectively, whereas the expected number of isolated nodes under the $G(n, M)$ model is $2.99 \times 10^{-14}$ and $1.27 \times 10^{-17}$. This suggests that a $G(n, M)$ graph may not be suitable for modelling these real PPI networks when the interest is in the number of isolated nodes as a summary statistic.
We further derived upper bounds for the total variation distance under Poisson approximation for the number of isolated nodes after different types of attack, see Appendix A.2, Appendix A.3 and Appendix A.4. Again, the results are highly significant, with p-values equal to 0 up to 6 significant digits, indicating that after an attack, the G ( n , M ) model still does not fit the data well. Hence, a different model for the data is needed. Next, we investigate the standard duplication–divergence model from Section 3.

5.3. Simulation of Weak Attacks in Duplication–Divergence Model

In this section, we present the simulation results of applying weak attacks to realisations of the standard duplication–divergence model $DD(t_0; p)$ from Section 3. The model is undirected, and all edges are set to have unit weight; we take $t_0 = 3$, and start the simulation of the graph with a triangle. This choice ensures that the generated networks can include triangles, resulting in non-zero local and global clustering coefficients; thus they are able to match this key characteristic of PPI networks. In contrast, if the graph is initiated with just a connected pair of nodes, the generated graphs cannot have any triangles; the corresponding simulation results, shown in Appendix B, are, however, similar regarding the effect of attacks. Reference [10] proves that $p^*$ solving the equation $p e^{p} = 1$ is a critical value, in the sense that for $p > p^*$ there is no limiting degree distribution. In this paper, we take $p$ to be 0.4, a value smaller than $p^* \approx 0.567$. The simulations are run for 1000 steps, with five repeats.
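The critical value quoted above can be checked numerically. The sketch below solves $p e^{p} = 1$ by bisection on $[0, 1]$; the function name and tolerance are choices made for this example.

```python
import math

def critical_p(tol=1e-12):
    """Solve p * exp(p) = 1 on [0, 1] by bisection."""
    lo, hi = 0.0, 1.0  # p * exp(p) - 1 is negative at 0 and positive at 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * math.exp(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Bisection applies because $p e^{p}$ is increasing on this interval, so the root is unique.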
The top two plots of Figure 6 show how partial attacks damage a DD network compared to complete knockout attacks. As illustrated in the top left plot of Figure 6, while increasing the number of nodes being attacked weakly eventually enhances the damage efficiency for a large number of attacks, complete knockout attacks serve as a robust method to destroy the network.
The bottom two plots of Figure 6 show how distributed attacks damage a DD network compared to complete knockout attacks. The horizontal line representing the damage caused by one complete knockout suggests that the effect of 6 distributed knockout or 13 distributed attenuation attacks is equivalent to the effect of one complete knockout. This indicates that distributed attacks are less effective than both complete knockout attacks and partial attacks.

5.4. Simulation of Weak Attacks in the New Node Loss Model

Now, we present the simulation results of applying weak attacks onto the new node loss model introduced in Section 4. Again, the model is undirected with all edges assigned unit weight. The simulations are run with 10 repeats and the average network efficiency values are reported to account for randomness. We note here that we did not carry out a grid search for the optimal parameter choices for the DD models without and with gene loss for the different organisms, as the focus of this paper is the qualitative behaviour of the new DD model with gene loss, and not detailed modelling of observed PPI networks.
Figure 7 shows the results for p = 0.4 and q = 0.2 under different weak attacks. We observe that in 25 attacks, a complete knockout attack is more effective than a partial attenuation when half of the edges connected to one node are eliminated, but less effective than a partial attenuation when halving two nodes or five nodes. Our results indicate that as the number of halved nodes increases, the weak attacks damage networks more efficiently. Furthermore, distributed attacks are less effective than complete knockout and partial attacks, mirroring the qualitative impact observed in real PPI networks.
We observe that the pattern of Figure 7 for the new node loss model is more similar to the pattern of Figure 5 for the real datasets than the pattern of Figure 6 for a standard DD model. This suggests that the new node loss model can mimic the effect of weak attacks on protein–protein interaction networks more realistically than the standard $DD(t_0; p)$ model.
Regarding the effect of the probability of node loss on weak attacks in the new node loss model, we notice that the number of distributed attacks required to achieve the equivalent effect as one complete knockout attack increases as q increases. This raises a natural question regarding how the value of q affects the efficiency of weak attacks in the new node loss model. In our simulations, shown in Figure 8, the resilience of the new node loss model to weak attacks results in a slower rate of network degradation. This can be attributed to the fact that higher q values correspond to an increased likelihood of losing isolated nodes, which in turn leads to a more connected graph structure.

6. Discussion

In this paper, we have assessed standard models for PPI networks and we have introduced a new node loss model which is motivated by observed gene loss in organisms. We show that our new node loss model captures the effect of weak attacks in a protein–protein interaction network more realistically than a standard DD model (i.e., q = 0).
To further enhance the robustness of our results, as future work we aim to derive analytical results for the average number of edges in a 1-step ego network and for the network efficiency before and after attacks in the new node loss model.
It is perhaps not surprising that the new node loss model performs better due to its incorporation of a natural and common biological adaptation, namely, gene loss, occurring throughout evolution. As a next step, variants of the new node loss model could be examined; for example, one could include the case where the probability of a parent–child edge is not zero. In order to understand how node loss affects duplication–divergence behaviour, we also aim to investigate other mechanisms of node loss in a network; for example, a pair of nodes may be more likely to be lost if they are connected by an isolated edge.
Regarding the network representation of PPIs, we chose the PPI networks from the STRING database, which represents each protein-coding gene locus by only a single, representative protein. The datasets contain non-binary data which could be incorporated in the analysis. Moreover, future work will assess the effect of restricting the protein interactions from the STRING database to physical interactions, by repeating the analysis for the full STRING PPI networks. Hypergraph representations as in [17] may also be fruitful.

Author Contributions

Writing—original draft, R.Z. and G.R.; supervision, G.R. All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to acknowledge a discussion with Alan Whitmore and Jonny Wray at e-Therapeutics which sparked this project. G.R. acknowledges support from EPSRC grants EP/T018445/1, EP/W037211/1, EP/V056883/1, and EP/R018472/1.

Data Availability Statement

The data presented in this study are openly available in GitHub at https://github.com/rh-zhang/Entropy_CNC2023.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Poisson Approximation for the Number of Isolated Nodes in a G(n,M) Graph before and after Attacks

Appendix A.1. Poisson Approximation for the Number of Isolated Nodes in a G(n,M) Graph

For a $G(n, M)$ graph $G$, $n \ge 2$, we define the edge indicators $E_{ij}$ so that $E_{ij} = 1$ if there is an edge between $i$ and $j$ belonging to the edge set $E(G)$ of $G$, and 0 otherwise. These edge indicators are not independent, as can be seen by the requirement that $\sum_{i<j} E_{ij} = M$. Since $M$ edges are chosen uniformly at random from $N = \binom{n}{2}$ possible edges, we have
$$P(E_{ij} = 1) = \frac{M}{N} =: p$$
so that $E_{ij} \sim \mathrm{Be}(p)$. We let
$$I_i := I_i(n) = \prod_{j \neq i} (1 - E_{ij})$$
be the indicator of the event that node $i$ is isolated in $G(n, M)$. Then, for $M \le N - (n-1)$,
$$P(I_i = 1) = \frac{\binom{N-(n-1)}{M}}{\binom{N}{M}} =: \pi \tag{A1}$$
is the same for each $i$, and for $M > N - (n-1)$ it is 0. Our quantity of interest is $W = \sum_{i=1}^{n} I_i$, the number of isolated nodes in $G(n, M)$. From Equation (A1), $\lambda = E(W) = n\pi$. We point out that the $I_i$’s are not independent, but when $\pi$ is small the dependence is weak.
While Equation (A1) can be difficult to evaluate numerically for large $N$ and $M$, we note that
$$\pi = \frac{\binom{N-(n-1)}{M}}{\binom{N}{M}} = \left(1 - \frac{M}{N}\right)\left(1 - \frac{M}{N-1}\right) \cdots \left(1 - \frac{M}{N-n+2}\right).$$
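For completeness, the product form follows from writing both binomial coefficients in terms of factorials and cancelling $M!$; the short derivation below is added as a check and uses only the definition of the binomial coefficient.

```latex
\frac{\binom{N-(n-1)}{M}}{\binom{N}{M}}
  = \frac{(N-n+1)!}{(N-n+1-M)!} \cdot \frac{(N-M)!}{N!}
  = \prod_{j=0}^{n-2} \frac{N-M-j}{N-j}
  = \prod_{j=0}^{n-2} \left(1 - \frac{M}{N-j}\right).
```

The product has $n - 1$ factors, the first being $1 - M/N$ and the last $1 - M/(N-n+2)$.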
Thus, setting $p = \frac{M}{N}$, we can bound $n\left(1 - \frac{M}{N-n}\right)^{n-1} \le \lambda \le n\left(1 - \frac{M}{N}\right)^{n-1} = n(1-p)^{n-1}$. To understand the distribution of $W$, Theorem 1 in the main text gives a Poisson approximation for which we provide a proof here. For convenience, we re-state the result.
Theorem A1. 
If $W := \sum_{i=1}^{n} I_i$, we have for a $G(n, M)$ graph,
d T V ( L ( W ) ; P o ( λ ) ) min ( 1 , λ 1 ) e n p ( 1 + n 2 N + 2 n ) + p ( 1 + n 2 N + 2 n ) + 2 3 n + n 2 N + 2 n = 1 + n p 1 + n + N p 2 N N p n + 2 .
Before we prove this result, we note that the bound, which we could call $\Delta$ as in Section 5.2, on the total variation distance is explicit; no limiting behaviour is assumed. However, it can be seen that $\Delta$ tends to 0 as $p = \frac{M}{N} \to 1$.
Proof. 
When assessing the goodness of fit of Poisson approximations, Stein’s method has become a strong tool under various dependence structures [18]. In our case, notice that given any realisation of $G(n, M)$, an associated realisation of $G(n, M)$ conditional on $I_i = 1$ is obtained simply by setting all the edge indicators $(E_{ij}, E_{ji},\ 1 \le j \le n,\ j \neq i)$ equal to zero. This may create additional isolated nodes, but cannot destroy any. To exploit this fact, we use so-called size bias coupling, constructing a random variable $W_i^*$ in the same probability space as $W$, which has the conditional distribution $\mathcal{L}(W - I_i \mid I_i = 1)$. Theorem 2.A in [19] gives that
$$d_{TV}(\mathcal{L}(W), \mathrm{Po}(\lambda)) \le J_1 \sum_{i=1}^{n} p_i\, E|W - W_i^*| \tag{A2}$$
with $J_1 \le \min\{1, \lambda^{-1}\}$.
To construct such size bias coupling, as in [20], we introduce $Z_j = \prod_{l \neq i,j} (1 - E_{jl})$, so that $Z_j = 1$ if $j$ is not connected to any nodes excluding $i$ and itself, and $Z_j = 0$ otherwise. Then, for each $i$ we can take as a size-biased variable
$$W_i^* = \sum_{j: j \neq i} Z_j = \sum_{j: j \neq i} (1 - E_{ij} + E_{ij}) \prod_{l \neq i,j} (1 - E_{jl}) = \sum_{j: j \neq i} \prod_{l \neq j} (1 - E_{jl}) + \sum_{j: j \neq i} U_j = W - I_i + \sum_{j: j \neq i} U_j,$$
where
$$U_j = \prod_{l \neq i,j} (1 - E_{jl}) - \prod_{l \neq i,j} (1 - E_{jl})(1 - E_{ji}) = E_{ij} \prod_{l \neq i,j} (1 - E_{jl}).$$
For a $G(n, M)$ graph, we have
$$E \sum_{j=1, j \neq i}^{n} E_{ij} \prod_{l \neq i,j} (1 - E_{jl}) = \binom{n-1}{1} \frac{\binom{N-(n-1)}{M-1}}{\binom{N}{M}}$$
since to make sure node $j$ is only connected to node $i$, we need an edge between $i$ and $j$ chosen from $n - 1$ nodes, and all the other $M - 1$ edges are chosen from the edge set excluding $j$. Hence,
$$E|W_i^* - W| \le (n-1) \frac{\binom{N-(n-1)}{M-1}}{\binom{N}{M}} + \frac{\binom{N-(n-1)}{M}}{\binom{N}{M}}.$$
Therefore, with p = M N and N = n 2 , by Equation (A2) we have
d T V ( L ( W ) ; P o ( λ ) ) = min ( 1 , λ 1 ) λ { ( n 1 ) N ( n 1 ) M 1 N M + N ( n 1 ) M N M } min ( 1 , λ 1 ) e n p ( 1 + n 2 N + 2 n ) + p ( 1 + n 2 N + 2 n ) + 2 3 n + n 2 N + 2 n = 1 + n p 1 + n + N p 2 N N p n + 2 ,
where the last step follows from standard inequalities. □
This bound tends to 0 as $p = M/N \to 1$ as long as $M \le N - n + 1$.
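As a numerical illustration of the Poisson approximation discussed above, the following minimal sketch compares the simulated number of isolated nodes in $G(n, M)$ with a Poisson law. It assumes $\lambda = \mathbb{E}(W) = n \binom{N-(n-1)}{M} / \binom{N}{M}$ with $N = \binom{n}{2}$, which is the expectation implied by the binomial ratio in the proof; the function names and the parameter values are hypothetical choices made for this example only.

```python
import math
import random
from collections import Counter
from itertools import combinations


def expected_isolated(n, M):
    """lambda = E(W) = n * C(N-(n-1), M) / C(N, M), with N = C(n, 2) (assumed form)."""
    N = math.comb(n, 2)
    return n * math.comb(N - (n - 1), M) / math.comb(N, M)


def sample_isolated(n, M, rng):
    """Draw one G(n, M) graph and count its isolated nodes."""
    pairs = list(combinations(range(n), 2))
    touched = set()
    for u, v in rng.sample(pairs, M):
        touched.update((u, v))
    return n - len(touched)


def tv_to_poisson(samples, lam):
    """Total variation distance between the empirical law of `samples` and Po(lam)."""
    counts = Counter(samples)
    total = len(samples)
    pmf = math.exp(-lam)          # Poisson mass at 0
    covered, dist = 0.0, 0.0
    for w in range(max(counts) + 1):
        dist += abs(counts.get(w, 0) / total - pmf)
        covered += pmf
        pmf *= lam / (w + 1)      # recursion Po(w + 1) = Po(w) * lam / (w + 1)
    # Poisson mass above the largest observed value has no empirical counterpart.
    return 0.5 * (dist + max(0.0, 1.0 - covered))


if __name__ == "__main__":
    rng = random.Random(1)
    n, M = 60, 110                # illustrative values only
    lam = expected_isolated(n, M)
    draws = [sample_isolated(n, M, rng) for _ in range(5000)]
    print(lam, sum(draws) / len(draws), tv_to_poisson(draws, lam))
```

The empirical distance also contains Monte Carlo error, so it should be read as a rough check rather than as an evaluation of the bound itself.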

Appendix A.2. Poisson Approximation for the Number of Isolated Nodes in a G(n,M) Graph after One Complete Knockout Attack

A complete knockout attack removes all the edges of a randomly picked node $U$. Assume that $U = i$. Let $N = \binom{n-1}{2}$, set $I_j = \mathbf{1}(j \text{ is isolated in the graph before the attack})$, and
$$I_j' = \mathbf{1}(j \text{ is isolated in the graph after the attack}).$$
Before the attack, denoting by $\deg(i)$ the degree of $i$, we have
$$\mathbb{P}_{G(n, M)}(\deg(i) = k) = \binom{n-1}{k} \frac{\binom{N - (n-1)}{M - k}}{\binom{N}{M}} =: p_{\deg}^{k, n},$$
for $k \le \min(n-1, M)$ and 0 otherwise. Suppose that node $i$ is attacked. Then, for $j \ne i$ and $k \le \min(n-1, M)$, as the edges in $G(n, M)$ are distributed uniformly, if the attacked node has degree $k$ then the graph after the attack is a $G(n-1, M-k)$ graph. Hence, for $k \le \min(n-1, M)$,
$$\mathbb{P}(I_j' = 1 \mid \deg(i) = k) = \frac{\binom{N - (n-2)}{M - k}}{\binom{N}{M - k}} =: \pi_k^{(n-1)},$$
which is the same for each $j \ne i$ in the graph after the attack. Now, let $W$ be the number of isolated nodes after one attack. We have
$$\mathbb{E}(W) = \frac{1}{n} \sum_{v=1}^{n} \mathbb{E}(W \mid v \text{ is the attacked vertex}) = \frac{1}{n} \sum_{v=1}^{n} \sum_{k=0}^{\min(n-1, M)} \mathbb{P}(\deg(v) = k)\, \mathbb{E}(W \mid v \text{ is the attacked vertex}, \deg(v) = k) = \sum_{k=0}^{\min(n-1, M)} \lambda_k \, p_{\deg}^{k, n},$$
where
$$\lambda_k := \mathbb{E}(W \mid v \text{ is the attacked vertex}, \deg(v) = k).$$
Let $\Lambda$ be a random variable taking values in $\{\lambda_k,\ k = 0, \ldots, \min(n-1, M)\}$, with
$$\mathbb{P}(\Lambda = \lambda_k) = p_{\deg}^{k, n},$$
using the notation (A4). Then, $\mathbb{E}\Lambda = \sum_{k=0}^{\min(n-1, M)} p_{\deg}^{k, n} \lambda_k = \mathbb{E}(W)$. We now approximate the distribution of $W$ by a mixed Poisson distribution. Let $Z \sim \mathrm{Po}(\Lambda)$. Then, for any function $h$,
$$\mathbb{E}h(W) - \mathbb{E}h(Z) = \sum_{k} \left\{ \mathbb{E}(h(W) \mid \deg(U) = k) - \mathbb{E}h(Z_k) \right\} p_{\deg}^{k, n},$$
where $Z_k \sim \mathrm{Po}(\lambda_k)$. For each choice of $k$, bounding $|\mathbb{E}(h(W) \mid \deg(U) = k) - \mathbb{E}h(Z_k)|$ can then be carried out in a similar vein as for Theorem 1, as follows.
Theorem A2. 
For a G ( n , M ) graph after one complete knockout attack, we have
d T V ( L ( W ) ; P o ( Λ ) ) k = 0 min ( n 1 , M ) n 2 k N ( n 2 ) N p k k N N p k min ( 1 , λ k 1 ) e n p k ( 1 + n 3 N + 3 n ) + p k ( 2 + n 3 N + 3 n ) + 6 5 n + 2 n 2 N + 3 n = 1 + n p k 1 + n + N p k 3 N n N p k + 3 .
where $W := \sum_{j=1}^{n-1} I_j^{(n-1)}$, $p_k = (M - k)/N$, and $\Lambda$ is given in (A5).
Proof. 
After one attack on node $i$ of degree $k$, the graph is a realisation of the $G(n-1, M-k)$ model, together with an isolated node $i$. Again, we use size bias coupling. Given any realisation of $G(n-1, M-k)$, an associated realisation of $G(n-1, M-k)$ conditional on $I_j = 1$ is obtained simply by setting all the edge indicators $(E_{lj}, E_{jl};\ l \le n-1,\ l \ne j)$ equal to zero. This may create additional isolated nodes, but cannot destroy any. By (A3), we have for $k \le \min(n-1, M)$
$$\mathbb{E}\left[\, |W_j^* - W| \,\middle|\, \deg(i) = k \right] \le (n-2) \frac{\binom{N - (n-2)}{M - k - 1}}{\binom{N}{M - k}} + \frac{\binom{N - (n-2)}{M - k}}{\binom{N}{M - k}}.$$
Setting $p_k = (M - k)/N$, using (A6) we have
d T V ( L ( W ) ; P o ( Λ ) ) k = 0 min ( n 1 , M ) p d e g k , n 1 min ( 1 , λ k 1 ) { ( n 2 ) N ( n 2 ) M 1 k N M k + N ( n 2 ) M k N M k } k = 0 min ( n 1 , M ) n 2 k N ( n 2 ) N p k k N N p k min ( 1 , λ k 1 ) e n p k ( 1 + n 3 N + 3 n ) + p k ( 2 + n 3 N + 3 n ) + 6 5 n + 2 n 2 N + 3 n       1 + n p k 1 + n + N p k 3 N n N p k + 3 .
To simplify this bound further, we could bound $p_{\deg}^{k, n-1}$ by $\max_k p_{\deg}^{k, n-1}$. More crudely, we can bound
d T V ( L ( W ) ; P o ( Λ ) ) k = 0 min ( n 1 , M ) e n p k ( 1 + n 3 N + 3 n ) + p k ( 2 + n 3 N + 3 n ) + 6 5 n + 2 n 2 N + 3 n = 1 + n p k 1 + n + N p k 3 N n N p k + 3 .
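To make the setting of this subsection concrete, the following sketch simulates one complete knockout attack on a $G(n, M)$ graph and counts the isolated nodes afterwards. It is an illustration under simple assumptions (nodes labelled $0, \ldots, n-1$, edges stored as sorted pairs); all function names are hypothetical and are not taken from the paper.

```python
import random
from itertools import combinations


def gnm_edges(n, M, rng):
    """Sample the edge set of a G(n, M) graph on nodes 0..n-1."""
    return set(rng.sample(list(combinations(range(n), 2)), M))


def complete_knockout(edges, node):
    """Remove every edge incident to `node`; the node itself stays in the graph."""
    return {e for e in edges if node not in e}


def count_isolated(n, edges):
    """Number of nodes without any incident edge."""
    touched = {u for e in edges for u in e}
    return n - len(touched)


def isolated_after_attack(n, M, rng):
    """One draw of W: isolated nodes after a complete knockout of a uniform node U."""
    edges = gnm_edges(n, M, rng)
    target = rng.randrange(n)
    return count_isolated(n, complete_knockout(edges, target))
```

Averaging `isolated_after_attack` over many draws gives a Monte Carlo estimate of $\mathbb{E}(W)$; the attacked node always contributes one isolated node, so every draw is at least 1.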

Appendix A.3. Poisson Approximation for the Number of Isolated Nodes in a G(n,M) Graph after One Partial Knockout Attack

A partial knockout attack on node $i$ randomly removes half of its edges with other nodes. Here, we round down the number of edges removed, so a node of degree 0 or 1 in the original graph $G(n, M)$ loses no edges, and the attack never turns node $i$ into an isolated node. So, if the node $i$ has degree $k$ before the attack, then $\lfloor k/2 \rfloor$ edges are removed. Again, we let
$$I_j' = \mathbf{1}(j \text{ is isolated in the graph after the attack}).$$
Theorem A3. 
For a G ( n , M ) graph after one partial knockout attack, we have
d T V ( L ( W ) ; P o ( Λ ) ) k = 0 n 2 n 2 k N ( n 2 ) N p k k N N p k min ( 1 , λ k 1 ) 1 + n p k 1 + n + N p k 3 N n N p k + 3 = e n p k ( 1 + n 3 N + 3 n ) + p k ( 2 + n 3 N + 3 n ) + 6 5 n + 2 n 2 N + 3 n ,
where $W := \sum_{j=1}^{n-1} I_j^{(n-1)}$, $p_k = (M - \lfloor k/2 \rfloor)/N$, and $\Lambda$ is given in (A5).
Proof. 
Firstly, letting $\deg(i)$ denote the degree of node $i$ in $G(n, M)$,
$$\mathbb{P}(I_i' = 1 \mid \deg(i) = k) = \mathbb{P}(I_i' = 1 \mid \deg(i) > 0)\, \mathbb{P}(\deg(i) > 0) + \mathbb{P}(I_i' = 1 \mid \deg(i) = 0)\, \mathbb{P}(\deg(i) = 0) = 0 + \pi = \pi$$
and
$$\mathbb{P}(I_j' = 1) = \sum_{k=1}^{n-1} p_{\deg}^{k, n}\, \mathbb{P}(I_j' = 1 \mid \deg(i) = k) \quad \text{for } j \ne i.$$
In particular, we have
$$\mathbb{P}(I_j' = 1 \mid \deg(i) = k) = \mathbb{P}(I_j = 1 \mid \deg(i) = k) + \mathbb{P}(i \sim j, \deg(j) = 1 \mid \deg(i) = k)\, \mathbb{P}(i \sim j \text{ is deleted} \mid i \sim j, \deg(j) = 1, \deg(i) = k).$$
For $k \le \min(n-1, M)$,
P ( I j = 1 | d e g ( i ) = k ) = N k ( n 1 ) M k N M ; P ( i j , d e g ( j ) = 1 | d e g ( i ) = k ) = n 2 k 1 N ( 2 ( n 2 ) + 1 ) M k p d e g k , n N M .
Also,
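$$\mathbb{P}(i \sim j \text{ is deleted} \mid i \sim j, \deg(j) = 1, \deg(i) = k) = \begin{cases} \dfrac{1}{2} & \text{if } k \text{ is even}, \\[2mm] \dfrac{k-1}{2k} & \text{if } k \text{ is odd}. \end{cases}$$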
Therefore,
P ( I j = 1 | d e g ( i ) = k ) = N k ( n 1 ) M k N M + n 2 k 1 N ( 2 ( n 2 ) + 1 ) M k p d e g k , n N M 1 2 if k is even N k ( n 1 ) M k N M + n 2 k 1 N ( 2 ( n 2 ) + 1 ) M k p d e g k , n N M k 1 2 k if k is odd
where for each case the first term represents the probability before attack and the second term represents the probability after attack.
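The deletion probabilities used above (1/2 for even $k$ and $(k-1)/(2k)$ for odd $k$) follow from removing $\lfloor k/2 \rfloor$ of the $k$ edges uniformly at random. The short sketch below checks this by exact enumeration; the function names are hypothetical and the enumeration is only practical for small $k$.

```python
from fractions import Fraction
from itertools import combinations


def deletion_probability(k):
    """Exact P(a fixed edge of the attacked node is deleted) when floor(k/2)
    of its k edges are removed uniformly at random."""
    removed = k // 2
    subsets = list(combinations(range(k), removed))
    hits = sum(1 for s in subsets if 0 in s)   # edge 0 plays the role of i~j
    return Fraction(hits, len(subsets))


def closed_form(k):
    """1/2 for even k and (k-1)/(2k) for odd k."""
    return Fraction(1, 2) if k % 2 == 0 else Fraction(k - 1, 2 * k)
```

For instance, `deletion_probability(3)` and `closed_form(3)` both evaluate to `Fraction(1, 3)`, since one of three edges is removed.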
Then, $\lambda_k := \mathbb{E}[W \mid v \text{ is the attacked vertex}, \deg(v) = k]$ is
λ k = π + ( n 1 ) N k ( n 1 ) M k N M + n 2 k 1 N ( 2 ( n 2 ) + 1 ) M k p d e g k , n N M 1 2 if k is even π + ( n 1 ) N k ( n 1 ) M k N M + n 2 k 1 N ( 2 ( n 2 ) + 1 ) M k p d e g k , n N M k 1 2 k if k is odd .
After one partial knockout attack, the graph is a realisation of the $G(n-1, M - \lfloor k/2 \rfloor)$ model, combined with node $i$ and its remaining edges if $i$ does not become isolated, or the graph is a realisation of the $G(n-1, M - \lfloor k/2 \rfloor)$ model combined with an isolated $i$ if $i$ becomes isolated. Conditioning on the different cases,
$$\mathbb{E}h(W) - \mathbb{E}h(Z) = \sum_{k=0}^{n-1} \Big\{ \mathbb{E}[h(W) \mid \deg(i) = k, i \text{ becomes isolated}]\, \mathbb{P}(i \text{ becomes isolated}) + \mathbb{E}[h(W) \mid \deg(i) = k, i \text{ does not become isolated}]\, \mathbb{P}(i \text{ does not become isolated}) - \mathbb{E}h(Z_k) \Big\}\, p_{\deg}^{k, n-1}$$
where
$$Z_k = \begin{cases} Z_k^{iso} \sim \mathrm{Po}(\lambda_k) + 1 & \text{if } v \text{ becomes isolated}, \\ Z_k^{noniso} \sim \mathrm{Po}(\lambda_k) & \text{if } v \text{ does not become isolated}. \end{cases}$$
Hence,
$$\begin{aligned} \mathbb{E}h(W) - \mathbb{E}h(Z) &= \sum_{k=0}^{n-1} p_{\deg}^{k, n} \Big\{ \big( \mathbb{E}[h(W) \mid \deg(i) = k, i \text{ becomes isolated}] - \mathbb{E}h(Z_k^{iso}) \big)\, \mathbb{P}(i \text{ becomes isolated}) \\ &\qquad + \big( \mathbb{E}[h(W) \mid \deg(i) = k, i \text{ does not become isolated}] - \mathbb{E}h(Z_k^{noniso}) \big)\, \mathbb{P}(i \text{ does not become isolated}) \Big\} \\ &= \sum_{k=0}^{n-2} p_{\deg}^{k, n-1} \Big\{ \big( \mathbb{E}[g(W) \mid \deg(i) = k, i \text{ becomes isolated}] - \mathbb{E}g(Z_k^{iso}) \big)\, \mathbb{P}(i \text{ becomes isolated}) \\ &\qquad + \big( \mathbb{E}[g(W) \mid \deg(i) = k, i \text{ does not become isolated}] - \mathbb{E}g(Z_k^{noniso}) \big)\, \mathbb{P}(i \text{ does not become isolated}) \Big\} \end{aligned}$$
where $g(x) = h(x + 1)$. Now, for each of the two cases we apply size bias coupling as in Theorem A2. We approximate $\mathbb{E}[g(W) \mid \deg(i) = k, i \text{ becomes isolated}]$ by $\mathbb{E}[g(Z_k^{iso})]$ and we approximate $\mathbb{E}[g(W) \mid \deg(i) = k, i \text{ does not become isolated}]$ by $\mathbb{E}[g(Z_k^{noniso})]$.
Combining the bounds (A7) for the two cases,
E [ | W W | d e g ( i ) = k ] 1 n i = 1 n k = 0 n 2 p d e g k , n 1 { ( n 2 ) N ( n 2 ) M k / 2 1 N M k / 2 + N ( n 2 ) M k / 2 N M k / 2 = P ( i becomes isolated ) = + ( n 2 ) N ( n 2 ) M k / 2 1 N M k / 2 + N ( n 2 ) M k / 2 N M k / 2 P ( i does not become isolated )
k = 0 n 2 p d e g k , n 1 ( n 2 ) N ( n 2 ) M k / 2 1 N M k / 2 + N ( n 2 ) M k / 2 N M k / 2
Letting $p_k = (M - \lfloor k/2 \rfloor)/N$, we have
d T V ( L ( W ) ; P o ( Λ ) ) k = 0 n 2 p d e g k , n 1 min ( 1 , λ k 1 ) { ( n 2 ) N ( n 2 ) M k 2 1 N M k 2 + N ( n 2 ) M k 2 N M k 2 } = k = 0 n 2 n 2 k N ( n 2 ) N p k k N N p k min ( 1 , λ k 1 ) { e n p k ( 1 + n 3 N + 3 n ) + p k ( 2 + n 3 N + 3 n ) + 6 5 n + 2 n 2 N + 3 n = 1 + n p k 1 + n + N p k 3 N n N p k + 3 } .
Again, we can use the crude upper bound
d T V ( L ( W ) ; P o ( Λ ) ) k = 0 n 1 e n p k ( 1 + n 3 N + 3 n ) + p k ( 2 + n 3 N + 3 n ) + 6 5 n + 2 n 2 N + 3 n = 1 + n p k 1 + n + N p k 3 N n N p k + 3 ,
which tends to 0 as $p = M/N \to 1$.

Appendix A.4. Poisson Approximation for the Number of Isolated Nodes in a G(n,M) Graph after One Distributed Knockout Attack

A distributed knockout attack on node $i$ of degree $k$ removes a random number of its edges with other nodes; in the analysis below, the number of removed edges follows a $\mathrm{Bin}(k, q)$ distribution.
Theorem A4. 
In a G ( n , M ) graph, we have for W the number of isolated nodes after one distributed knockout attack,
d T V ( L ( W ) ; P o ( Λ ) ) k = 0 n 2 x = 0 k n 2 k N ( n 2 ) N p x k N N p x k x q x ( 1 q ) k x min ( 1 , λ k 1 ) = e n p x 1 + n 3 N + 3 n e p x 2 + n 3 N + 3 n + 6 5 n + 2 n 2 N + 3 n = 1 + n p x 1 + n + N p x 3 N n N p x + 3 .
where W : = j = 1 n 1 I j ( n 1 ) , λ k = q k + ( n 1 ) N k ( n 1 ) M k N M + n 2 k 1 N ( 2 ( n 2 ) + 1 ) M k p d e g k , n N M q , p x = M x N , and Λ is given in (A5).
Proof. 
Let $X_k \sim \mathrm{Bin}(k, q)$ denote the number of removed edges if the attacked node $i$ has degree $k$. Let $I_j'$ denote the indicator that node $j$ is isolated after the attack. Then, after one attack, $\mathbb{P}(I_i' = 1 \mid \deg(i) = k) = q^k$, and for $j \ne i$,
$$\mathbb{P}(I_j' = 1) = \sum_{k=1}^{n-1} p_{\deg}^{k, n-1}\, \mathbb{P}(I_j' = 1 \mid \deg(i) = k).$$
In particular,
P ( I j = 1 | d e g ( i ) = k ) = N k ( n 1 ) M k N M ,
and
$$\mathbb{P}(i \sim j, \deg(j) = 1 \mid \deg(i) = k) = \frac{\binom{n-2}{k-1} \binom{N - (2(n-2) + 1)}{M - k}}{p_{\deg}^{k, n} \binom{N}{M}}.$$
Also, we notice
$$\mathbb{P}(i \sim j \text{ is deleted} \mid i \sim j, \deg(j) = 1, \deg(i) = k) = \sum_{x=0}^{k} \mathbb{P}(X_k = x)\, \mathbb{P}(i \sim j \text{ is deleted} \mid X_k = x) = \sum_{x=0}^{k} \binom{k}{x} q^{x} (1 - q)^{k - x} \frac{x}{k} = q.$$
Therefore, substituting into Equation (A8), we obtain for $k \le \min(n-1, M)$
P ( I j = 1 | d e g ( i ) = k ) = N k ( n 1 ) M k N M + n 2 k 1 N ( 2 ( n 2 ) + 1 ) M k p d e g k , n N M q .
Hence,
λ k E [ W | i is the vertex picked for duplication , d e g ( i ) = k ] = q k + ( n 1 ) N k ( n 1 ) M k N M + n 2 k 1 N ( 2 ( n 2 ) + 1 ) M k p d e g k , n N M q
After one distributed knockout attack on node $i$, let $X_k = x_k$. The graph is a realisation of the $G(n-1, M-k)$ model if node $i$ becomes isolated, and it is a realisation of the model $G(n-1, M - x_k)$ combined with node $i$ and its remaining edges if node $i$ does not become isolated. Any additional isolated nodes can only appear in the $G(n-1, M-k)$ or $G(n-1, M - x_k)$ part of the model. With $p_x = (M - x)/N$, a similar argument as for Equation (A10) gives as the upper bound for the total variation distance in a distributed knockout attack
d T V ( L ( W ) ; P o ( Λ ) ) k = 0 n 2 x = 0 k p d e g k , n 1 P ( X k = x ) min ( 1 , λ k 1 ) e n p x 1 + n 3 N + 3 n = e p x 2 + n 3 N + 3 n + 6 5 n + 2 n 2 N + 3 n 1 + n p x 1 + n + N p x 3 N n N p x + 3 = k = 0 n 2 x = 0 k n 2 k N ( n 2 ) N p x k N N p x k x q x ( 1 q ) k x min ( 1 , λ k 1 ) e n p x 1 + n 3 N + 3 n = e p x 2 + n 3 N + 3 n + 6 5 n + 2 n 2 N + 3 n 1 + n p x 1 + n + N p x 3 N n N p x + 3 .
Again, this bound can be bounded as
d T V ( L ( W ) ; P o ( Λ ) ) x = 0 n 1 e n p x ( 1 + n 3 N + 3 n ) + p x ( 2 + n 3 N + 3 n ) + 6 5 n + 2 n 2 N + 3 n = 1 + n p x 1 + n + N p x 3 N n N p x + 3 ,
which tends to 0 as $p = M/N \to 1$.
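Two identities used in this proof can be checked exactly: with $X_k \sim \mathrm{Bin}(k, q)$ removed edges, a fixed edge of the attacked node is deleted with probability $q$, and the attacked node loses all $k$ edges with probability $q^k$. The sketch below verifies both with rational arithmetic; the function names are hypothetical, and it assumes that, given $X_k = x$, the $x$ removed edges are chosen uniformly, so that a fixed edge is hit with probability $x/k$.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, x, q):
    """P(X_k = x) for X_k ~ Bin(k, q)."""
    return comb(k, x) * q**x * (1 - q) ** (k - x)


def edge_deletion_probability(k, q):
    """Sum over x of P(X_k = x) * x / k: given x removals, a fixed edge is hit w.p. x/k."""
    return sum(binomial_pmf(k, x, q) * Fraction(x, k) for x in range(k + 1))


def attacked_node_isolated_probability(k, q):
    """P(X_k = k): all k edges of the attacked node are removed."""
    return binomial_pmf(k, k, q)
```

Passing `q` as a `Fraction` keeps every quantity exact, so the comparison with $q$ and $q^k$ involves no rounding.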

Appendix B. Additional Figures

Below, we present additional figures for this paper.

Appendix B.1. Additional Results for Simulated Networks Starting with a Triangle

Figure A1 shows how p affects the response to different attacks of duplication–divergence models starting with a triangle. As p increases, the network efficiency decreases faster; however, the relative behaviour between strong and weak attacks remains unchanged. Figure A2 shows that the network efficiency follows the same trend for the new node loss model starting with a triangle.
Figure A1. Effect of p when applying complete or weak knockout attacks on simulated networks from duplication–divergence models (i.e., q = 0 ) starting with a triangle.
Figure A2. Effect of p when applying complete or weak knockout attacks on simulated networks from the new node loss model starting with a triangle, with q = 0.4 .
Figure 7 in the main text shows the effect of different attacks on simulated networks from the new node loss model starting with a triangle, with q = 0.2 and p = 0.4. In Figure A3, we present simulation results for the new node loss model with other sets of parameters; we find that these display qualitatively similar behaviour against attacks. Compared to the simulations of the new node loss model that begin with a single edge, starting with a triangle provides a more realistic representation, for two reasons. First, the triangle-based simulations have non-zero local and global clustering coefficients, unlike the edge-based simulations. Second, when p = 0.4, the network efficiency in the triangle-based simulations does not decline as rapidly as in Figure A7, more closely mirroring the behaviour of real PPI networks.
Figure A3. Network efficiency after up to 25 weak attacks on simulated networks from the new node loss model starting with a triangle with a divergence rate p = 0.4, where a node can be lost with probability q = 0.4 , 0.6, and 0.8. Left plots: knockout attacks. Blue line: complete knockout; red line: partial knockout with all the edges connected to one node being halved at each attack; green line: partial knockout with all the edges connected to two nodes being halved at each attack; orange line: partial knockout with all the edges connected to five nodes being halved at each attack. Right plots: attenuation attacks. Blue line: complete knockout; red line: partial attenuation with all the edges connected to one node being halved at each attack; green line: partial attenuation with all the edges connected to two nodes being halved at each attack; orange line: partial attenuation with all the edges connected to five nodes being halved at each attack.

Appendix B.2. Simulated Networks Starting with a Single Edge

In the main text, we present simulations of DD models and new node loss models initialised with a triangle. The figures below demonstrate that initialising these models with a single edge leads to similar behaviour under attack. However, when beginning with a single edge, no triangles are formed during the graph generation process, making the resulting networks less realistic for modelling PPI networks.
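The absence of triangles when starting from a single edge can be illustrated with a small simulation. The sketch below is a simplified duplication–divergence step written for this illustration only: it assumes that a uniformly chosen node is copied, that each copied edge is kept independently with a hypothetical probability `keep`, and that the copy is not linked to its original; node loss is not modelled. Under these assumptions a copy only attaches to neighbours of its original, so a graph grown from one edge stays bipartite and cannot contain a triangle.

```python
import random


def duplicate_with_divergence(adj, keep, rng):
    """Copy a uniformly chosen node; keep each copied edge with probability `keep`.
    The copy is NOT linked to its original (assumption of this sketch)."""
    original = rng.choice(sorted(adj))
    new = max(adj) + 1
    adj[new] = set()
    for u in list(adj[original]):
        if rng.random() < keep:
            adj[new].add(u)
            adj[u].add(new)


def count_triangles(adj):
    """Count triangles once each by enumerating node triples u < v < w."""
    return sum(
        1
        for u in adj
        for v in adj[u] if u < v
        for w in adj[v] if v < w and w in adj[u]
    )


def grow(seed_adj, steps, keep, rng):
    """Apply `steps` duplication steps to a copy of the seed graph."""
    adj = {u: set(nbrs) for u, nbrs in seed_adj.items()}
    for _ in range(steps):
        duplicate_with_divergence(adj, keep, rng)
    return adj
```

Calling `grow({0: {1}, 1: {0}}, 300, 0.6, random.Random(3))` and passing the result to `count_triangles` returns 0, whereas a triangle seed keeps at least its initial triangle because this sketch never deletes existing edges.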
Figure A4 and Figure A5 show the effect of p in simulations from the DD model without and with node loss, respectively, starting with a single edge, for different attacks. We notice that for both models, the relative behaviour between strong and weak attacks remains unchanged across different values of p and across starting configurations of the simulations.
Figure A4. Effect of p when applying complete or weak knockout attacks on simulated networks from the duplication–divergence model starting with an edge.
Figure 7 in the main text gives the results for simulations from the new node loss model starting with a triangle, with q = 0.2 and p = 0.4. Figure A6 shows similar behaviour when the model starts with a single edge: as the number of weakly attacked nodes increases, partial attacks can cause greater damage to networks than complete knockout attacks, while distributed attacks are less efficient than complete and partial attacks. Moreover, weak attacks show the same qualitative behaviour as for the real PPI networks; see Figure 5.
A similar conclusion on the effect of weak attacks in the new node loss model starting with an edge is obtained when q, the probability of node loss, equals 0.4, 0.6, and 0.8, as shown in Figure A7. Figure A8 also shows that the qualitative behaviour is similar when p = 0.2 .
Figure A5. Effect of p when applying complete or weak knockout attacks on simulated networks from the node loss model starting with an edge, with q = 0.4 .
Figure A6. Weak attacks on simulated networks from the new node loss model starting with an edge where a node can be lost with probability q = 0.2 , using a divergence rate p = 0.4. The graph is undirected and has unit edge weight. Edges selected for distributed attacks are drawn from a random distribution. Top left: knockout attacks. Blue line: complete knockout; red line: partial knockout with all the edges connected to two nodes being halved at each attack; green line: partial knockout with all the edges connected to five nodes being halved at each attack; orange line: partial knockout with all the edges connected to ten nodes being halved at each attack. Top right: attenuation attacks. Blue line: complete knockout; red line: partial attenuation with all the edges connected to two nodes being halved at each attack; green line: partial attenuation with all the edges connected to five nodes being halved at each attack; orange line: partial attenuation with all the edges connected to ten nodes being halved at each attack. Bottom left: distributed attacks, with edges drawn from a random distribution; the horizontal line represents equivalent damage to the network achieved by one complete knockout. Bottom right: distributed attenuation attacks, with the weight of edges drawn from a random distribution to be halved; the horizontal line represents equivalent damage to the network achieved by one complete knockout.
Figure A7. Weak attacks on simulated networks from the new node loss model starting with an edge; p = 0.4 , q = 0.6 , or q = 0.8 , respectively. All the graphs are undirected with unit edge weight. Edges selected for distributed attacks are drawn from a random distribution.
Figure A8. Weak attacks on simulated networks from the new node loss model starting with an edge; p = 0.2 and q = 0.4 . All the graphs are undirected with unit edge weight. Edges selected for distributed attacks are drawn from a random distribution.

Appendix B.3. More Results for PPI Networks

Figure A9 shows that when the threshold on the STRING scores used to filter the E. coli and S. cerevisiae PPI networks is changed from 0.400 to 0.200 or 0.600, the qualitative impact of complete and weak attacks on the datasets is the same.
Figure A9. Effect of thresholds of STRING scores when applying complete or weak knockout attacks on real PPI networks, using the thresholds 0.200, 0.400, and 0.600.

References

  1. Ágoston, V.; Csermely, P.; Sándor, P. Multiple weak hits confuse complex systems: A transcriptional regulatory network as an example. Phys. Rev. 2005, 71, 051909.
  2. Huang, S. Rational drug discovery: What can we learn from regulatory networks? Drug Discov. Today 2002, 7, s163–s169.
  3. Prato, S.D.; Volpe, L. Rosiglitazone plus metformin: Combination therapy for Type 2 diabetes. Expert Opin. Pharmacother. 2004, 5, 2051.
  4. Kaelin, W.G. Gleevec: Prototype or Outlier? Sci. STKE 2004, 2004, pe12.
  5. Solé, R.; Pastor-Satorras, R.; Smith, E.; Kepler, T.B. A model of large-scale proteome evolution. Adv. Complex Syst. 2002, 5, 43–54.
  6. Chung, F.; Lu, L.; Dewey, G.; Galas, D. Duplication Models for Biological Networks. J. Comput. Biol. 2003, 10, 677–687.
  7. Gibson, T.A.; Goldberg, D.S. Improving evolutionary models of protein interaction networks. Bioinformatics 2011, 27, 376–382.
  8. Pastor-Satorras, R.; Smith, E.; Solé, R.V. Evolving protein interaction networks through gene duplication. J. Theor. Biol. 2003, 222, 199–210.
  9. Ospina-Forero, L.; Deane, C.; Reinert, G. Assessment of model fit via network comparison methods based on subgraph counts. J. Complex Netw. 2019, 17, 226–253.
  10. Hermann, F.; Pfaffelhuber, P. Large-scale behavior of the partial duplication random graph. Lat. Am. J. Probab. Math. Stat. 2016, 1408, 687–710.
  11. Albalat, R.; Cañestro, C. Evolution by gene loss. Nat. Rev. Genet. 2016, 17, 379–391.
  12. von Mering, C.; J Jensen, L.; Snel, B.; D Hooper, S.; Krupp, M.; Foglierini, M.; Jouffre, N.; Huynen, M.A.; Bork, P. STRING: Known and predicted protein-protein associations, integrated and transferred across organisms. Nucl. Acids Res. 2005, 33, D433–D437.
  13. Bozhilova, L.V.; Whitmore, A.V.; Wray, J.; Reinert, G.; Deane, C.M. Measuring rank robustness in scored protein interaction networks. BMC Bioinform. 2019, 20, 446.
  14. Barbour, A.; Lo, T.Y. The expected degree distribution in transient duplication divergence models. Lat. Am. J. Probab. Math. Stat. 2021, 19, 69–107.
  15. Ispolatov, I.; Krapivsky, P.L.; Yuryev, A. Duplication-divergence model of protein interaction network. Phys. Rev. E 2005, 71, 061911.
  16. Stern, D. The Chlamydomonas Sourcebook: Organellar and Metabolic Processes; Academic Press: Cambridge, MA, USA, 2008.
  17. Battiston, F.; Petri, G. Higher-Order Systems; Springer: Cham, Switzerland, 2022.
  18. Goldstein, L.; Rinott, Y. Multivariate Normal Approximations by Stein’s Method and Size Bias Couplings. J. Appl. Probab. 1996, 33, 1–17.
  19. Barbour, A.; Holst, L.; Janson, S. Poisson Approximation; Oxford University Press: Oxford, UK, 1992.
  20. Barbour, A.; Reinert, G. Networks: Probability and Statistics. 2024; book manuscript in preparation.
Figure 1. Sensitivity analysis for the number of isolated nodes in the E. coli and S. cerevisiae PPI networks across varying STRING score thresholds.
Figure 2. Attack strategies. (A) Complete knockout attack: all edges connected to the attacked node are eliminated. (B1) Partial knockout attack: half of the edges connected to the attacked node are eliminated. (C1) Distributed knockout attack: randomly selected edges are eliminated. Adapted from FIG.1 in [1].
Figure 3. Graph illustration of a duplication–divergence model.
Figure 4. Graph illustration of a new duplication divergence model with node loss.
Figure 5. The average number of edges in the 1-step ego network of the E. coli and S. cerevisiae PPI networks after 25 attacks. (a) shows the average number of edges in the 1-step ego network in an E. coli PPI network under 25 knockout attacks. Blue line: complete knockout; red line: partial knockout with half of the edges connected to one node being removed at each attack; green line: partial knockout with half of the edges connected to two nodes being removed at each attack; orange line: partial knockout with half of the edges connected to five nodes being removed at each attack. (b) shows the average number of edges in the 1-step ego network in an S. cerevisiae PPI network under 25 attenuation attacks. Since a one-node halved knockout only deletes half of the edges connected to the selected node, when a node has a degree of at least 2 it causes less damage than a complete knockout, which removes all the edges connected to the selected node.
Figure 6. Network efficiency after up to 25 weak attacks on simulations from the duplication–divergence model starting with a triangle with a divergence rate p = 0.4 . Top left: knockout attacks. Blue line: complete knockout; red line: partial knockout with half of the edges connected to one node being removed at each attack; green line: partial knockout with half of the edges connected to two nodes being removed at each attack; orange line: partial knockout with half of the edges connected to five nodes being removed at each attack. Top right: attenuation attacks. Blue line: complete knockout; red line: partial attenuation with all the edges connected to one node being halved at each attack; green line: partial attenuation with all the edges connected to two nodes being halved at each attack; orange line: partial attenuation with all the edges connected to five nodes being halved at each attack. Bottom left: distributed attacks, with edges drawn from a random distribution; the horizontal line represents equivalent damage to the network achieved by one complete knockout. Bottom right: distributed attenuation attacks, with the weight of edges drawn from a random distribution to be halved; the horizontal line represents equivalent damage to the network achieved by one complete knockout.
Figure 7. Network efficiency after up to 25 weak attacks on simulations from the new node loss model starting with a triangle; a node can be lost with probability q = 0.2, using a divergence rate p = 0.4. The graph is undirected and has unit edge weight. Top left: knockout attacks. Blue line: complete knockout; red line: partial knockout with half of the edges connected to one node being removed at each attack; green line: partial knockout with half of the edges connected to two nodes being removed at each attack; orange line: partial knockout with half of the edges connected to five nodes being removed at each attack. Top right: attenuation attacks. Blue line: complete knockout; red line: partial attenuation with all the edges connected to one node being halved at each attack; green line: partial attenuation with all the edges connected to two nodes being halved at each attack; orange line: partial attenuation with all the edges connected to five nodes being halved at each attack. Bottom left: distributed attacks, with edges drawn from a random distribution; the horizontal line represents equivalent damage to the network achieved by one complete knockout. Bottom right: distributed attenuation attacks, with the weight of edges drawn from a random distribution to be halved; the horizontal line represents equivalent damage to the network achieved by one complete knockout.
Figure 8. Effect of q on the efficiency of weak attacks on simulated networks from the node loss model starting from a triangle with p = 0.4, for q = 0.2, 0.4, 0.6, and 0.8.
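The attack types named in the captions of Figures 6 and 7 can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the graph is stored as a dictionary of weighted neighbours, the length of an edge is taken as the reciprocal of its weight, efficiency is the average of inverse shortest-path lengths over all ordered node pairs (with attacked nodes kept in the node set), and the edges removed in a partial knockout are chosen uniformly at random. All function names are hypothetical.

```python
import heapq
import random


def make_triangle():
    """Undirected triangle with unit edge weights, stored as node -> {neighbour: weight}."""
    return {0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0}}


def efficiency(adj):
    """Average of 1/d(i, j) over ordered pairs i != j; unreachable pairs contribute 0.

    Assumption: an edge of weight w has length 1/w, so halving a weight doubles its length.
    """
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for source in adj:
        # Dijkstra from `source` with edge length 1/weight.
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v, w in adj[u].items():
                nd = d + 1.0 / w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        total += sum(1.0 / d for target, d in dist.items() if target != source)
    return total / (n * (n - 1))


def complete_knockout(adj, node):
    """Remove every edge incident to `node`; the node itself stays, isolated."""
    for nbr in list(adj[node]):
        del adj[node][nbr]
        del adj[nbr][node]


def partial_knockout(adj, node, rng, fraction=0.5):
    """Remove a fraction (rounded down) of the edges incident to `node`, chosen at random."""
    nbrs = list(adj[node])
    for nbr in rng.sample(nbrs, int(len(nbrs) * fraction)):
        del adj[node][nbr]
        del adj[nbr][node]


def attenuate(adj, node, factor=0.5):
    """Multiply the weight of every edge incident to `node` by `factor`."""
    for nbr in list(adj[node]):
        adj[node][nbr] *= factor
        adj[nbr][node] *= factor


if __name__ == "__main__":
    rng = random.Random(0)
    g = make_triangle()
    print("before attack:", efficiency(g))
    attenuate(g, 0)
    print("after one attenuation attack:", efficiency(g))
    partial_knockout(g, 1, rng)
    print("after one further partial knockout:", efficiency(g))
```

Repeating an attack 25 times on randomly chosen target nodes and recording `efficiency` after each step would give one curve of the kind plotted in the figures, under the assumptions stated above.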
Table 1. Summary statistics for the analysed networks; No. stands for Number of.
| Networks | E. coli | S. cerevisiae |
| --- | --- | --- |
| No. nodes | 3043 | 5925 |
| No. edges | 52,914 | 140,402 |
| No. isolated nodes | 1141 | 1100 |
| Average degree | 28.05 | 59.51 |
| Average local clustering coefficient | 0.31 | 0.40 |
| Global clustering coefficient | 0.25 | 0.80 |
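The quantities listed in Table 1 can be computed for any undirected simple graph along the following lines. This is a sketch under stated assumptions, not the procedure used to produce the table: the graph is stored as a dictionary of neighbour sets, the average degree is taken over all nodes including isolated ones, nodes of degree below two are given local clustering coefficient 0, and the global clustering coefficient is the fraction of connected triples that are closed. The conventions behind the published values may differ, and the function name is hypothetical.

```python
from itertools import combinations


def summary_statistics(adj):
    """Summary statistics for an undirected simple graph given as node -> set of neighbours."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is stored twice
    isolated = sum(1 for nbrs in adj.values() if not nbrs)

    local = []   # local clustering coefficient of each node
    closed = 0   # closed triples, counted once per centre node
    triples = 0  # all connected triples, counted once per centre node
    for nbrs in adj.values():
        k = len(nbrs)
        possible = k * (k - 1) // 2
        # Number of edges among the neighbours of the current node.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        local.append(links / possible if possible else 0.0)
        closed += links
        triples += possible

    return {
        "nodes": n,
        "edges": m,
        "isolated_nodes": isolated,
        "average_degree": 2 * m / n if n else 0.0,
        "average_local_clustering": sum(local) / n if n else 0.0,
        "global_clustering": closed / triples if triples else 0.0,
    }


if __name__ == "__main__":
    # Toy graph: a triangle 0-1-2, a pendant node 3 attached to 0, and an isolated node 4.
    toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: set()}
    for name, value in summary_statistics(toy).items():
        print(name, value)
```

The pairwise neighbour check is quadratic in the degree, which is acceptable for a sketch but may be slow for very high-degree nodes.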
Zhang, R.; Reinert, G. Simulating Weak Attacks in a New Duplication–Divergence Model with Node Loss. Entropy 2024, 26, 813. https://doi.org/10.3390/e26100813
