Article

An Exact and an Approximation Method to Compute the Degree Distribution of Inhomogeneous Random Graph Using Poisson Binomial Distribution

Physiological Controls Research Center, Óbuda University, 1034 Budapest, Hungary
*
Author to whom correspondence should be addressed.
Mathematics 2023, 11(6), 1441; https://doi.org/10.3390/math11061441
Submission received: 12 February 2023 / Revised: 9 March 2023 / Accepted: 12 March 2023 / Published: 16 March 2023

Abstract

Inhomogeneous random graphs are commonly used models for complex networks where nodes have varying degrees of connectivity. Computing the degree distribution of such networks is a fundamental problem and has important applications in various fields. We define the inhomogeneous random graph as a random graph model where the edges are drawn independently and the probability of a link between any two vertices can be different for each node pair. In this paper, we present an exact and an approximation method to compute the degree distribution of inhomogeneous random graphs using the Poisson binomial distribution. The exact algorithm utilizes the DFT-CF method to compute the distribution of a Poisson binomial random variable. The approximation method uses the Poisson, binomial, and Gaussian distributions to approximate the Poisson binomial distribution.

1. Introduction

Random graphs are widely used to model complex systems such as social networks, biological networks, and the internet. The degree distribution is an important characteristic of a network, as it provides information about the connectivity of nodes in the network [1], and its shape determines many network phenomena, such as robustness [2,3,4] or spreading processes [5,6,7]. Inhomogeneous random graphs are a type of random graph where the nodes are not equally likely to be connected. Instead, the probability of two nodes being connected depends on their attributes or characteristics. Example applications of inhomogeneous random graphs are social network analysis [8,9], modelling biological networks [10], or modelling transportation networks [11].
In the literature of network science and random graphs, the term inhomogeneous random graph does not denote a single well-defined model but a family of random graph models in which the nodes have varying degrees of connectivity. One example of an inhomogeneous random graph is the stochastic block model [8,9]. A stochastic block model (SBM) is defined by a partition $V = C_1 \cup C_2 \cup \dots \cup C_r$ of the vertex set and a symmetric $r \times r$ edge probability matrix $P$. For any two vertices $u \in C_i$ and $v \in C_j$, the probability of drawing the undirected edge $\{u, v\}$ is $P_{ij}$. Therefore, for any vertex pair $\{u, v\}$, the probability $p_{uv}$ that the nodes $u$ and $v$ are connected is directly determined by the model parameters (the matrix $P$), and the links are drawn independently once $P$ is given. A second example is the generalized random graph [12,13]. In a generalized random graph (GRG), the inhomogeneity is introduced into the model using vertex weights. Each node $i$ of the network is assigned a vertex weight $w_i > 0$, and the probability that an edge is drawn between the nodes $i$ and $j$ is equal to
$$p_{ij} = \frac{w_i w_j}{S + w_i w_j}, \tag{1}$$
where $S = w_1 + \dots + w_n$ is the total weight of all vertices. A consequence of this definition is that vertices with high weights are more likely to have many neighbours than vertices with small weights, and vertices with extremely high weights can act as hubs, as observed in many real-world networks. Furthermore, if the parameters $w_1, \dots, w_n$ are deterministic and given, the edge probabilities can be computed with (1), and the edges are independent. A third example of an inhomogeneous random graph is the biased static edge voting model [14]. We use this model in our numerical tests in Section 4, and it is briefly described in Section 4.2. As with the SBM and GRG models, if the model parameters are given, the link probability $p_{ij}$ of any node pair $\{i, j\}$ can be directly computed (see Equation (66)), and the edges are drawn independently.
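To make Equation (1) concrete, the following minimal sketch computes the GRG edge probability matrix from a list of vertex weights. The function name `grg_edge_probabilities` and the example weights are illustrative assumptions and do not come from the paper.

```python
def grg_edge_probabilities(weights):
    """Return the symmetric matrix p[i][j] = w_i*w_j / (S + w_i*w_j), with zero diagonal."""
    total = sum(weights)  # S, the total weight of all vertices
    n = len(weights)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prod = weights[i] * weights[j]
            p[i][j] = p[j][i] = prod / (total + prod)
    return p


# Hypothetical weights: the third vertex is much heavier than the others.
weights = [1.0, 2.0, 10.0]
p = grg_edge_probabilities(weights)
# Expected degree of each vertex is the sum of its row of edge probabilities.
expected_degrees = [sum(row) for row in p]
```

In this toy example the heaviest vertex has the largest expected degree, which is the hub effect described above.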
For all these example inhomogeneous random graph models, the common property is that the edge probabilities can be directly computed from the model definition, and the edges are drawn independently. We define the inhomogeneous random graph (IRG) model via these properties (as it is defined in [13] (Section 6.7)). The inhomogeneous random graph is a random graph model on the vertex set $V = \{1, \dots, n\}$, where the draw probability $p_{ab}$ of any edge $\{a, b\}$ is given, and the edges are drawn independently. The IRG model can be considered the natural generalization of the Erdős–Rényi (ER) random graph [15], where each link of the graph is drawn independently with a fixed probability $p$. If we set all the edge probabilities of the IRG model to a fixed value $p$, then we obtain an ER random graph. It is well known that the degree distribution of the ER model is close to the Poisson distribution [15]. When we observe the degree sequence of real-world networks, we often see that their empirical degree distribution has a fat tail [13]. Therefore, the ER random graph is a poor model for such real-world networks.
If the parameters are deterministic and given, the SBM, GRG, and the static edge voting model can be represented by an IRG. A further example of such a model is the Chung–Lu random graph [16]. However, not all inhomogeneous random graph models can be expressed as an IRG. For example, the Norros–Reittu model [17] is a random multigraph model, while the IRG is a model of a simple graph. A second example is the GRG model with random weights: using random weights in the GRG breaks the independence of the edges. A third example is the Barabási–Albert (BA) model [18]. The BA model is a dynamic network growth model, and in this case, we cannot derive the edge probabilities.
In this paper, we discuss a novel algorithm that allows us to compute the exact degree distribution of the IRG model and an approximation method to estimate the IRG degree distribution. The difficulty of computing the degree distribution of the IRG model comes from the fact that each edge candidate of the network may have a different draw probability; therefore, the degree distribution of any node is Poisson binomial (PB) [19,20]. The algorithm that we have developed to compute the degree distribution of the IRG model is based on the DFT-CF method invented by Yili Hong [19], and the approximation method uses the Poisson, binomial, and Gaussian distributions to approximate the PB distribution. The proposed algorithms can be used to compute or approximate the degree distribution of any random graph model that can be represented by an IRG.
The structure of the remaining part of this paper is as follows: Section 2 contains the mathematical preliminaries of our study. In Section 2.1, we introduce the necessary notations and definitions. In Section 2.2, we briefly discuss the DFT-CF algorithm for computing the Poisson binomial distribution. Section 2.3 contains selected results about the approximation of the Poisson binomial distribution. In Section 3, we discuss the proposed algorithms to compute or approximate the degree distribution of the IRG model. In  Section 3.1, we formally define the problem that we aim to solve. In  Section 3.2, we present an exact algorithm to compute the degree distribution of the inhomogeneous random graph, and  in Section 3.3, we discuss an approximation method to estimate this distribution. In the first part of Section 3.3, we outline the general scheme of the approximation method; then, we provide an upper bound of the approximation error for the special cases, when the approximator distribution is Poisson (Section 3.3.1), binomial (Section 3.3.2), and Gaussian (Section 3.3.3). The results of the numerical experiments are provided in Section 4. The study is concluded with the discussion in Section 5.
The contributions of the authors are a novel algorithm to compute the exact degree distribution of the IRG model utilizing the DFT-CF method (Section 3.2) and the analysis of the estimation method for this distribution (Section 3.3). The idea of the approximation scheme is simple and not new: we group similar nodes into clusters and apply the same approximator distribution within a cluster. Our contribution here is the analysis of the approximation error in the specific cases when the approximator distribution is Poisson (Section 3.3.1), binomial (Section 3.3.2), and Gaussian (Section 3.3.3).

2. Preliminaries

2.1. Notations and Definitions

We denote the set $\{1, \dots, n\}$ by $[n]$. A simple graph $G$ is defined as a pair $(V(G), E(G))$, where $V(G)$ is the set of vertices and $E(G)$ is the set of edges. The vertices are labeled with integers, so the vertex set of an $n$-vertex graph is $V = [n]$. The degree of a vertex $a$ is defined as the number of neighbors that $a$ has in $G$. We denote the degree of $a$ by $d(a)$ or simply $d_a$. The degree distribution of a deterministic or random graph $G$ is defined as the distribution of $d(U)$, where $U$ is a vertex chosen uniformly at random. Even in the case of a deterministic graph, $d(U)$ is a random variable. In the deterministic case, we can express $P(d(U) = k)$ as $|\{a \in V : d_a = k\}| / n$. If $G$ is a random graph, then $d_a$ is a random variable, and we refer to the distribution of $d_a$ as the degree distribution of vertex $a$.
The Poisson binomial random variable $N$ is defined as the sum of $n$ independent random indicators: $N = \sum_{i=1}^{n} I_i$, where $I_i \sim \mathrm{Bernoulli}(p_i)$, $i = 1, \dots, n$. Note that $N$ takes values in $\{0, 1, \dots, n\}$. We say that the values $p_1, \dots, p_n$ are the parameters of the distribution, and we use the notation $N \sim PB(p_1, \dots, p_n)$. When all $p_i$ are identical, the distribution of $N$ is binomial. Let $\xi_k = P(N = k)$, $k = 0, 1, \dots, n$, be the probability mass function (pmf) of the Poisson binomial random variable $N$. The pmf of $N$ can be expressed as:
$$\xi_k = \sum_{A \in H_k} \prod_{j \in A} p_j \prod_{j \in A^c} (1 - p_j), \tag{2}$$
where $H_k$ is the set of all subsets of $k$ integers that can be selected from $[n]$, and $A^c$ is the complement of $A$ in $[n]$. The direct use of this formula is computationally very expensive, since the number of subsets grows exponentially with $n$.
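The cost of the subset formula for the pmf can be seen in a direct transcription. The sketch below enumerates every $k$-element subset of the indices, so it is only usable for very small $n$; the function name `pb_pmf_bruteforce` is an assumption made for illustration.

```python
from itertools import combinations
from math import prod


def pb_pmf_bruteforce(p):
    """Poisson binomial pmf by enumerating all subsets (exponential in len(p))."""
    n = len(p)
    pmf = []
    for k in range(n + 1):
        total = 0.0
        # Each subset A of size k contributes prod_{j in A} p_j * prod_{j not in A} (1 - p_j).
        for subset in combinations(range(n), k):
            chosen = set(subset)
            total += prod(p[j] if j in chosen else 1.0 - p[j] for j in range(n))
        pmf.append(total)
    return pmf


pmf_example = pb_pmf_bruteforce([0.1, 0.2, 0.3])
```

The loop visits all $2^n$ subsets in total, which motivates the DFT-based method discussed in Section 2.2.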
We now introduce the inhomogeneous random graph (IRG) model [13] (Section 6.7), denoted by $IRG_n(P)$, where $n$ is the number of vertices and $P = \{p_{ij}\}$ is a set of edge probabilities. In this model, edges are drawn independently, and the probability of drawing the edge $\{i, j\}$ is given by $p_{ij}$ for all $1 \le i < j \le n$. We formalize this as follows:
Definition 1.
The inhomogeneous random graph model $IRG_n(P)$ is defined as a random graph with vertex set $[n]$ and edge probabilities $P = \{p_{ij}\}$, where each edge $\{i, j\}$, $1 \le i < j \le n$, is drawn independently with probability $p_{ij}$.
The parameters of an IRG can also be represented by a symmetric $n \times n$ matrix $P$ with $P_{ii} = 0$ for all $i \in [n]$ and $P_{ij} = P_{ji} = p_{ij}$ for any $1 \le i < j \le n$. Since the elements of the matrix $P$ are probabilities, $0 \le P_{ij} \le 1$ for all $i, j \in [n]$. By definition, the degree distribution of any node $i$ in $IRG_n(P)$ is Poisson binomial with the parameters $\{p_{i1}, \dots, p_{i,i-1}, p_{i,i+1}, \dots, p_{in}\}$.
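As a small illustration of Definition 1, the following sketch draws one graph from an IRG given as a symmetric probability matrix and extracts the Poisson binomial parameters of a single node. The names `sample_irg` and `node_pb_parameters` are hypothetical, and the matrix is assumed to be a list of lists with a zero diagonal.

```python
import random


def sample_irg(P, rng):
    """Draw one adjacency matrix: each edge {i, j} appears independently with probability P[i][j]."""
    n = len(P)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < P[i][j]:
                adj[i][j] = adj[j][i] = 1
    return adj


def node_pb_parameters(P, i):
    """Parameters of the Poisson binomial degree distribution of node i (row i without the diagonal)."""
    return [P[i][j] for j in range(len(P)) if j != i]


P = [[0.0, 0.2, 0.3],
     [0.2, 0.0, 0.4],
     [0.3, 0.4, 0.0]]
graph = sample_irg(P, random.Random(0))
params_node0 = node_pb_parameters(P, 0)
```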
We briefly introduce the discrete Fourier transform (DFT). The DFT transforms the sequence of $n + 1$ complex numbers $\{x_0, x_1, \dots, x_n\}$ into another sequence of complex numbers $\{y_0, y_1, \dots, y_n\}$, where the transformation is defined by the formula $y_k = \sum_{l=0}^{n} x_l \exp(-i \omega k l)$, $k = 0, 1, \dots, n$, and $\omega = 2\pi / (n + 1)$. There are fast Fourier transform (FFT) algorithms to compute the DFT efficiently. The best known and most commonly used FFT algorithm is the Cooley–Tukey algorithm [21].
We define the total variation norm [22,23] and, based on it, the total variation distance. Consider a signed measure $\mu$ on a measurable space $(X, \Sigma)$. First, we define two non-negative measures:
$$\overline{\nu}(\mu, E) = \sup\{\mu(A) : A \in \Sigma \text{ and } A \subset E\} \quad \text{for all } E \in \Sigma, \tag{3}$$
$$\underline{\nu}(\mu, E) = -\inf\{\mu(A) : A \in \Sigma \text{ and } A \subset E\} \quad \text{for all } E \in \Sigma. \tag{4}$$
The total variation norm of the measure $\mu$ is defined as:
$$\|\mu\|_{TV} = \overline{\nu}(\mu, X) + \underline{\nu}(\mu, X). \tag{5}$$
The total variation distance of the probability measures $P$ and $Q$ on the same measurable space $(\Omega, \mathcal{F})$ is defined as:
$$d_{TV}(P, Q) = \|P - Q\|_{TV} = 2 \sup\{|P(A) - Q(A)| : A \in \mathcal{F}\}. \tag{6}$$
The factor 2 above is usually dropped. Informally, the supremum is the largest possible difference between the probabilities that the two probability measures can assign to the same event. For discrete probability distributions, the TV distance can be written as follows, where the factor $\frac{1}{2}$ normalizes $d_{TV}(P, Q)$ to the range $[0, 1]$:
$$d_{TV}(P, Q) = \frac{1}{2} \sum_{x} |P(x) - Q(x)|. \tag{7}$$
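For distributions on a finite range of integers, the discrete form of the total variation distance is a one-line computation. The sketch below assumes both pmfs are given as lists indexed from 0 and pads the shorter one with zeros; `tv_distance` is an illustrative name.

```python
def tv_distance(p, q):
    """Total variation distance 0.5 * sum_x |p(x) - q(x)| of two pmfs given as lists."""
    length = max(len(p), len(q))
    # Pad the shorter pmf with zeros so both cover the same support.
    p = list(p) + [0.0] * (length - len(p))
    q = list(q) + [0.0] * (length - len(q))
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


distance_example = tv_distance([0.5, 0.5], [0.25, 0.75])
```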
We continue with the definition of the p-norm and the p-distance. For any integer $p \ge 1$, the p-norm [24] of the function $f \in L^p$ is defined as:
$$\|f\|_p := \left( \int |f(x)|^p \, dx \right)^{1/p}. \tag{8}$$
The $L^p$ distance [24] of the functions $f, g \in L^p$ induced by the p-norm is given by:
$$d_p(f, g) = \|f - g\|_p := \left( \int |f(x) - g(x)|^p \, dx \right)^{1/p}. \tag{9}$$
We will use the notation $\mathcal{L}(X)$ to refer to the distribution of a random variable $X$.

2.2. Computing the Poisson Binomial Distribution: The DFT-CF Algorithm

Yili Hong showed in [19] that the probability mass function and the cumulative distribution function of the PB distribution can be computed directly using the DFT. In particular, if $N \sim PB(p_1, \dots, p_n)$, then for all $k = 0, 1, \dots, n$:
$$\xi_k = P(N = k) = \frac{1}{n + 1} \sum_{l=0}^{n} \exp(-i \omega l k) \, x_l, \tag{10}$$
where $x_l = \prod_{j=1}^{n} \left[ 1 - p_j + p_j \exp(i \omega l) \right]$ and $\omega = 2\pi / (n + 1)$. In other words:
$$\{\xi_0, \xi_1, \dots, \xi_n\} = \frac{1}{n + 1} DFT\{x_0, x_1, \dots, x_n\}. \tag{11}$$
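The DFT relation above can be checked numerically with a short sketch. It evaluates the characteristic-function values $x_l$ as plain complex products and applies a naive $O(n^2)$ DFT from the standard library; this is an assumption made to keep the example dependency-free, and an FFT routine such as `numpy.fft.fft` could be used in place of the inner sum. The function name `pb_pmf_dft_cf` is hypothetical.

```python
import cmath


def pb_pmf_dft_cf(p):
    """Poisson binomial pmf via xi_k = 1/(n+1) * sum_l exp(-i*omega*l*k) * x_l."""
    n = len(p)
    omega = 2.0 * cmath.pi / (n + 1)
    # x_l is the characteristic function of N evaluated at omega * l.
    x = []
    for l in range(n + 1):
        value = 1.0 + 0.0j
        for pj in p:
            value *= 1.0 - pj + pj * cmath.exp(1j * omega * l)
        x.append(value)
    pmf = []
    for k in range(n + 1):
        s = sum(cmath.exp(-1j * omega * l * k) * x[l] for l in range(n + 1))
        pmf.append((s / (n + 1)).real)  # imaginary part is rounding noise
    return pmf


pmf_dft = pb_pmf_dft_cf([0.1, 0.2, 0.3])
```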
Hong also provided an efficient implementation of (11) in [19]. Let $x_l = a_l + i b_l$ for all $l \in \{0, 1, \dots, n\}$, where $a_l$ and $b_l$ are the real and imaginary parts of $x_l$, respectively, and $i = \sqrt{-1}$. It can be shown that $x_0 = \sum_{k=0}^{n} \xi_k = 1$, and for $l > 0$, $x_l$ is the complex conjugate of $x_{n+1-l}$:
$$x_l = \overline{x_{n+1-l}} = a_{n+1-l} - i \, b_{n+1-l}, \quad l = 1, \dots, n. \tag{12}$$
Thus, $a_l = a_{n+1-l}$ and $b_l = -b_{n+1-l}$. Let $z_j(l) = 1 - p_j + p_j \cos(\omega l) + i \, p_j \sin(\omega l)$, and denote the modulus and the argument of $z_j(l)$ by $|z_j(l)|$ and $\mathrm{Arg}[z_j(l)]$, respectively. Then, $a_l$ and $b_l$ can be explicitly expressed by $z_j(l)$. For all $l = 1, \dots, n$:
$$a_l = d_l \cos\left( \sum_{j=1}^{n} \mathrm{Arg}[z_j(l)] \right), \tag{13}$$
$$b_l = d_l \sin\left( \sum_{j=1}^{n} \mathrm{Arg}[z_j(l)] \right), \tag{14}$$
$$d_l = \exp\left( \sum_{j=1}^{n} \log |z_j(l)| \right). \tag{15}$$
Here, $|z_j(l)| = \left\{ [1 - p_j + p_j \cos(\omega l)]^2 + [p_j \sin(\omega l)]^2 \right\}^{1/2}$ and $\mathrm{Arg}[z_j(l)] = \mathrm{atan2}[p_j \sin(\omega l), 1 - p_j + p_j \cos(\omega l)]$. The function $\mathrm{atan2}(y, x)$ is defined as:
$$\mathrm{atan2}(y, x) = \begin{cases} \arctan(y/x) & x > 0 \\ \pi + \arctan(y/x) & y \ge 0, \ x < 0 \\ -\pi + \arctan(y/x) & y < 0, \ x < 0 \\ \pi/2 & y > 0, \ x = 0 \\ -\pi/2 & y < 0, \ x = 0 \\ 0 & y = 0, \ x = 0 \end{cases} \tag{16}$$
According to this, we can use Algorithm 1 to compute the pmf of $N$, where $[\,\cdot\,]$ denotes the ceiling function.
Algorithm 1 Computing the Poisson binomial pmf
 Let $x_0 = 1$.
 Let $x_l = a_l + i b_l$ for $l = 1, \dots, n$, where:
        $a_l$ and $b_l$ are computed using Equations (13) and (14) for $l = 1, \dots, [n/2]$;
        $a_l = a_{n+1-l}$ and $b_l = -b_{n+1-l}$ for $l = [n/2] + 1, \dots, n$.
 $\{\xi_0, \xi_1, \dots, \xi_n\} = \frac{1}{n + 1} FFT\{x_0, x_1, \dots, x_n\}$
 return $\{\xi_0, \xi_1, \dots, \xi_n\}$
The derivation of (10) is based on the characteristic function and DFT; therefore, the method is called the DFT-CF algorithm.
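The steps of Algorithm 1 can be sketched in Python as follows. The sketch accumulates arguments and log-moduli as in Equations (13)–(15), fills the second half of the vector by conjugate symmetry, and then applies a naive DFT where an FFT would normally be used (a simplification assumed here to avoid external dependencies). The name `pb_pmf_algorithm1` and the guard for a zero modulus are additions for illustration.

```python
import cmath
import math


def pb_pmf_algorithm1(p):
    """Poisson binomial pmf following Algorithm 1 (polar form plus conjugate symmetry)."""
    n = len(p)
    size = n + 1
    omega = 2.0 * math.pi / size
    half = math.ceil(n / 2)
    x = [0j] * size
    x[0] = 1.0 + 0.0j
    for l in range(1, half + 1):
        sum_arg, sum_log_mod, is_zero = 0.0, 0.0, False
        for pj in p:
            re = 1.0 - pj + pj * math.cos(omega * l)
            im = pj * math.sin(omega * l)
            mod = math.hypot(re, im)
            if mod == 0.0:  # a zero factor makes the whole product zero
                is_zero = True
                break
            sum_arg += math.atan2(im, re)
            sum_log_mod += math.log(mod)
        if not is_zero:
            d = math.exp(sum_log_mod)
            x[l] = complex(d * math.cos(sum_arg), d * math.sin(sum_arg))
    for l in range(half + 1, n + 1):
        x[l] = x[n + 1 - l].conjugate()  # x_l is the conjugate of x_{n+1-l}
    # Naive DFT; an FFT routine would replace this double loop in practice.
    return [
        (sum(cmath.exp(-1j * omega * l * k) * x[l] for l in range(size)) / size).real
        for k in range(size)
    ]


pmf_alg1 = pb_pmf_algorithm1([0.2, 0.4, 0.6, 0.8])
```

Summing logarithms instead of multiplying moduli is what keeps the computation stable when $n$ is large and the individual factors are small.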

2.3. Approximation of the Poisson Binomial Distribution

In this section, we discuss various approximations of the Poisson binomial distribution. We will use these results in Section 3.3 to derive upper bounds for the approximation error of the estimated IRG degree distribution. More information about approximating the PB distribution, as well as other results, can be found in the paper [20] by Wenpin Tang and Fengmin Tang.

2.3.1. Poisson Approximation

First, we consider the use of the Poisson distribution as an approximation of the PB distribution. We use the notation $Poi(\mu)$ for the Poisson distribution with parameter $\mu$. If $X$ follows the PB distribution with parameters $p_1, \dots, p_n$, then we can approximate the distribution of $X$ by the Poisson distribution with the parameter $\mu = p_1 + \dots + p_n$. The following theorem shows how well the Poisson distribution approximates the Poisson binomial distribution.
Theorem 1
([20,25]). Let $X \sim PB(p_1, \dots, p_n)$ and $\mu = \sum_{i=1}^{n} p_i$. Then
$$\frac{1}{32} \min\left(1, \frac{1}{\mu}\right) \sum_{i=1}^{n} p_i^2 \le d_{TV}(\mathcal{L}(X), Poi(\mu)) \le \frac{1 - e^{-\mu}}{2\mu} \sum_{i=1}^{n} p_i^2. \tag{17}$$
We see in [20] from (17) that the Poisson approximation of the PB distribution is good if $\mu - \sigma^2 = \sum_{i=1}^{n} p_i^2 \ll \sum_{i=1}^{n} p_i$, or equivalently, if $\mu - \sigma^2 \ll \mu$, where $\sigma^2 = \sum_{i=1}^{n} p_i (1 - p_i)$ is the variance of $X$. There are two cases:
  • For small $\mu$, the upper bound in (17) is sharp.
  • For large $\mu$, the approximation error is on the order of $\sum_{i=1}^{n} p_i^2 / \sum_{i=1}^{n} p_i$.
The constant 1 / 32 in the lower bound can be improved to 1 / 14 [26]. The Poisson approximation can be viewed as a mean-matching procedure.
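The effect of $\sum p_i^2$ on the quality of the Poisson approximation can be observed numerically. The sketch below computes the exact PB pmf with a simple convolution recursion (a stand-in chosen for brevity, not the DFT-CF method), and measures the total variation distance to $Poi(\mu)$ in its normalized form $\frac{1}{2}\sum_x |P(x) - Q(x)|$, including the Poisson mass above $n$. All function names and the two parameter sets are illustrative assumptions.

```python
import math


def pb_pmf_convolution(p):
    """Exact Poisson binomial pmf by adding one Bernoulli variable at a time."""
    pmf = [1.0]
    for pj in p:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - pj)
            nxt[k + 1] += mass * pj
        pmf = nxt
    return pmf


def poisson_pmf(mu, k):
    return math.exp(-mu) * mu ** k / math.factorial(k)


def tv_pb_vs_poisson(p):
    """Normalized TV distance between PB(p) and Poi(sum(p))."""
    mu = sum(p)
    pb = pb_pmf_convolution(p)
    poi = [poisson_pmf(mu, k) for k in range(len(pb))]
    diff = sum(abs(a - b) for a, b in zip(pb, poi))
    tail = max(0.0, 1.0 - sum(poi))  # Poisson mass on values above n
    return 0.5 * (diff + tail)


small_p = [0.01] * 20   # sum of squares much smaller than the mean
large_p = [0.5] * 20    # sum of squares comparable to the mean
d_small = tv_pb_vs_poisson(small_p)
d_large = tv_pb_vs_poisson(large_p)
```

With these inputs the distance for the small probabilities is far below the one for the large probabilities, in line with the condition $\mu - \sigma^2 \ll \mu$.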
In Section 3.3.1, we will use the following theorem to compute the total variation distance of two differently parametrized Poisson distribution functions:
Theorem 2
([27]). For any $t > 0$ and $x \ge 0$:
$$d_{TV}(Poi(t + x), Poi(t)) \le \min\left\{ x, \ \sqrt{\frac{2}{e}} \left( \sqrt{t + x} - \sqrt{t} \right) \right\}. \tag{18}$$

2.3.2. Binomial Approximation

We denote the binomial distribution with parameters $n$ and $p$ by $Bin(n, p)$. Suppose that $X \sim PB(p_1, \dots, p_n)$ and $\mu = \sum_{i=1}^{n} p_i$. Then, we can use $Bin(n, \mu / n)$ as an approximation of the distribution of $X$. The first result on the precision of approximating the Poisson binomial distribution by the binomial distribution is due to Ehm [20,28]. The advantage of the binomial approximation over the Poisson approximation is justified by Theorem 3 from Choi and Xia:
Theorem 3
([20,29]). Let $X \sim PB(p_1, \dots, p_n)$ and $\mu := \sum_{i=1}^{n} p_i$. For $m \ge 1$, let $d_m := d_{TV}(\mathcal{L}(X), Bin(m, \mu / m))$. Then, for $m$ sufficiently large,
$$d_m < d_{m+1} < \dots < d_{TV}(\mathcal{L}(X), Poi(\mu)). \tag{19}$$

2.3.3. Gaussian Approximation

We denote the Gaussian distribution with expected value $\mu$ and variance $\sigma^2$ by $\mathcal{N}(\mu, \sigma^2)$. The Gaussian approximation of the Poisson binomial distribution follows from the Lyapunov or Lindeberg central limit theorem [30]. If $X \sim PB(p_1, \dots, p_n)$, $\mu = \sum_{i=1}^{n} p_i$ and $\sigma^2 := \sum_{i=1}^{n} p_i (1 - p_i)$, then we can use the Gaussian distribution with the parameters $\mu$ and $\sigma^2$ to approximate the distribution of $X$. The following theorem gives an upper bound for the error of the Gaussian approximation in terms of the p-distance:
Theorem 4
([20,31]). Let $X \sim PB(p_1, \dots, p_n)$, $\mu := \sum_{i=1}^{n} p_i$ and $\sigma^2 := \sum_{i=1}^{n} p_i (1 - p_i)$. Then there exists a universal constant $C > 0$ such that
$$d_p(\mathcal{L}(X), \mathcal{N}(\mu, \sigma^2)) \le \frac{C}{\sigma} \quad \text{for all } p \ge 1. \tag{20}$$

3. Materials and Methods

3.1. Problem Formulation

Suppose we are given an inhomogeneous random graph model $IRG_n(P)$ (see Definition 1) with edge probabilities $P = \{p_{ij}\}$, $1 \le i < j \le n$. We aim to compute the degree distribution of $IRG_n(P)$. In particular, suppose that the node set of $IRG_n(P)$ is $V = [n]$, and $U$ is a uniformly distributed random variable on $V$. We are looking for the probabilities $\lambda_k = P(d(U) = k)$ for all $k \in \{0, \dots, n - 1\}$, where $d(a)$ is the degree of node $a \in V$. In Section 3.2, we present an exact algorithm to compute $\{\lambda_0, \dots, \lambda_{n-1}\}$, while in Section 3.3, we discuss an approximation method to estimate the values of $\{\lambda_0, \dots, \lambda_{n-1}\}$.

3.2. Computing the Degree Distribution of Inhomogeneous Random Graph

We can apply (10) directly to compute the degree distribution of $IRG_n(P)$. First, let us express $\lambda_k$ as:
$$\lambda_k = \sum_{a \in V} P[U = a] \, P[d_U = k \mid U = a] = \frac{1}{n} \sum_{a \in V} P[d_a = k]. \tag{21}$$
Since $d_a$ has a PB distribution with parameters $\{p_{ab} : b \in V \setminus \{a\}\}$, we can use Equation (10):
$$P[d_a = k] = \frac{1}{n} \sum_{l=0}^{n-1} \exp(-i \omega l k) \, x_{al}, \tag{22}$$
where $\omega = 2\pi / n$ and $x_{al} = \prod_{b \in V \setminus \{a\}} \left( 1 - p_{ab} + p_{ab} \exp(i \omega l) \right)$. After substituting Equation (22) into the right side of Equation (21), we have:
$$\lambda_k = \frac{1}{n^2} \sum_{a \in V} \sum_{l=0}^{n-1} \exp(-i \omega l k) \, x_{al} = \frac{1}{n^2} \sum_{l=0}^{n-1} \exp(-i \omega l k) \sum_{a \in V} x_{al} = \frac{1}{n^2} \sum_{l=0}^{n-1} \exp(-i \omega l k) \, \alpha_l, \tag{23}$$
where $\alpha_l = \sum_{a \in V} x_{al}$. In other words, the values $\{\lambda_0, \lambda_1, \dots, \lambda_{n-1}\}$ can be expressed using the discrete Fourier transform:
$$\{\lambda_0, \lambda_1, \dots, \lambda_{n-1}\} = \frac{1}{n^2} DFT\{\alpha_0, \alpha_1, \dots, \alpha_{n-1}\}. \tag{24}$$
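The aggregated transform can be illustrated with a direct, unoptimized sketch: it sums the characteristic-function values of all nodes into $\alpha_l$ and applies a single DFT of length $n$. The naive DFT and the name `irg_degree_distribution` are assumptions made for a dependency-free example; the edge probability matrix is assumed to be a symmetric list of lists with a zero diagonal.

```python
import cmath


def irg_degree_distribution(P):
    """Degree distribution lambda_0..lambda_{n-1} of IRG_n(P) via one DFT of the summed x_{al}."""
    n = len(P)
    omega = 2.0 * cmath.pi / n
    alpha = [0j] * n
    for a in range(n):
        for l in range(n):
            value = 1.0 + 0.0j
            for b in range(n):
                if b != a:
                    value *= 1.0 - P[a][b] + P[a][b] * cmath.exp(1j * omega * l)
            alpha[l] += value  # alpha_l = sum over nodes a of x_{al}
    dist = []
    for k in range(n):
        s = sum(cmath.exp(-1j * omega * l * k) * alpha[l] for l in range(n))
        dist.append((s / n ** 2).real)
    return dist


# Erdős–Rényi special case on 4 nodes with p = 0.5: degrees follow Bin(3, 0.5).
er = [[0.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
er_distribution = irg_degree_distribution(er)
```

Only one transform is needed for the whole graph, rather than one per node, because the DFT is linear in its input.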
We often have additional information about the structure of the IRG (for example, when the IRG is used to represent an SBM or a static edge voting model). Suppose that a partition $V = \bigcup_{i=1}^{m} M_i$ of $V$ is given, where $M_i \cap M_j = \emptyset$ for any $i, j \in \{1, \dots, m\}$, $i \ne j$, and the nodes within the same group $M_i$ have the same degree distribution: $\mathcal{L}(d_a) = \mathcal{L}(d_b)$ for any $a, b \in M_i$. We choose a representative element of each set $M_i$, which we denote by $r(M_i)$. Since the degree distribution of the nodes in $M_i$ is the same, $r(M_i)$ can be chosen arbitrarily. Given this partition of $V$, we can rewrite Equation (21) as:
$$\lambda_k = \sum_{i=1}^{m} \sum_{a \in M_i} P[U = a] \, P[d_a = k]. \tag{25}$$
Since nodes belonging to the same cluster $M_i$ have the same degree distribution and $P[U = a] = 1/n$, we obtain $\lambda_k = \frac{1}{n} \sum_{i=1}^{m} |M_i| \cdot P[d_{r(M_i)} = k]$. Hence, $\alpha_l$ can be computed as:
$$\alpha_l = \sum_{i=1}^{m} |M_i| \cdot x_{r(M_i) l}. \tag{26}$$
In a similar way, we can use the partition of $V$ to rewrite Equations (13)–(15):
$$a_l = d_l \cos\left( \sum_{j=1}^{m} |M_j| \, \mathrm{Arg}[z_{r(M_j)}(l)] \right), \tag{27}$$
$$b_l = d_l \sin\left( \sum_{j=1}^{m} |M_j| \, \mathrm{Arg}[z_{r(M_j)}(l)] \right), \tag{28}$$
$$d_l = \exp\left( \sum_{j=1}^{m} |M_j| \log |z_{r(M_j)}(l)| \right). \tag{29}$$
Based on this analysis, we give the pseudo-code of the algorithm to compute the exact degree distribution of the IRG model in Algorithm 2. Algorithm 2 uses Algorithm 3 to compute the $x_{al}$ values. Algorithm 3 is the modified version of the first part of Algorithm 1, which computes $x_{a0}, \dots, x_{a,n-1}$ for a fixed node $a$, taking advantage of the clusters $M_1, \dots, M_m$ of $V$ according to (27)–(29). The inputs of Algorithm 3 are parray and the partition $\mathcal{M} = \{M_1, \dots, M_m\}$ of $V$, where parray contains the parameters of the PB distribution. The vector parray is a slice of the edge probability matrix: for any $b \in V \setminus \{a\}$, parray[$b$] $= p_{ab}$, which is the probability of the undirected link $\{a, b\}$ being created in the IRG model. Note that parray[$a$] $= p_{aa} = 0$ by definition. Algorithm 2 calculates the degree distribution of the input IRG model by invoking Algorithm 3 and utilizing (26). Its inputs are irg_mx and the clusters $\mathcal{M} = \{M_1, \dots, M_m\}$ of $V$. The parameter irg_mx is the matrix representation of the IRG model: for any $a, b \in V$, irg_mx[$a$, $b$] $= p_{ab}$. Because of the undirected nature of the IRG model, the matrix irg_mx is symmetric.

3.3. Approximation of the Degree Distribution of Inhomogeneous Random Graph

In this section, we present an approximation method to estimate the degree distribution of the IRG model. Suppose that a partition $V = \bigcup_{i=1}^{m} M_i$ of $V$ is given, where $M_i \cap M_j = \emptyset$ for any $i, j \in \{1, \dots, m\}$, $i \ne j$. We consider the sets $M_i$ as node clusters, where within a given cluster, the degree distributions of the nodes are similar but not necessarily the same. Let us denote the cumulative distribution function of $d(U)$ by $\Lambda(t)$:
$$\Lambda(t) = P[d(U) \le t] = \sum_{a \in V} P[d(U) \le t \mid U = a] \, P[U = a] = \frac{1}{n} \sum_{a \in V} F_a(t) = \frac{1}{n} \sum_{i=1}^{m} \sum_{a \in M_i} F_a(t), \tag{30}$$
where $F_a(t)$ is the CDF of $d(a)$, $F_a(t) = P[d(a) \le t]$ for all $a \in V$. Similarly, we can express $\Lambda(t)$ as:
$$\Lambda(t) = \sum_{i=1}^{m} P[d(U) \le t \mid U \in M_i] \, P[U \in M_i] = \frac{1}{n} \sum_{i=1}^{m} |M_i| \, P[d(U) \le t \mid U \in M_i]. \tag{31}$$
Algorithm 2 compute_irg_pdf(irg_mx, $\mathcal{M} = \{M_1, \dots, M_m\}$)
 $N$ = number of rows in irg_mx
 Initialize $\alpha_0, \dots, \alpha_{N-1}$ to be 0
 for each $M \in \mathcal{M}$ do
        representative_node = $r(M)$
        parray = irg_mx[representative_node]
        remove representative_node from $M$
        x = compute_x_vector(parray, $\mathcal{M}$)
        add representative_node to $M$
        for $i = 0$ to $N - 1$ do
              $\alpha[i] = \alpha[i] + |M| \cdot x[i]$
        end for
 end for
 for $i = 0$ to $N - 1$ do
        $\alpha[i] = \alpha[i] / N^2$
 end for
 $\{\lambda_0, \dots, \lambda_{N-1}\} = FFT\{\alpha_0, \dots, \alpha_{N-1}\}$
 return $\lambda$
Algorithm 3 compute_x_vector(parray, $\mathcal{M} = \{M_1, \dots, M_m\}$)
 $n$ = size(parray) $- 1$
 $\omega = 2\pi / (n + 1)$
 for $l = 0$ to $n$ do
       if $l = 0$ then
             $x[l] = 1$
       else if $l \le [n/2]$ then
             sumArg = 0
             sumLnMod = 0
             for each $M \in \mathcal{M}$ do
                   if $|M| < 1$ then
                           continue
                   end if
                   node = any node from cluster $M$
                   $p$ = parray[node]
                   $q_1 = 1 - p + p \cos(\omega l)$
                   $q_2 = p \sin(\omega l)$
                   sumArg = sumArg + $|M| \cdot \mathrm{atan2}(q_2, q_1)$
                   sumLnMod = sumLnMod + $|M| \cdot \ln \sqrt{q_1^2 + q_2^2}$
             end for
             $d = \exp(\text{sumLnMod})$
             $a = d \cos(\text{sumArg})$
             $b = d \sin(\text{sumArg})$
             $x[l] = a + b i$
       else
             idx = $n - l + 1$
             $a = \mathrm{Re}(x[\text{idx}])$
             $b = \mathrm{Im}(x[\text{idx}])$
             $x[l] = a - b i$
       end if
 end for
 return x
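Algorithms 2 and 3 can be sketched in Python as shown below. The sketch weights each cluster's argument and log-modulus by the cluster size, as in Equations (27)–(29), and assumes that all nodes of a cluster share the same edge probability towards the representative node (as in an SBM). Clusters are passed as lists of node indices, the representative is taken to be the first node of each cluster, and a naive DFT replaces the FFT; these conventions and the zero-modulus guard are assumptions of this example.

```python
import cmath
import math


def compute_x_vector(parray, clusters):
    """x_0..x_n for one node; `clusters` must not contain the node itself."""
    size = len(parray)
    n = size - 1
    omega = 2.0 * math.pi / size
    half = math.ceil(n / 2)
    x = [0j] * size
    x[0] = 1.0 + 0.0j
    for l in range(1, half + 1):
        sum_arg, sum_ln_mod, is_zero = 0.0, 0.0, False
        for cluster in clusters:
            if not cluster:
                continue
            p = parray[cluster[0]]  # any node of the cluster
            q1 = 1.0 - p + p * math.cos(omega * l)
            q2 = p * math.sin(omega * l)
            mod = math.hypot(q1, q2)
            if mod == 0.0:
                is_zero = True
                break
            sum_arg += len(cluster) * math.atan2(q2, q1)
            sum_ln_mod += len(cluster) * math.log(mod)
        if not is_zero:
            d = math.exp(sum_ln_mod)
            x[l] = complex(d * math.cos(sum_arg), d * math.sin(sum_arg))
    for l in range(half + 1, n + 1):
        x[l] = x[n - l + 1].conjugate()
    return x


def compute_irg_pdf(irg_mx, clusters):
    """Exact degree distribution of a clustered IRG."""
    size = len(irg_mx)
    alpha = [0j] * size
    for cluster in clusters:
        rep = cluster[0]
        others = [[v for v in c if v != rep] for c in clusters]
        x = compute_x_vector(irg_mx[rep], others)
        for l in range(size):
            alpha[l] += len(cluster) * x[l]
    omega = 2.0 * math.pi / size
    return [
        (sum(cmath.exp(-1j * omega * l * k) * alpha[l] for l in range(size)) / size ** 2).real
        for k in range(size)
    ]


# Two-block SBM on 4 nodes: 0.8 inside a block, 0.1 between blocks.
blocks = [[0, 1], [2, 3]]
sbm = [[0.0 if i == j else (0.8 if i // 2 == j // 2 else 0.1) for j in range(4)]
       for i in range(4)]
sbm_distribution = compute_irg_pdf(sbm, blocks)
```

The work per representative depends on the number of clusters rather than the number of nodes, which is the benefit of exploiting the partition.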
We approximate the conditional distribution $P[d(U) \le t \mid U \in M_i]$ by $F_{M_i}(t)$:
$$F_{M_i}(t) \approx P[d(U) \le t \mid U \in M_i]. \tag{32}$$
We denote the approximation of $\Lambda(t)$ by $F(t)$. We express $F(t)$ as a linear combination of the functions $F_{M_i}(t)$:
$$\Lambda(t) \approx F(t) = \frac{1}{n} \sum_{i=1}^{m} \sum_{a \in M_i} F_{M_i}(t) = \frac{1}{n} \sum_{i=1}^{m} |M_i| \, F_{M_i}(t). \tag{33}$$
Based on this analysis, we present the scheme of the proposed approximation method in Algorithm 4. The input parameters of this algorithm are $x$, the edge probability matrix irg_mx, and the partition $\mathcal{M}$ of the nodes. Algorithm 4 returns $F(x)$, the approximated value of $\Lambda(x)$. Note that we have not specified the computation of $F_{M_i}$ in Algorithm 4. There are several possible ways to compute $F_{M_i}$. In the subsequent sections, we will describe some possible implementations. In Section 3.3.1, we will use the Poisson distribution; in Section 3.3.2, we will use the binomial distribution; and in Section 3.3.3, we will use the Gaussian distribution to calculate $F_{M_i}$.
Algorithm 4 approximate_irg_CDF(x, irg_mx, $\mathcal{M} = \{M_1, \dots, M_m\}$)
 $n$ = number of rows in irg_mx
 Initialize ret to be 0
 for each $M \in \mathcal{M}$ do
        y = compute_cluster_CDF_approximation(x, irg_mx, M)
        ret = ret + $\frac{|M|}{n} \cdot$ y
 end for
 return ret
We will use the results of the following analysis to derive an upper bound for the approximation error in the special cases when the approximator distributions are Poisson, binomial or Gaussian. Suppose we are given a normed space $(X, \|\cdot\|)$, and the function $\Lambda(t)$ and its approximation $F(t)$ are in $X$. Consider the distance function generated by the norm, defined as $d(x, y) = \|x - y\|$ for all $x, y \in X$. We also suppose that the functions $\{G_a : a \in V\}$ and $\{F_a : a \in V\}$ are in $X$. We can express the distance of $\Lambda(t)$ and its approximation $F(t)$ as:
$$d(\Lambda, F) = \|\Lambda - F\| = \left\| \frac{1}{n} \sum_{i=1}^{m} \sum_{a \in M_i} (F_a - F_{M_i}) \right\| = \left\| \frac{1}{n} \sum_{i=1}^{m} \sum_{a \in M_i} (F_a - G_a + G_a - F_{M_i}) \right\|. \tag{34}$$
We can think of $F_{M_i}$ as follows: for every node $a$ in $M_i$, $d_a$ has its own Poisson binomial distribution $F_a$. We can approximate the distribution of $d_a$ by the local approximation function $G_a$. We would like to aggregate the local approximation functions $\{G_a : a \in M_i\}$, and the aggregated approximator is $F_{M_i}$. From the triangle inequality of the norm:
$$d(\Lambda, F) \le \frac{1}{n} \sum_{i=1}^{m} \sum_{a \in M_i} \|F_a - G_a\| + \frac{1}{n} \sum_{i=1}^{m} \sum_{a \in M_i} \|G_a - F_{M_i}\|. \tag{35}$$
If $\|\cdot\|$ is the total variation norm and $F_{M_i}$, $F_a$, $G_a$ are discrete probability distributions for all $i \in \{1, \dots, m\}$ and $a \in V$, then $d(\cdot, \cdot)$ corresponds to the total variation distance, and we obtain the following bound, where the multiplicative factor 2 comes from the connection between the total variation norm and the total variation distance given in (7):
$$d_{TV}(\Lambda, F) \le \frac{2}{n} \sum_{i=1}^{m} \sum_{a \in M_i} d_{TV}(F_a, G_a) + \frac{2}{n} \sum_{i=1}^{m} \sum_{a \in M_i} d_{TV}(G_a, F_{M_i}). \tag{36}$$
On the other hand, if $\|\cdot\|$ is the p-norm (with integer $p \ge 1$), then $d(\cdot, \cdot)$ is the p-distance, and
$$d_p(\Lambda, F) \le \frac{1}{n} \sum_{i=1}^{m} \sum_{a \in M_i} d_p(F_a, G_a) + \frac{1}{n} \sum_{i=1}^{m} \sum_{a \in M_i} d_p(G_a, F_{M_i}). \tag{37}$$
We will use the notations $\mu_a = E[d_a]$ and $\sigma_a^2 = Var[d_a]$. For every node cluster $M_i$, we denote by $\mu_{M_i}$ and $\sigma_{M_i}^2$ the common mean and variance used for the cluster $M_i$. Furthermore, we suppose that there are real numbers $a_i, A_i, b_i, B_i \ge 0$ such that for all $a \in M_i$:
$$\mu_a \in [\mu_{M_i} - a_i, \ \mu_{M_i} + A_i], \tag{38}$$
and
$$\sigma_a \in [\sigma_{M_i} - b_i, \ \sigma_{M_i} + B_i]. \tag{39}$$
We also suppose that $\mu_{M_i} - a_i > 0$ and $\sigma_{M_i} - b_i > 0$ for all $i = 1, \dots, m$.

3.3.1. Approximation Using the Poisson Distribution

Consider the case when we use the Poisson distribution to approximate the degree distribution of an IRG. We first discuss how to compute the cluster approximation functions $F_{M_i}$ using the Poisson distribution for all node clusters $M_i$ (how to implement the compute_cluster_CDF_approximation function in Algorithm 4). We specify $F_{M_i}$ as the CDF of the Poisson distribution with parameter $\mu_{M_i}$, where $\mu_{M_i}$ is the average of the expected degrees in $M_i$:
$$\mu_{M_i} = \mathrm{Average}\left\{ \mu_a : \mu_a = \sum_{b \in V \setminus \{a\}} p_{ab}, \ a \in M_i \right\}. \tag{40}$$
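A minimal sketch of Algorithm 4 with the Poisson choice of $F_{M_i}$ is given below. The cluster mean is the average row sum of the edge probability matrix over the cluster, which assumes a zero diagonal; the function names `poisson_cdf` and `approximate_irg_cdf_poisson` and the list-of-lists inputs are illustrative assumptions.

```python
import math


def poisson_cdf(mu, x):
    """CDF of Poi(mu) at x, summing the pmf terms iteratively."""
    if x < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for k in range(1, math.floor(x) + 1):
        term *= mu / k
        total += term
    return min(total, 1.0)


def approximate_irg_cdf_poisson(x, irg_mx, clusters):
    """Approximate P[d(U) <= x] as the size-weighted mix of per-cluster Poisson CDFs."""
    n = len(irg_mx)
    ret = 0.0
    for cluster in clusters:
        # Average expected degree in the cluster (rows have a zero diagonal).
        mu_cluster = sum(sum(irg_mx[a]) for a in cluster) / len(cluster)
        ret += len(cluster) / n * poisson_cdf(mu_cluster, x)
    return ret


# Homogeneous example: 4 nodes, every edge has probability 0.25, one cluster.
uniform = [[0.0 if i == j else 0.25 for j in range(4)] for i in range(4)]
cdf_at_zero = approximate_irg_cdf_poisson(0, uniform, [[0, 1, 2, 3]])
```

Replacing `poisson_cdf` with a binomial or Gaussian CDF gives the other two variants of the scheme without changing the outer loop.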
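We now derive an upper bound for the approximation error in the case when the cluster approximation functions $F_{M_i}$ are defined by the Poisson distribution. For each node $a \in V$, the local approximation function $G_a$ is specified as the CDF of the Poisson distribution with the parameter $\mu_a$. We apply Theorem 1 to give an upper bound on the TV distance between the actual degree distribution $F_a$ of node $a$ and its local approximation function $G_a$:
$$\sum_{a \in M_i} d_{TV}(F_a, G_a) \le \sum_{a \in M_i} \frac{1 - e^{-\mu_a}}{2 \mu_a} (\mu_a - \sigma_a^2). \tag{41}$$
Since we are restricted to the cluster $M_i$, we can use (38) and (39) to give an upper bound on the right-hand side of the inequality (41). For all $a \in M_i$:
$$\frac{1 - e^{-\mu_a}}{2 \mu_a} (\mu_a - \sigma_a^2) \le \frac{1 - e^{-(\mu_{M_i} + A_i)}}{2 (\mu_{M_i} - a_i)} \left[ (\mu_{M_i} + A_i) - (\sigma_{M_i}^2 - b_i^2) \right]. \tag{42}$$
Therefore:
$$\sum_{a \in M_i} d_{TV}(F_a, G_a) \le |M_i| \, \frac{1 - e^{-\mu_{M_i} - A_i}}{2 (\mu_{M_i} - a_i)} \left( \mu_{M_i} - \sigma_{M_i}^2 + A_i + b_i^2 \right). \tag{43}$$
To compute an upper bound for the total variation distance between two Poisson distributions, we can use Theorem 2. Suppose that $a \in M_i$ and $\mu_a \le \mu_{M_i}$. Then, from Theorem 2:
$$d_{TV}(G_a, F_{M_i}) \le \min\left\{ \mu_{M_i} - \mu_a, \ \sqrt{\frac{2}{e}} \left( \sqrt{\mu_{M_i}} - \sqrt{\mu_a} \right) \right\}. \tag{44}$$
Similarly, if $\mu_a > \mu_{M_i}$, then from Theorem 2:
$$d_{TV}(G_a, F_{M_i}) \le \min\left\{ \mu_a - \mu_{M_i}, \ \sqrt{\frac{2}{e}} \left( \sqrt{\mu_a} - \sqrt{\mu_{M_i}} \right) \right\}. \tag{45}$$
The right sides of both (44) and (45) can be expressed as $\min\left\{ |\mu_{M_i} - \mu_a|, \ \sqrt{2/e} \, \left| \sqrt{\mu_{M_i}} - \sqrt{\mu_a} \right| \right\}$, for which the following upper bound can be given:
$$d_{TV}(G_a, F_{M_i}) \le \min\left\{ |\mu_{M_i} - \mu_a|, \ \sqrt{\frac{2}{e}} \left| \sqrt{\mu_{M_i}} - \sqrt{\mu_a} \right| \right\} \le l(\mu_{M_i}, a_i, A_i), \tag{46}$$
where:
$$l(\mu_{M_i}, a_i, A_i) = \min\left\{ \max\{a_i, A_i\}, \ \sqrt{\frac{2}{e}} \max\left\{ \sqrt{\mu_{M_i}} - \sqrt{\mu_{M_i} - a_i}, \ \sqrt{\mu_{M_i} + A_i} - \sqrt{\mu_{M_i}} \right\} \right\}. \tag{47}$$
Finally, after substituting (43) and (46) into (36):
$$d_{TV}(\Lambda, F) \le \frac{1}{n} \sum_{i=1}^{m} |M_i| \left[ \frac{1 - e^{-\mu_{M_i} - A_i}}{\mu_{M_i} - a_i} \left( \mu_{M_i} - \sigma_{M_i}^2 + A_i + b_i^2 \right) + 2 \cdot l(\mu_{M_i}, a_i, A_i) \right]. \tag{48}$$
It is easy to see that if $a_i, A_i, b_i, B_i \to 0$, then (48) goes to:
$$d_{TV}(\Lambda, F) \le \frac{1}{n} \sum_{i=1}^{m} |M_i| \, \frac{1 - e^{-\mu_{M_i}}}{\mu_{M_i}} \left( \mu_{M_i} - \sigma_{M_i}^2 \right). \tag{49}$$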
The inequality in (49) also holds when the distribution of the node degrees within the same node cluster is the same.

3.3.2. Approximation Using the Binomial Distribution

We now discuss the calculation of $F_{M_i}$ using the binomial distribution (the implementation of the compute_cluster_CDF_approximation function in Algorithm 4). In this case, we specify $F_{M_i}$ as the CDF of the binomial distribution with parameters $n - 1$ and $\mu_{M_i} / (n - 1)$, where $n$ is the number of nodes and $\mu_{M_i}$ is given in (40). Similarly, for all $a \in V$, the local approximation function $G_a$ is defined by the CDF of the binomial distribution with parameters $n - 1$ and $\mu_a / (n - 1)$. From Theorem 3, we conclude that the upper bounds (48) and (49) derived for the Poisson approximation are applicable to the binomial approximation as well.

3.3.3. Approximation Using the Gaussian Distribution

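We now discuss the use of the Gaussian distribution to approximate the degree distribution of the IRG model. For each $i \in \{1, \dots, m\}$, we define the function $F_{M_i}$ (compute_cluster_CDF_approximation in Algorithm 4) as the CDF of the Gaussian distribution with parameters $\mu_{M_i}$ and $\sigma_{M_i}^2$, where $\mu_{M_i}$ is given in (40) and $\sigma_{M_i}^2$ is computed as:
$$\sigma_{M_i}^2 = \mathrm{Average}\left\{\sigma_a^2 : \sigma_a^2 = \sum_{a \ne b} p_{ab}(1 - p_{ab}),\ a \in M_i\right\}. \tag{50}$$
We now derive an upper bound for the approximation error in terms of the 1-distance ($p$-distance with $p = 1$) in the case when the cluster approximation functions $F_{M_i}$ are defined by the Gaussian distribution. For each $a \in V$, the local approximation function $G_a$ is also defined by the Gaussian distribution: $G_a$ is the CDF of the Gaussian distribution with parameters $\mu_a$ and $\sigma_a^2$. Let us now apply (37) to the Gaussian approximation. We denote the CDF of the standard normal distribution by $\Phi(x)$. If $a \in M_i$, the 1-distance of $F_{M_i}$ and $G_a$ can be expressed as:
$$d_1(F_{M_i}, G_a) = \int_{-\infty}^{\infty} \left|\Phi\left(\frac{x - \mu_a}{\sigma_a}\right) - \Phi\left(\frac{x - \mu_{M_i}}{\sigma_{M_i}}\right)\right| dx. \tag{51}$$
Introducing the notations $l_i(x) = \frac{x - \mu_{M_i} - A_i}{\sigma_{M_i} + B_i}$ and $u_i(x) = \frac{x - \mu_{M_i} + a_i}{\sigma_{M_i} - b_i}$, it is easy to see that for all $a \in M_i$: $l_i(x) \le \frac{x - \mu_a}{\sigma_a} \le u_i(x)$ and $l_i(x) \le \frac{x - \mu_{M_i}}{\sigma_{M_i}} \le u_i(x)$. Therefore, since $\Phi(x)$ is increasing in $x$, for any $a \in M_i$, we can apply the following upper bound:
$$\left|\Phi\left(\frac{x - \mu_a}{\sigma_a}\right) - \Phi\left(\frac{x - \mu_{M_i}}{\sigma_{M_i}}\right)\right| \le \Phi(u_i(x)) - \Phi(l_i(x)). \tag{52}$$
We bound the difference $\Phi(u_i(x)) - \Phi(l_i(x))$ as:
$$\Phi(u_i(x)) - \Phi(l_i(x)) = \frac{1}{\sqrt{2\pi}} \int_{l_i(x)}^{u_i(x)} \exp\left(-\frac{t^2}{2}\right) dt \le \frac{1}{\sqrt{2\pi}} \int_{l_i(x)}^{u_i(x)} \frac{1}{1 + \frac{t^2}{2}}\, dt. \tag{53}$$
Substituting $x = \frac{t^2}{2}$ into the inequality $e^{-x} \le \frac{1}{1 + x}$ ($x \ge 0$) [32], we obtain $e^{-\frac{t^2}{2}} \le \frac{1}{1 + \frac{t^2}{2}}$, which proves (53). Since the antiderivative of $\frac{1}{1 + \frac{t^2}{2}}$ is $\sqrt{2}\arctan(t/\sqrt{2}) + c$:
$$\Phi(u_i(x)) - \Phi(l_i(x)) \le \frac{1}{\sqrt{\pi}}\left[\arctan\left(\frac{u_i(x)}{\sqrt{2}}\right) - \arctan\left(\frac{l_i(x)}{\sqrt{2}}\right)\right]. \tag{54}$$
Therefore:
$$d_1(F_{M_i}, G_a) \le \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} \left[\arctan\left(\frac{u_i(x)}{\sqrt{2}}\right) - \arctan\left(\frac{l_i(x)}{\sqrt{2}}\right)\right] dx. \tag{55}$$
After substituting the definitions of $u_i(x)$ and $l_i(x)$ into the right side of (55) and using the identity $\int_{-\infty}^{\infty} \left[\arctan(ax + b) - \arctan(cx + d)\right] dx = \pi\left(\frac{b}{a} - \frac{d}{c}\right)$ if $a, b > 0$ (for the derivation, see Appendix A):
$$\frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} \left[\arctan\left(\frac{x + a_i - \mu_{M_i}}{\sqrt{2}\,(\sigma_{M_i} - b_i)}\right) - \arctan\left(\frac{x - (A_i + \mu_{M_i})}{\sqrt{2}\,(\sigma_{M_i} + B_i)}\right)\right] dx = \sqrt{\pi}\,(a_i + A_i). \tag{56}$$
As a result, we have:
$$d_1(F_{M_i}, G_a) \le \sqrt{\pi}\,(a_i + A_i). \tag{57}$$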
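The evaluation of the integral in (56) can be spelled out as a short worked step. It assumes the prefactor $1/\sqrt{\pi}$ that follows from (53) together with the antiderivative $\sqrt{2}\arctan(t/\sqrt{2})$, and it uses the letters $a$, $b$, $c$, $d$ only as the coefficients of the identity from Appendix A (they are not nodes here):

```latex
a = \frac{1}{\sqrt{2}\,(\sigma_{M_i} - b_i)}, \quad
b = \frac{a_i - \mu_{M_i}}{\sqrt{2}\,(\sigma_{M_i} - b_i)}, \quad
c = \frac{1}{\sqrt{2}\,(\sigma_{M_i} + B_i)}, \quad
d = -\frac{A_i + \mu_{M_i}}{\sqrt{2}\,(\sigma_{M_i} + B_i)}

\frac{b}{a} - \frac{d}{c} = (a_i - \mu_{M_i}) + (A_i + \mu_{M_i}) = a_i + A_i

\frac{1}{\sqrt{\pi}} \cdot \pi \left( \frac{b}{a} - \frac{d}{c} \right) = \sqrt{\pi}\,(a_i + A_i)
```

The cluster mean $\mu_{M_i}$ and both standard-deviation tolerances cancel, so the resulting bound depends only on the tolerances $a_i$ and $A_i$ of the means.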
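It is clear that $d_1(F_{M_i}, G_a)$ goes to zero as $a_i, A_i \to 0$. Furthermore, for any node cluster $M_i$:
$$\sum_{a \in M_i} d_1(G_a, F_{M_i}) \le |M_i|\,\sqrt{\pi}\,(a_i + A_i). \tag{58}$$
Applying Theorem 4 with $p = 1$:
$$\sum_{a \in M_i} d_1(G_a, F_a) \le \sum_{a \in M_i} \frac{C}{\sigma_a} \le C \sum_{a \in M_i} \frac{1}{\sigma_{M_i} - b_i} = |M_i|\,\frac{C}{\sigma_{M_i} - b_i}, \tag{59}$$
where $C$ is a universal constant from Theorem 4. After substituting (58) and (59) into (37):
$$d_1(\Lambda, F) \le \frac{1}{n}\sum_{i=1}^{m} |M_i| \left[\sqrt{\pi}\,(a_i + A_i) + \frac{C}{\sigma_{M_i} - b_i}\right]. \tag{60}$$
If $a_i, A_i, b_i \to 0$, then the right side of (60) goes to
$$d_1(\Lambda, F) \le \frac{1}{n}\sum_{i=1}^{m} |M_i|\,\frac{C}{\sigma_{M_i}}. \tag{61}$$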

4. Numerical Experiments

We demonstrate the developed methods with numerical experiments. In Section 4.1, we compute the degree distribution of the ER random graph using Algorithm 2. In Section 4.3 and Section 4.4, we experimentally test the precision of the approximation method. In Section 4.3, we observe how the approximation error changes as the network size changes. For this test, we use two IRG types: in the first type, the network has a block structure, and within one block, the degree distribution of the nodes is the same. In the second type, there is no such structure; for any two different nodes $a$ and $b$, $d_a$ and $d_b$ follow different distributions. We group the nodes $a$ and $b$ into the same cluster $M$ only if $d_a$ and $d_b$ have the same distribution; therefore, for the second IRG type, every group contains only a single node. In Section 4.4, we fix an IRG of the second type: for any two different nodes $a$ and $b$, $d_a$ and $d_b$ have different distributions. For this experiment, we create partitions $C_S(V)$ of the node set $V$, where $S$ denotes the common cluster size in the partition $C_S(V)$. Therefore, for a fixed partition $C_S(V)$ and any node cluster $M \in C_S(V)$, the distributions of $d_a$ and $d_b$ are different if $a, b \in M$ and $a \ne b$. We observe how the approximation precision changes as the cluster size changes. We use the biased static edge voting model [14] to generate the test IRGs with appropriate structures. Hence, in Section 4.2, we briefly discuss the biased static edge voting model.

4.1. The ER Test

In the special case when all the edge probabilities $p_{ab}$ of an $IRG_n(P)$ are equal to the same value $p$, we obtain the ER random graph with the parameter $p$. We know that the degree distribution of the ER random graph model with a fixed number of nodes $n$ and link probability $p$ is binomial [15] with parameters $n - 1$ and $p$. Therefore, the degree distribution of the ER random graph with parameters $n$ and $p$ can be expressed as:
$$p_{n,p}^{ER}(k) = P(d(U_n) = k) = \binom{n-1}{k} p^k (1 - p)^{n-1-k} \quad \text{for each } k \in \{0, \dots, n - 1\}, \tag{62}$$
where $U_n$ is a uniform random variable on the node set $V_n = [n]$. Let us denote the IRG model on the node set $V_n = [n]$ by $IRG_n(p)$, where all the edge probabilities $p_{ab}$ are set to the same probability $p$. It is clear that $ER_n(p)$ and $IRG_n(p)$ denote the same random graph model; therefore, we expect that if we compute the degree distribution of $IRG_n(p)$ using Algorithm 2, we obtain exactly the degree distribution of the $ER_n(p)$ model given in (62). We tested this statement experimentally by setting the network size to $n = 1000$ and computing the degree distribution of $IRG_n(p)$ using Algorithm 2 for each $p \in \{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9\}$. Let us denote the degree distribution of $IRG_n(p)$ computed by Algorithm 2 by $\{p_{n,p}^{IRG}(k) : k \in \{0, \dots, n - 1\}\}$, and calculate the total variation distance between the degree distributions $\{p_{n,p}^{ER}(k) : k \in \{0, \dots, n - 1\}\}$ and $\{p_{n,p}^{IRG}(k) : k \in \{0, \dots, n - 1\}\}$:
$$TV_{n,p} = d_{TV}\left(p_{n,p}^{ER}, p_{n,p}^{IRG}\right). \tag{63}$$
The magnitude of the total variation distance $TV_{n,p}$ for each parameter value $p$ = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 was $10^{-13}$, which means that we can consider the distributions $p_{n,p}^{ER}$ and $p_{n,p}^{IRG}$ to be the same.
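A small-scale version of this check can be sketched as follows. Note the assumptions: the Poisson binomial distribution is computed here by direct convolution of the Bernoulli variables rather than by the DFT-CF method used in Algorithm 2, the total variation distance is taken as half of the $\ell_1$ distance between the probability vectors, and the function names and the reduced size $n = 200$ are hypothetical choices for illustration.

```python
import math


def poisson_binomial_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli(p) variables.

    Direct convolution, O(len(probs)^2); a stand-in for DFT-CF.
    """
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # the new trial fails
            nxt[k + 1] += mass * p       # the new trial succeeds
        pmf = nxt
    return pmf


def binomial_pmf(trials, p):
    """pmf of Binomial(trials, p); fine for a few hundred trials."""
    return [math.comb(trials, k) * p**k * (1.0 - p)**(trials - k)
            for k in range(trials + 1)]


def total_variation(p, q):
    """Half of the l1 distance between two pmfs on the same support."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))


n, p = 200, 0.3
# In IRG_n(p) every node has n - 1 potential neighbours, each linked with
# probability p, so all nodes share one degree law and the uniform mixture
# over nodes equals that law.
irg_degree_pmf = poisson_binomial_pmf([p] * (n - 1))
er_degree_pmf = binomial_pmf(n - 1, p)
tv = total_variation(irg_degree_pmf, er_degree_pmf)
```

In exact arithmetic the two distributions coincide, so any nonzero value of `tv` reflects floating-point rounding only.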

4.2. The Biased Static Edge Voting Model

We used the biased static edge voting model [14] to generate the appropriate IRG parametrizations for the experiments in Section 4.3 and Section 4.4. The model with $N$ nodes is defined by the parameter set $\{D_1, \dots, D_N\}$ and a single positive real value $\eta$. For any $a \in V$, the parameter $D_a$ controls the local behaviour of node $a$, while $\eta$ is a model-level control parameter. We can group the nodes based on their $D_a$ parameter values: nodes $a$ and $b$ belong to the same group $S$ if and only if $D_a = D_b$. This naturally defines a partition of the nodes. We suppose that the $D_a$ parameters are in the set $\{0, 1, \dots, N - 1\}$, and we index a cluster with the common parameter of the nodes within the cluster: $S_i = \{a : D_a = i,\ a \in V\}$. Denote by $V_{ab}$ a random variable that represents the vote of node $a$ for the edge candidate $\{a, b\}$. We assume that for all $a \in S_i$ and $b \in S_j$, $V_{ab}$ follows the same probability distribution. For any pair of different nodes $a$ and $b$, the probability that there will be an edge between the nodes $a$ and $b$ depends on the incoming votes, and it is given by $s(V_{ab}, V_{ba})$, where $s$ is the edge probability function. The biased edge voting model specifies this definition in the following way: for any nodes $a \in S_i$ and $b \ne a$, the random variable $V_{ab}$ is Bernoulli distributed with parameter $\frac{i}{N-1}$, i.e., $V_{ab} \sim \mathrm{Bernoulli}\left(\frac{i}{N-1}\right)$. The edge probability function is given as:
$$s(V_{ab}, V_{ba}) = 1 - e^{-\eta\, v(V_{ab}, V_{ba})}, \tag{64}$$
where $\eta > 0$ is the control parameter of the model, and
$$v(V_{ab}, V_{ba}) = \frac{D_a}{N-1} V_{ab} + \frac{D_b}{N-1} V_{ba}. \tag{65}$$
The probability $p_{ab}$ that the link $\{a, b\}$ is drawn is given by the formula [14]:
$$p_{ab} = p_a^* + p_b^* - p_a^* p_b^*, \tag{66}$$
where:
$$p_z^* = \frac{D_z}{N-1}\left(1 - e^{-\eta \frac{D_z}{N-1}}\right) \quad \text{for all } z \in V. \tag{67}$$
The model is defined by the parameters $\eta$ and $D_1, \dots, D_N$, or equivalently by the number $\eta$ and the partition $V = [N] = S_0 \cup \dots \cup S_{N-1}$ of the nodes, where for each $a \in S_i$, $D_a = i$ (in some cases, $S_i$ can be empty). Given these parameters, links are drawn independently, and the probability that the edge between nodes $a$ and $b$ is drawn is given by (66). We use this model to generate different IRG parametrizations by applying Equation (66). It is clear that different parametrizations of the voting model lead to different IRG models. As we have seen, parametrization means fixing the value of the control parameter $\eta$ and the sequence $D_1, \dots, D_N$. To generate the sequence $D_1, \dots, D_N$, we used two methods: range and lognormal. Range is defined as the first $N$ non-negative integers: $\mathrm{Range}(N) = (0, 1, \dots, N - 1)$. We denote the lognormal sequence generator by $\mathrm{LognormalSeq}(\mu, \sigma, N)$, where $N$ is the length of the sequence, and $\mu$ and $\sigma$ are the parameters of the lognormal distribution. A positive random variable $X$ is log-normally distributed with parameters $\mu$ and $\sigma$ if $\ln(X)$ is normally distributed with mean $\mu$ and standard deviation $\sigma$. For the lognormal sequence generator algorithm, we suppose that the fraction of nodes with parameter $k$ is approximately $r_k = (F(k + 0.5) - F(k - 0.5))/F(N - 0.5)$, where $F(x)$ is the cumulative distribution function of the lognormal distribution with parameters $\mu$ and $\sigma$. Therefore, the number of nodes with parameter $k$ is approximately $N r_k$. The algorithm is given in Algorithm 5. Its input parameter cdf can be any cumulative distribution function. We plotted the empirical density function of the sequence $\mathrm{LognormalSeq}(5, 0.6, 3000)$ in Figure 1. The parameter $\eta$ controls the global behaviour of the model. In the rest of this paper, we fix the value of $\eta$ to 2.0.
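To make the mapping from a voting-model parametrization to an IRG concrete, the following sketch computes the edge probability matrix from a parameter sequence via (66) and (67). The function name and the dense list-of-lists representation are hypothetical choices for illustration; only the two formulas are taken from the text.

```python
import math


def edge_probability_matrix(D, eta=2.0):
    """Edge probabilities p_ab = p*_a + p*_b - p*_a p*_b of the voting model.

    D is the parameter sequence (D_1, ..., D_N) with values in 0..N-1.
    """
    N = len(D)
    q = [d / (N - 1) for d in D]                       # D_z / (N - 1)
    p_star = [x * (1.0 - math.exp(-eta * x)) for x in q]
    P = [[0.0] * N for _ in range(N)]                  # no self-loops
    for a in range(N):
        for b in range(a + 1, N):
            P[a][b] = P[b][a] = p_star[a] + p_star[b] - p_star[a] * p_star[b]
    return P


D = list(range(5))            # Range(5) = (0, 1, 2, 3, 4)
P = edge_probability_matrix(D, eta=2.0)
# Expected degree of each node: row sums of the edge probability matrix.
expected_degrees = [sum(row) for row in P]
```

With the Range parametrization all $p_z^*$ values are distinct, so every node has a different row of edge probabilities, which is the "no block structure" case used in the experiments.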
Algorithm 5 parameter_sequence_generator(node_nr, cdf)
  parameter_sequence = empty list
  not_finished_nodes = node_nr
  max_param = node_nr − 1
  normalizer = cdf(max_param + 0.5)
  for param = 0 to max_param do
       m = param − 0.5
       M = param + 0.5
       p = (cdf(M) − cdf(m))/normalizer
       nr = min(round(p · node_nr), not_finished_nodes)
       Add param to the parameter_sequence list nr times
       not_finished_nodes = not_finished_nodes − nr
       if not_finished_nodes ≤ 0 then
            Break
       end if
  end for
  return parameter_sequence
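Algorithm 5 translates almost line by line into Python. The sketch below is one possible transcription, not the authors' implementation: the helper `lognormal_cdf` is a hypothetical name, and Python's built-in `round` resolves exact ties to the nearest even integer, which may differ from the rounding rule intended in the pseudocode.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """CDF of the lognormal distribution with parameters mu and sigma."""
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def parameter_sequence_generator(node_nr, cdf):
    """Python transcription of Algorithm 5."""
    parameter_sequence = []
    not_finished_nodes = node_nr
    max_param = node_nr - 1
    normalizer = cdf(max_param + 0.5)
    for param in range(max_param + 1):
        # Share of nodes whose parameter rounds to `param`.
        p = (cdf(param + 0.5) - cdf(param - 0.5)) / normalizer
        nr = min(round(p * node_nr), not_finished_nodes)
        parameter_sequence.extend([param] * nr)
        not_finished_nodes -= nr
        if not_finished_nodes <= 0:
            break
    return parameter_sequence


# LognormalSeq(1.5, 0.6, 50), the smallest parametrization in Table 1.
seq = parameter_sequence_generator(50, lambda x: lognormal_cdf(x, 1.5, 0.6))
cluster_sizes = {d: seq.count(d) for d in sorted(set(seq))}
```

Because of the rounding, the returned sequence can be shorter than `node_nr`; the loop only guarantees that it is never longer.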

4.3. The Effect of Network Size on Approximation Accuracy

In this test, we experimentally observe the effect of network size on approximation accuracy. For a sequence of networks with increasing network size, we compare the degree distributions returned by the approximation method (Algorithm 4) to the exact degree distributions computed by Algorithm 2. We use the total variation distance for comparison. When the approximator uses the Poisson or the binomial distribution, we can directly use the total variation distance; however, when the Gaussian distribution is used for approximation, we use the following discretization to obtain a discrete probability distribution: if $X$ is a continuous random variable with CDF $F_X$, then its discretized version $\bar{X}$ has the probability mass function:
$$P[\bar{X} = k] = F_X(k + 0.5) - F_X(k - 0.5) \quad \text{for all integers } k. \tag{68}$$
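The discretization can be sketched in a few lines for a Gaussian variable. The function names and the example parameters are hypothetical, and restricting the support to $k = 0, \dots, n - 1$ (the possible degrees) is an assumption of this sketch; the small probability mass that the Gaussian places outside this range is simply dropped.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of the Gaussian distribution with mean mu and std. dev. sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def discretized_gaussian_pmf(mu, sigma, n):
    """P[X_bar = k] = F(k + 0.5) - F(k - 0.5) for k = 0..n-1."""
    return [normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)
            for k in range(n)]


# Hypothetical node: mean degree 12, standard deviation 2.5, n = 30 nodes.
pmf = discretized_gaussian_pmf(mu=12.0, sigma=2.5, n=30)
mode = max(range(len(pmf)), key=pmf.__getitem__)
```

After this step the Gaussian approximation is an ordinary probability vector and can be compared with the exact degree distribution using the total variation distance, exactly as in the Poisson and binomial cases.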
To create the test IRG parametrizations, we used the biased static edge voting model with the lognormal and range parametrization methods. In both cases, the network sizes are 50, 100, 300, 500, 800, 1000, 1500, 2000, 2500, and 3000.
We denote the biased static edge voting model with parametrization $\mathrm{Range}(n)$ and $\eta = 2$ by $EV_n^R$, and the IRG generated from $EV_n^R$ using (66) by $IRG_n^R(P_n^R)$. It is clear that in $IRG_n^R(P_n^R)$, for all different nodes $a$ and $b$, $L(a) \ne L(b)$. Similarly, we denote the static edge voting model with parametrization $\mathrm{LognormalSeq}(\mu, \sigma, n)$ by $EV_n^{LN}$, and the IRG generated from $EV_n^{LN}$ using (66) by $IRG_n^{LN}(P_n^{LN})$. The values of $\mu$ and $\sigma$ for the $\mathrm{LognormalSeq}$ parametrization for each $n$ can be found in Table 1. By construction, $IRG_n^{LN}(P_n^{LN})$ has a block structure. It contains clusters, and within a cluster, the nodes have the same degree distribution. In Table 1, we collected basic statistics about the clusters for the $\mathrm{LognormalSeq}$ parametrizations used.
Let us denote the exact degree distribution of $IRG_n^T$ computed with Algorithm 2 by $\{p_{n,T}(k) : k \in \{0, \dots, n - 1\}\}$, where $T$ can be $R$ or $LN$. Similarly, we denote the approximated degree distribution of $IRG_n^T$ computed with Algorithm 4 by $\{p_{n,T}^D(k) : k \in \{0, \dots, n - 1\}\}$, where $T$ can be $R$ or $LN$ and $D$ stands for the approximating distribution used: $P$ (Poisson), $B$ (binomial), or $G$ (Gaussian). For example, we denote the approximated degree distribution of $IRG_n^R$ using the Poisson distribution by $\{p_{n,R}^P\}$. We calculated the total variation distance between the approximated and the exact degree distribution:
$$TV_{n,T}^D = d_{TV}\left(\{p_{n,T}\}, \{p_{n,T}^D\}\right). \tag{69}$$
The calculated $TV_{n,T}^D$ values are collected in Table 2 and Table 3. We also plotted the total variation distances as a function of the network size in Figure 2 and Figure 3. We can observe that in the case of the $\mathrm{Range}$ parametrization, the approximation error decreases monotonically with the network size, and we achieve the best approximation using the Gaussian distribution. For the $\mathrm{LognormalSeq}$ parametrization, we can observe an initial fluctuation in the approximation error, after which the error shows a decreasing trend as a function of the network size. In this case, we obtain the smallest approximation error when we use the binomial distribution.

4.4. The Effect of Cluster Size on Approximation Accuracy

In this experiment, we test how the accuracy of the approximation method depends on the cluster size. $IRG(P)$ denotes the IRG model generated from the biased edge voting model with parametrization $\mathrm{Range}(3000)$ using (66). This IRG is “very inhomogeneous” in the sense that all edge probabilities are different; therefore, the degree distribution of each node is different. We compute the exact degree distribution of $IRG(P)$ using Algorithm 2 and denote the result by $\{p(k) : k \in \{0, 1, \dots, 2999\}\}$. We plotted $\{p(k)\}$ in Figure 4.
We denote the number of nodes by $n$ ($n = 3000$ in the current setting) and identify the nodes by their parameter in the static edge voting model used to generate $IRG(P)$. This means that for any node $a \in V$, the parameter $D_a$ of the node in the generating biased edge voting model was $a$. Let us fix a cluster size $S > 0$ and suppose that $n$ is divisible by $S$. We define the partition $C_S(V)$ of $V = [n]$ as:
$$C_S(V) = \{\{0, \dots, S - 1\}, \{S, \dots, 2S - 1\}, \dots, \{n - S, \dots, n - 1\}\}. \tag{70}$$
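The partition into consecutive blocks of equal size is straightforward to construct. In the sketch below, the function name is hypothetical and nodes are identified with the integers $0, \dots, n - 1$, as in the text.

```python
def equal_size_partition(n, S):
    """Partition {0, ..., n-1} into consecutive clusters of size S."""
    if S <= 0 or n % S != 0:
        raise ValueError("S must be positive and divide n")
    return [list(range(start, start + S)) for start in range(0, n, S)]


# n = 3000 nodes with cluster size S = 1000 gives three clusters.
clusters = equal_size_partition(3000, 1000)
cluster_bounds = [(c[0], c[-1]) for c in clusters]
```

Choosing $S = 1$ reproduces the case in which every node forms its own cluster, and $S = n$ would place all nodes in a single cluster.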
The degree distributions of all nodes within a cluster of partition $C_S(V)$ are different. We tested the approximation method described in Section 3.3, where the node clusters are given by $C_S(V)$, and the cluster size $S$ is in $\mathcal{S}$ = {1500, 1000, 750, 600, 500, 300, 200, 100, 75, 60, 50, 30, 20, 10, 5, 1}. For example, if the cluster size $S$ is 1000, then we have 3 node clusters: $C_{1000}(V) = \{M_1 = \{0, \dots, 999\}, M_2 = \{1000, \dots, 1999\}, M_3 = \{2000, \dots, 2999\}\}$. We computed the approximated degree distribution of $IRG(P)$ using Algorithm 4 and denoted the resulting distributions by $\{p_S^D(k) : k \in \{0, 1, \dots, 2999\}\}$, where $S$ denotes the cluster size and $D$ represents the type of distribution used in the approximation, which can be $P$ (Poisson), $B$ (binomial), or $G$ (Gaussian). For all cluster sizes $S$ from $\mathcal{S}$, we calculated the total variation distance between the exact degree distribution (plotted in Figure 4) and the approximated degree distributions:
$$TV_S^D = d_{TV}\left(\{p(k)\}, \{p_S^D(k)\}\right). \tag{71}$$
The results are collected in Table 4 and plotted in Figure 5. We can observe that the approximation error decreases as the clusters shrink (that is, as the number of clusters increases). If the cluster size is large, then the Poisson approximation gives the smallest approximation error, while if the clusters are small, the Gaussian approximation gives the best results, although in the case of small clusters, the difference between the approximators is small.

5. Discussion

The inhomogeneous random graph (IRG) is a random graph model where the links of the graph are drawn independently and the link probabilities can be different. It can be seen as a generalization of the Erdős–Rényi (ER) random graph [15], where the edges are drawn independently, but the probability that any two different nodes are linked is a fixed value $p$. The degree distribution of a deterministic or random graph with $n$ nodes is defined as the probability that the degree of a uniformly chosen node equals $k$, for all $k = 0, \dots, n - 1$. The degree distribution has a central role in network science, not only because it is needed to compute several other network properties [1], but also because its shape largely determines the outcome of many important network processes, such as the spread of viruses [5], the diffusion of innovations [6,7], or attacks against critical infrastructure [2,3,4]. The degree distributions of many real-world networks have fat tails; therefore, the ER random graph model is not suitable for modelling real-world networks, because its degree distribution is binomial [15]. For this reason, many alternative random graph models have been proposed, such as the stochastic block model [8], generalized random graphs [12], Chung–Lu random graphs [16], the Norros–Reittu model [17], and the static edge voting model [14] (Section 4.2), which we used to generate the IRG parametrizations to test our algorithms in Section 4. Using different parametrizations of the IRG model, one can obtain random network models with very different degree distributions. The IRG is interesting not only as a generalization of the ER random graph but also as a tool to analyse other random graph models, such as the stochastic block model, generalized random graphs, or the static edge voting model.
In this paper, we focused on the calculation of the degree distribution of the IRG model. In Section 3.2, we discussed an algorithm to compute the exact degree distribution utilizing the DFT-CF method [19] developed by Yili Hong. The proposed algorithm (Algorithm 2) is highly parallelizable since the sub-step given in Algorithm 3 can be called independently. Furthermore, if the IRG model has a block structure, Algorithm 2 can take advantage of it. In Section 3.3, we presented a method to approximate the degree distribution of the IRG model. There are several reasons why one would apply approximation even if an exact computational method is available. One reason is that approximation is computationally cheaper than the exact method. A second reason is that the approximation method may also be used in the case when the IRG is not fully defined. At the beginning of Section 3.3, we discussed the general scheme of the proposed approximation method, which is presented in Algorithm 4. The idea of the approximation method is simple: we group the nodes of the network according to their statistical behaviour. For each node group, we approximate the common behaviour of the nodes within the group, and finally, we aggregate these group approximations. As a result, we obtain a mixture model, which we use as an approximation of the exact degree distribution. Similarly to the exact algorithm, it can be implemented effectively in a multi-thread environment. In Algorithm 4, we did not specify how to approximate the common behaviour of a node group, because it can be done in many ways, but in the subsequent subsections, we analysed three possible ways: using the Poisson distribution (Section 3.3.1), the binomial distribution (Section 3.3.2), and the Gaussian distribution (Section 3.3.3). Furthermore, we derived an upper bound for the approximation error for all three cases: Equation (48) for the Poisson and the binomial approximations, and Equation (60) for the Gaussian approximation.
Determining which distribution to use for optimal results is a natural question. Unfortunately, we do not have a clear answer to this. During the numerical experiments in Section 4, we found that the structure of the IRG and the granularity of the clustering influence which distribution leads to the most accurate approximation. In Section 4.3, we tested the approximation method on IRG models having a block structure (see Figure 2 and Table 2), where within each block, the nodes obey the same degree distribution. In this case, we observed that using the binomial distribution gave the most accurate results. However, in the case where the degree distribution of each node was different and we did not apply grouping, using the Gaussian distribution gave the best results (see Figure 3 and Table 3). In Section 4.4, we tested the effect of the clustering granularity on the approximation precision. We fixed an IRG model where the degree distribution of each node is different and applied the approximation with clustering, where the cluster size was different for each test case. We found that for larger cluster sizes, the Poisson distribution gave the most accurate estimate, and for smaller cluster sizes, the Gaussian distribution gave the best results (see Figure 5 and Table 4).
The approximation method can be extended or improved in several ways. In this study, we analysed the usage of the Poisson, binomial and Gaussian distributions. However, there are other distributions that we could use in a similar way. One obvious possibility is using the PB distribution itself. Other candidates are the translated Poisson distribution [20,33] and the Pólya approximation of the PB distribution [34]. Another direction is the optimal selection of the group approximation method. We have seen in Section 4.3 and Section 4.4 and in the previous paragraph that the structure of the IRG and the granularity of the clustering influence which distribution leads to the most accurate approximation. It is an open question whether we can implement a selection method to find the optimal approximating distribution.

Author Contributions

Conceptualization, R.P.; methodology, R.P.; software, R.P.; validation, R.P.; investigation, R.P.; writing—original draft preparation, R.P.; writing—review and editing, L.K. and R.P.; supervision, L.K.; project administration, L.K.; funding acquisition, L.K. All authors have read and agreed to the published version of the manuscript.

Funding

Project no. 2019-1.3.1-KK-2019-00007 has been implemented with the support provided from the National Research, Development and Innovation Fund of Hungary, financed under the 2019-1.3.1-KK funding scheme. This project has been supported by the Hungarian National Research, Development and Innovation Fund of Hungary, financed under the TKP2021-NKTA-36 funding scheme.

Data Availability Statement

The source code of the numerical experiments is available at https://github.com/rpethes/IRG (accessed on 11 February 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

We calculate $\int_{-\infty}^{\infty} \left[\arctan(ax + b) - \arctan(cx + d)\right] dx$ if $a, b > 0$. First, we derive the antiderivative of $\arctan(ax + b) - \arctan(cx + d)$. From the linearity of the integral:
$$Q(x) = \int \left[\arctan(ax + b) - \arctan(cx + d)\right] dx = \int \arctan(ax + b)\, dx - \int \arctan(cx + d)\, dx.$$
Let us continue with the indefinite integral $\int \arctan(ax + b)\, dx$. Applying the substitution $u = ax + b$:
$$\int \arctan(ax + b)\, dx = \frac{1}{a} \int \arctan(u)\, du.$$
The antiderivative of $\arctan(u)$ is $u \arctan(u) - \frac{\ln(u^2 + 1)}{2} + C$ [35], therefore:
$$R(x; a, b) := \int \arctan(ax + b)\, dx = \frac{(ax + b)\arctan(ax + b)}{a} - \frac{\ln\left((ax + b)^2 + 1\right)}{2a} + C.$$
Therefore:
$$Q(x) = R(x; a, b) - R(x; c, d).$$
Since $\lim_{x \to \infty} \arctan(x) = \frac{\pi}{2}$ and $\lim_{x \to -\infty} \arctan(x) = -\frac{\pi}{2}$:
$$\int_{-\infty}^{\infty} \left[\arctan(ax + b) - \arctan(cx + d)\right] dx = \lim_{x \to \infty} \left[Q(x) - Q(-x)\right] = \pi\left(\frac{b}{a} - \frac{d}{c}\right).$$
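The closed form can be checked numerically by evaluating $Q(x) - Q(-x)$ for increasing $x$ with the antiderivative $R$ derived above. The following sketch does this for one arbitrary set of coefficients; the test values and the function names are hypothetical, and the integration constant is dropped because it cancels in the difference.

```python
import math


def R(x, a, b):
    """Antiderivative of arctan(a*x + b), integration constant dropped."""
    u = a * x + b
    return (u * math.atan(u)) / a - math.log(u * u + 1.0) / (2.0 * a)


def symmetric_integral(L, a, b, c, d):
    """Integral of arctan(a*x + b) - arctan(c*x + d) over [-L, L]."""
    def Q(x):
        return R(x, a, b) - R(x, c, d)
    return Q(L) - Q(-L)


a, b, c, d = 1.0, 0.5, 2.0, -1.0          # arbitrary test coefficients
closed_form = math.pi * (b / a - d / c)   # pi * (0.5 + 0.5) = pi
for L in (1e2, 1e4, 1e6):
    # The symmetric integral approaches the closed form as L grows.
    print(L, symmetric_integral(L, a, b, c, d), closed_form)
```

The convergence is slow (the gap shrinks roughly like $1/L$), which reflects that the limit is taken symmetrically in $x$ and $-x$.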

References

  1. Barabási, A.-L. Network Science; Cambridge University Press: Cambridge, UK, 2016.
  2. Albert, R.; Jeong, H.; Barabási, A.-L. Attack and error tolerance of complex networks. Nature 2000, 406, 378.
  3. Cohen, R.; Erez, K.; ben-Avraham, D.; Havlin, S. Resilience of the Internet to random breakdowns. Phys. Rev. Lett. 2000, 85, 4626.
  4. Cohen, R.; Erez, K.; ben-Avraham, D.; Havlin, S. Breakdown of the Internet under intentional attack. Phys. Rev. Lett. 2001, 86, 3682.
  5. Pastor-Satorras, R.; Vespignani, A. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 2001, 86, 3200–3203.
  6. Valente, T.W. Network Models of the Diffusion of Innovations; Hampton Press: Cresskill, NJ, USA, 1995.
  7. Rogers, E.M. Diffusion of Innovations; Simon and Schuster: New York, NY, USA, 2010.
  8. Holland, P.W.; Laskey, K.B.; Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. 1983, 5, 109–137.
  9. Karrer, B.; Newman, M.E.J. Stochastic blockmodels and community structure in networks. Phys. Rev. E 2011, 83, 016107.
  10. Barabási, A.-L.; Oltvai, Z.N. Network biology: Understanding the cell’s functional organization. Nat. Rev. Genet. 2004, 5, 101–113.
  11. Buhl, J.; Gautrais, J.; Reeves, N.; Solé, R.V.; Valverde, S.; Kuntz, P.; Theraulaz, G. Topological patterns in street networks of self-organized urban settlements. Eur. Phys. J.-Condens. Matter Complex Syst. 2006, 49, 513–522.
  12. Britton, T.; Deijfen, M.; Martin-Löf, A. Generating simple random graphs with prescribed degree distribution. J. Stat. Phys. 2006, 124, 1377–1397.
  13. Van Der Hofstad, R. Random Graphs and Complex Networks; Cambridge University Press: Cambridge, UK, 2016.
  14. Pethes, R.; Kovács, L. Voting to the link: A static network formation model. Acta Polytech. Hung. 2020, 17, 207–228.
  15. Erdős, P.; Rényi, A. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 1960, 5, 17–60.
  16. Chung, F.; Lu, L. Complex Graphs and Networks; No. 107; American Mathematical Soc.: Providence, RI, USA, 2006.
  17. Norros, I.; Reittu, H. On a conditionally Poissonian graph process. Adv. Appl. Probab. 2006, 38, 59–75.
  18. Barabási, A.-L.; Albert, R. Emergence of scaling in random networks. Science 1999, 286, 509–512.
  19. Hong, Y. On computing the distribution function for the Poisson binomial distribution. Comput. Stat. Data Anal. 2013, 59, 41–51.
  20. Tang, W.; Tang, F. The Poisson binomial distribution—Old & New. Stat. Sci. 2022, 1, 1–12.
  21. Cooley, J.W.; Tukey, J.W. An algorithm for the machine calculation of complex Fourier series. Math. Comput. 1965, 19, 297–301.
  22. Saks, S. Theory of the Integral; Warszawa–Lwów: G.E. Stechert & Co.: New York, NY, USA, 1937.
  23. Total Variation. Available online: https://handwiki.org/wiki/Total_variation (accessed on 21 January 2023).
  24. Rudin, W. Functional Analysis, 2nd ed.; McGraw-Hill: New York, NY, USA, 1991.
  25. Barbour, A.D.; Hall, P. On the rate of Poisson convergence. In Mathematical Proceedings of the Cambridge Philosophical Society; Cambridge University Press: Cambridge, UK, 1984; Volume 95.
  26. Janson, S. Coupling and Poisson approximation. Acta Appl. Math. 1994, 34, 7–15.
  27. Adell, J.A.; Jodrá, P. Exact Kolmogorov and total variation distances between some familiar discrete distributions. J. Inequalities Appl. 2006, 2006, 1–8.
  28. Ehm, W. Binomial approximation to the Poisson binomial distribution. Stat. Probab. Lett. 1991, 11, 7–16.
  29. Choi, K.P.; Xia, A. Approximating the number of successes in independent trials: Binomial versus Poisson. Ann. Appl. Probab. 2002, 12, 1139–1148.
  30. Billingsley, P. Probability and Measure; John Wiley & Sons: Hoboken, NJ, USA, 2008.
  31. Petrov, V.V. Sums of Independent Random Variables; De Gruyter: Berlin, Germany, 2022.
  32. Spivak, M. Calculus, 4th ed.; Cambridge University Press: Cambridge, UK, 2008.
  33. Röllin, A. Translated Poisson approximation using exchangeable pair couplings. Ann. Appl. Probab. 2007, 17, 1596–1614.
  34. Skipper, M. A Pólya approximation to the Poisson-binomial law. J. Appl. Probab. 2012, 49, 745–757.
  35. Inverse Trigonometric Functions. Available online: https://en.wikipedia.org/wiki/Inverse_trigonometric_functions (accessed on 21 January 2023).
Figure 1. The empirical density function of the sequence $\mathrm{LognormalSeq}(5, 0.6, 3000)$.
Figure 2. Total variation distance between the approximated and the exact degree distribution as a function of network size when the IRGs are generated by the $\mathrm{LognormalSeq}$ parametrization of the biased static edge voting model.
Figure 3. Total variation distance between the approximated and the exact degree distribution as a function of network size when the IRGs are generated by the $\mathrm{Range}$ parametrization of the biased static edge voting model.
Figure 4. The exact degree distribution of the IRG generated by the biased static edge voting model with the $\mathrm{Range}(3000)$ parametrization.
Figure 5. Total variation distance between the approximated and the exact degree distribution as a function of the number of clusters, where the IRG is generated using the $\mathrm{Range}(3000)$ parametrization and the clusters are given by $C_S(V)$.
Table 1. Node cluster size statistics of different $\mathrm{LognormalSeq}$ parametrizations. Size: number of nodes. Parametrization: parameters of the lognormal sequence parametrization method. Nr of clusters: number of node clusters. Mean: mean size of node clusters. Sd: standard deviation of node cluster size. Min; max: minimum and maximum node cluster size.
| Size | Parametrization | Nr of Clusters | Mean | Sd | Min; Max |
|---|---|---|---|---|---|
| 50 | logN(1.5, 0.6, 50) | 12 | 4.2 | 2.8 | 1; 9 |
| 100 | logN(2, 0.6, 100) | 21 | 4.6 | 3.4 | 1; 11 |
| 300 | logN(2.3, 0.6, 300) | 35 | 8.5 | 7.8 | 1; 24 |
| 500 | logN(2.7, 0.6, 500) | 55 | 9.1 | 8.8 | 1; 27 |
| 800 | logN(3, 0.6, 800) | 76 | 10.4 | 10.3 | 1; 32 |
| 1000 | logN(4, 0.6, 1000) | 172 | 5.7 | 4.7 | 1; 15 |
| 1500 | logN(4.1, 0.6, 1500) | 206 | 7.2 | 6.5 | 1; 20 |
| 2000 | logN(4.3, 0.6, 2000) | 257 | 7.7 | 7.0 | 1; 22 |
| 2500 | logN(4.5, 0.6, 2500) | 315 | 7.8 | 7.2 | 1; 22 |
| 3000 | logN(5, 0.6, 3000) | 482 | 6.1 | 5.2 | 1; 16 |
Table 2. Total variational distance between the approximated and the exact degree distributions when the IRG is created using the $\mathit{LognormalSeq}$ parametrization.

| Parametrization | Gauss ($TV^{G}_{n,LN}$) | Poisson ($TV^{P}_{n,LN}$) | Binomial ($TV^{B}_{n,LN}$) |
|---|---|---|---|
| logN(1.5, 0.6, 50) | 0.058032 | 0.009510 | 0.002926 |
| logN(2, 0.6, 100) | 0.039668 | 0.006708 | 0.002313 |
| logN(2.3, 0.6, 300) | 0.062106 | 0.001709 | 0.000731 |
| logN(2.7, 0.6, 500) | 0.045169 | 0.001290 | 0.000548 |
| logN(3, 0.6, 800) | 0.041433 | 0.000954 | 0.000425 |
| logN(4, 0.6, 1000) | 0.011296 | 0.002092 | 0.000905 |
| logN(4.1, 0.6, 1500) | 0.013723 | 0.001317 | 0.000617 |
| logN(4.3, 0.6, 2000) | 0.012329 | 0.001079 | 0.000518 |
| logN(4.5, 0.6, 2500) | 0.010431 | 0.000977 | 0.000472 |
| logN(5, 0.6, 3000) | 0.004713 | 0.001222 | 0.000565 |
Table 3. Total variational distance between the approximated and the exact degree distributions when the IRG is created using the $\mathit{Range}$ parametrization.

| Parametrization | Gauss ($TV^{G}_{n,R}$) | Poisson ($TV^{P}_{n,R}$) | Binomial ($TV^{B}_{n,R}$) |
|---|---|---|---|
| range(50) | 0.002910 | 0.074889 | 0.021244 |
| range(100) | 0.001472 | 0.063056 | 0.016119 |
| range(300) | 0.000487 | 0.044869 | 0.011275 |
| range(500) | 0.000301 | 0.037659 | 0.009719 |
| range(800) | 0.000195 | 0.031871 | 0.008521 |
| range(1000) | 0.000159 | 0.029427 | 0.008014 |
| range(1500) | 0.000111 | 0.025494 | 0.007185 |
| range(2000) | 0.000086 | 0.023055 | 0.006656 |
| range(2500) | 0.000070 | 0.021346 | 0.006274 |
| range(3000) | 0.000060 | 0.020053 | 0.005980 |
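
Tables 2 and 3 compare an exact Poisson binomial distribution with its Gaussian, Poisson, and binomial approximations. The following Python sketch illustrates that kind of comparison on a small example; it is not the authors' implementation. Several choices are assumptions made only for illustration: the exact distribution is obtained by direct convolution rather than by the DFT-CF method, the total variational distance is taken as half the L1 distance between the two probability mass functions, the approximations are matched by mean (and variance for the Gaussian), the Gaussian is discretized with a continuity correction, and the link probabilities in the demo are hypothetical values, not the $\mathit{Range}$ or $\mathit{LognormalSeq}$ parametrizations.

```python
import math


def poisson_binomial_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli(p_j) variables.

    Uses direct convolution (O(n^2)), swapped in here for the DFT-CF method.
    """
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this trial fails
            nxt[k + 1] += mass * p       # this trial succeeds
        pmf = nxt
    return pmf


def poisson_pmf(lam, n):
    """Poisson(lam) probabilities for k = 0..n, computed in log space."""
    return [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
            for k in range(n + 1)]


def binomial_pmf(n, p):
    """Binomial(n, p) probabilities for k = 0..n."""
    return [math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
            for k in range(n + 1)]


def gaussian_pmf(mu, sigma, n):
    """Normal(mu, sigma) discretized on 0..n with a continuity correction."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return [cdf(k + 0.5) - cdf(k - 0.5) for k in range(n + 1)]


def tv_distance(exact, approx):
    """Half the L1 distance; approx mass outside 0..n counts as difference."""
    inside = sum(abs(a - b) for a, b in zip(exact, approx))
    outside = max(0.0, 1.0 - sum(approx))
    return 0.5 * (inside + outside)


# Hypothetical link probabilities of one node towards n other nodes.
n = 50
probs = [(i + 1) / (2.0 * (n + 1)) for i in range(n)]

exact = poisson_binomial_pmf(probs)
mu = sum(probs)
sigma = math.sqrt(sum(p * (1.0 - p) for p in probs))

print("Gauss   :", tv_distance(exact, gaussian_pmf(mu, sigma, n)))
print("Poisson :", tv_distance(exact, poisson_pmf(mu, n)))
print("Binomial:", tv_distance(exact, binomial_pmf(n, mu / n)))
```

The convolution is quadratic in the number of probabilities, so it is only meant for small inputs like this one.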
Table 4. Total variational distance between the approximated and the exact degree distribution when the IRG is created using the $\mathit{Range}(3000)$ parametrization and the clusters are given by $C_S(V)$.

| Nr of Clusters | Cluster Size (S) | Gaussian ($\{p_S^G(k)\}$) | Poisson ($\{p_S^P(k)\}$) | Binomial ($\{p_S^B(k)\}$) |
|---|---|---|---|---|
| 2 | 1500 | 0.855596 | 0.768407 | 0.840941 |
| 3 | 1000 | 0.782809 | 0.658750 | 0.762074 |
| 4 | 750 | 0.710952 | 0.555995 | 0.684635 |
| 5 | 600 | 0.646058 | 0.468330 | 0.616409 |
| 6 | 500 | 0.590810 | 0.395641 | 0.558709 |
| 10 | 300 | 0.422019 | 0.208654 | 0.397337 |
| 15 | 200 | 0.274159 | 0.071488 | 0.251296 |
| 30 | 100 | 0.054490 | 0.019309 | 0.052811 |
| 40 | 75 | 0.015959 | 0.019647 | 0.019708 |
| 50 | 60 | 0.004814 | 0.019797 | 0.010066 |
| 60 | 50 | 0.001701 | 0.019877 | 0.007355 |
| 100 | 30 | 0.000329 | 0.019990 | 0.006188 |
| 150 | 20 | 0.000158 | 0.020025 | 0.006037 |
| 300 | 10 | 0.000072 | 0.020046 | 0.005958 |
| 600 | 5 | 0.000061 | 0.020051 | 0.005974 |
| 3000 | 1 | 0.000060 | 0.020053 | 0.005980 |

Pethes, R.; Kovács, L. An Exact and an Approximation Method to Compute the Degree Distribution of Inhomogeneous Random Graph Using Poisson Binomial Distribution. Mathematics 2023, 11, 1441. https://doi.org/10.3390/math11061441
