An Exact and an Approximation Method to Compute the Degree Distribution of Inhomogeneous Random Graph Using Poisson Binomial Distribution

Pethes, Róbert; Kovács, Levente

doi:10.3390/math11061441


by

Róbert Pethes

^*

and

Levente Kovács

Physiological Controls Research Center, Óbuda University, 1034 Budapest, Hungary

^*

Author to whom correspondence should be addressed.

Mathematics 2023, 11(6), 1441; https://doi.org/10.3390/math11061441

Submission received: 12 February 2023 / Revised: 9 March 2023 / Accepted: 12 March 2023 / Published: 16 March 2023


Abstract:

Inhomogeneous random graphs are commonly used models for complex networks where nodes have varying degrees of connectivity. Computing the degree distribution of such networks is a fundamental problem and has important applications in various fields. We define the inhomogeneous random graph as a random graph model where the edges are drawn independently and the probability of a link between any two vertices can be different for each node pair. In this paper, we present an exact and an approximation method to compute the degree distribution of inhomogeneous random graphs using the Poisson binomial distribution. The exact algorithm utilizes the DFT-CF method to compute the distribution of a Poisson binomial random variable. The approximation method uses the Poisson, binomial, and Gaussian distributions to approximate the Poisson binomial distribution.

Keywords:

inhomogeneous random graph; Poisson binomial distribution; degree distribution; DFT-CF method

MSC:

05C80

1. Introduction

Random graphs are widely used to model complex systems such as social networks, biological networks, and the internet. The degree distribution is an important characteristic of a network, as it provides information about the connectivity of nodes in the network [1], and its shape determines many network phenomena, such as robustness [2,3,4] or spreading processes [5,6,7]. Inhomogeneous random graphs are a type of random graph where the nodes are not equally likely to be connected. Instead, the probability of two nodes being connected depends on their attributes or characteristics. Example applications of inhomogeneous random graphs are social network analysis [8,9], modelling biological networks [10], or modelling transportation networks [11].

In the literature of network science and random graphs, the term inhomogeneous random graph does not refer to a single well-defined model but to a family of random graph models in which the nodes have varying degrees of connectivity. One example of an inhomogeneous random graph is the stochastic block model [8,9]. A stochastic block model (SBM) is defined by a

V = C_{1} \cup C_{2} \cup \dots \cup C_{r}

partition of the vertex set and a

r \times r

symmetric P edge probability matrix. For any two vertices

u \in C_{i}

and

v \in C_{j}

, the draw probability of the

{u, v}

undirected edge is

P_{i j}

. Therefore, for any

{u, v}

vertex pair, the

p_{u v}

probability that the nodes u and v are connected is directly determined by the model parameters (the P matrix), and the links are drawn independently if the P matrix is given. A second example is the generalized random graph [12,13]. In the case of a generalized random graph (GRG), the inhomogeneity is introduced into the model using vertex weights. Each node i of the network is assigned a

w_{i} > 0

vertex weight, and the probability that an edge is drawn between the nodes i and j is equal to

p_{i j} = \frac{w_{i} w_{j}}{S + w_{i} w_{j}},

(1)

where

S = w_{1} + \dots + w_{n}

is the total weight of all vertices. The consequence of this definition is that vertices with high weight are more likely to have many neighbours than vertices with small weights, and vertices with extremely high weights could act as hubs observed in many real-world networks. Furthermore, if the

w_{1}, \dots, w_{n}

parameters are deterministic and given, the edge probabilities can be computed with (1), and the edges are independent. A third example of an inhomogeneous random graph is the biased static edge voting model [14]. We use this model in our numerical tests in Section 4, and it is briefly described in Section 4.2. Similarly to the SBM and GRG models, if the model parameters are given, the

p_{i j}

link probability of any

{i, j}

node pair can be directly computed (see Equation (66)), and the edges are drawn independently.

For all these example inhomogeneous random graph models, the common property is that the edge probabilities can be directly computed from the model definition, and the edges are drawn independently. We define the inhomogeneous random graph (IRG) model via these properties (as it is defined in [13] (Section 6.7)). The inhomogeneous random graph is a random graph model on the vertex set

V = {1, \dots, n}

, where the

p_{a b}

draw probability of any

{a, b}

edge is given, and the edges are drawn independently. The IRG model can be considered as the natural generalization of the Erdős–Rényi (ER) random graph [15], where each link of the graph is drawn independently with a fixed probability p. If we set all the edge probabilities of the IRG model to a fixed value p, then we obtain an ER random graph. It is well known that the degree distribution of the ER model is close to the Poisson distribution [15]. When we observe the degree sequence of real-world networks, we often see that their empirical degree distribution has a fat tail [13]. Therefore, the ER random graph is not suitable for modelling such networks.

If the parameters are deterministic and given, the SBM, GRG, and the static edge voting model can be represented by an IRG. A further example of such a model is the Chung–Lu random graph [16]. However, not all inhomogeneous random graph models can be expressed as an IRG. For example, the Norros–Reittu model [17] is a random multigraph model, while the IRG is a model of a simple graph. A second example is the GRG model with random weights. Using random weights in the GRG breaks the independence of the edges. A third example is the Barabási–Albert (BA) model [18]. The BA model is a dynamic network growth model, and for this case, we cannot derive the edge probabilities.

In this paper, we discuss a novel algorithm that allows us to compute the exact degree distribution of the IRG model and an approximation method to estimate the IRG degree distribution. The difficulty of computing the degree distribution of the IRG model comes from the fact that each edge candidate of the network may have a different draw probability; therefore, the degree distribution of any node is Poisson binomial (PB) [19,20]. The algorithm that we have developed to compute the degree distribution of the IRG model is based on the DFT-CF method invented by Yili Hong [19], and the approximation method uses the Poisson, binomial, and Gaussian distributions to approximate the PB distribution. The proposed algorithms can be used to compute or approximate the degree distribution of any random graph model that can be represented by an IRG.

The structure of the remaining part of this paper is as follows: Section 2 contains the mathematical preliminaries of our study. In Section 2.1, we introduce the necessary notations and definitions. In Section 2.2, we briefly discuss the DFT-CF algorithm for computing the Poisson binomial distribution. Section 2.3 contains selected results about the approximation of the Poisson binomial distribution. In Section 3, we discuss the proposed algorithms to compute or approximate the degree distribution of the IRG model. In Section 3.1, we formally define the problem that we aim to solve. In Section 3.2, we present an exact algorithm to compute the degree distribution of the inhomogeneous random graph, and in Section 3.3, we discuss an approximation method to estimate this distribution. In the first part of Section 3.3, we outline the general scheme of the approximation method; then, we provide an upper bound of the approximation error for the special cases, when the approximator distribution is Poisson (Section 3.3.1), binomial (Section 3.3.2), and Gaussian (Section 3.3.3). The results of the numerical experiments are provided in Section 4. The study is concluded with the discussion in Section 5.

The contributions of the authors are a novel algorithm to compute the exact degree distribution of the IRG model utilizing the DFT-CF method (Section 3.2) and the analysis of the estimation method for this distribution (Section 3.3). The idea of the approximation scheme is simple and not new: we group similar nodes into clusters and apply the same approximator distribution within a cluster. Our contribution here is the analysis of the approximation error in the specific cases when the approximator distribution is Poisson (Section 3.3.1), binomial (Section 3.3.2), and Gaussian (Section 3.3.3).

2. Preliminaries

2.1. Notations and Definitions

We denote the set

{1, \dots, n}

as

[n]

. A simple graph G is defined as a pair

(V (G), E (G))

, where

V (G)

is the set of vertices and

E (G)

is the set of edges. The vertices are labeled with integers, so the vertex set of an n-vertex graph is

V = [n]

. The degree of a vertex a is defined as the number of neighbors that a has in G. We denote the degree of a as

d (a)

or simply

d_{a}

. The degree distribution of a deterministic or random graph G is defined as the distribution of

d (U)

, where U is a randomly and uniformly chosen vertex. Even in the case of a deterministic graph,

d (U)

is a random variable. In the deterministic case, we can express

P (d (U) = k)

as

|{a \in V ∣ d_{a} = k}| / n

. If G is a random graph, then

d_{a}

is a random variable, and we refer to the distribution of

d_{a}

as the degree distribution of vertex a.

The Poisson binomial random variable N is defined as the sum of n independent random indicators:

N = \sum_{i = 1}^{n} I_{i}

, where

I_{i} \sim Bernoulli (p_{i})

,

i = 1, \dots, n

. Note that N takes value in

{0, 1, \dots, n}

. We say that the

p_{1}, \dots, p_{n}

values are the parameters of the distribution, and we use the notation

N \sim P B (p_{1}, \dots, p_{n})

. When all

p_{i}

s are identical, the distribution of N is binomial. Let

ξ_{k} = P (N = k)

,

k = 0, 1, \dots, n

be the probability mass function (pmf) for the Poisson binomial random variable N. The pmf of N can be expressed as:

ξ_{k} = \sum_{A \in H_{k}} [\prod_{j \in A} p_{j} \prod_{j \in A^{c}} (1 - p_{j})],

(2)

where

H_{k}

is the set of all subsets of k integers that can be selected from

[n]

, and

A^{c}

is the complementary set of A in

[n]

. The direct use of this formula is computationally very expensive.
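To make the cost of Equation (2) concrete, the following minimal Python sketch enumerates the subsets in H_k directly. The function name `pb_pmf_bruteforce` and the example parameters are illustrative assumptions, not part of the original text; the enumeration visits all 2^n subsets and is therefore only usable for very small n.

```python
from itertools import combinations
from math import prod

def pb_pmf_bruteforce(p):
    """Poisson binomial pmf by direct enumeration of Equation (2).

    p is the list of success probabilities p_1, ..., p_n (0-based here).
    The running time grows like 2^n, so this is for illustration only.
    """
    n = len(p)
    idx = range(n)
    pmf = []
    for k in range(n + 1):
        total = 0.0
        for A in combinations(idx, k):          # A ranges over H_k
            chosen = set(A)
            total += prod(p[j] if j in chosen else 1.0 - p[j] for j in idx)
        pmf.append(total)
    return pmf

# Hypothetical example with three indicators.
pmf = pb_pmf_bruteforce([0.1, 0.5, 0.9])
```

For these three parameters the enumeration touches 8 subsets; for n = 40 it would already need about 10^12 products, which is what motivates the DFT-based method discussed later.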

We introduce now the inhomogeneous random graph (IRG) model [13] (Section 6.7), denoted by

I R G_{n} (P)

, where n is the number of vertices and

P = {p_{i j}}

is a set of edge probabilities. In this model, edges are drawn independently, and the probability of drawing the edge

{i, j}

is given by

p_{i j}

for all

1 \leq i < j \leq n

. We formalize this as follows:

Definition 1.

The inhomogeneous random graph model

I R G_{n} (P)

is defined as a random graph with vertex set

[n]

and edge probabilities

P = {p_{i j}}

, where each edge

{i, j}

for all

1 \leq i < j \leq n

is drawn independently with probability

p_{i j}

.

The parameters of an IRG can also be represented by an

n \times n

symmetric P matrix with

P_{i i} = 0

for all

i \in [n]

, and

P_{i j} = P_{j i} = p_{i j}

for any

1 \leq i < j \leq n

. Since the elements of the P matrix are probabilities,

0 \leq P_{i j} \leq 1

for all

i, j \in [n]

. By definition, the degree distribution of any i node in

I R G_{n} (P)

is Poisson binomial with the parameters

{p_{i 1}, \dots, p_{i, i - 1}, p_{i, i + 1}, \dots, p_{i n}}

.
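As an illustration of the matrix representation, the sketch below builds the symmetric edge probability matrix for two of the example models mentioned in the Introduction (the SBM and the GRG with deterministic weights) and reads off the Poisson binomial parameters of one node. All function names are hypothetical, vertices are indexed from 0 rather than 1, and plain nested lists stand in for a matrix type.

```python
def sbm_edge_matrix(blocks, B):
    """IRG matrix of a stochastic block model.

    blocks[v] is the block index of vertex v; B is the r x r symmetric
    block probability matrix. The diagonal is set to 0 (no self-loops).
    """
    n = len(blocks)
    return [[0.0 if u == v else B[blocks[u]][blocks[v]] for v in range(n)]
            for u in range(n)]

def grg_edge_matrix(w):
    """IRG matrix of a generalized random graph, Equation (1):
    p_ij = w_i w_j / (S + w_i w_j), with S the total weight."""
    S = sum(w)
    n = len(w)
    return [[0.0 if i == j else w[i] * w[j] / (S + w[i] * w[j])
             for j in range(n)] for i in range(n)]

def pb_parameters(P, i):
    """Parameters of the Poisson binomial degree distribution of node i:
    row i of the matrix with the diagonal entry removed."""
    return [P[i][j] for j in range(len(P)) if j != i]

# Hypothetical example: three vertices with weights 1, 2, 3.
P = grg_edge_matrix([1.0, 2.0, 3.0])
params_node0 = pb_parameters(P, 0)
```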

We briefly introduce the discrete Fourier transform (DFT). The DFT transforms a sequence of

n + 1

complex numbers

{x_{0}, x_{1}, \dots, x_{n}}

into another sequence of complex numbers

{y_{0}, y_{1}, \dots, y_{n}}

, where the transformation is defined by the formula

y_{k} = \sum_{l = 0}^{n} x_{l} exp (- i ω k l)

,

k = 0, 1, \dots, n

, and

ω = 2 π / (n + 1)

. There are fast Fourier transform (FFT) algorithms to compute DFT efficiently. The best known and most commonly used FFT algorithm is the Cooley–Tukey algorithm [21].
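The DFT formula above can be written down literally in a few lines. The sketch below is a naive quadratic-time implementation in plain Python, given only to fix the sign and indexing conventions; the function name `dft` is an assumption. In practice an FFT routine with the same sign convention (for example `numpy.fft.fft`) would be used instead.

```python
import cmath

def dft(x):
    """Naive DFT: y_k = sum_l x_l * exp(-i * omega * k * l).

    len(x) plays the role of n + 1 in the text, so omega = 2*pi / len(x).
    """
    m = len(x)
    omega = 2.0 * cmath.pi / m
    return [sum(x[l] * cmath.exp(-1j * omega * k * l) for l in range(m))
            for k in range(m)]

# A unit impulse transforms to a constant sequence.
y = dft([1, 0, 0, 0])
```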

We define the total variation norm [22,23] and, based on this, the total variational distance. Consider a signed measure

μ

on a measurable space

(X, Σ)

. First, we define two non-negative measures:

\bar{ν} (μ, E) = sup \{μ (A) : A \in Σ and A \subset E\} for all E \in Σ .

(3)

\underline{ν} (μ, E) = inf \{μ (A) : A \in Σ and A \subset E\} for all E \in Σ .

(4)

The total variation norm of the

μ

measure is defined as:

{∥ μ ∥}_{T V} = \bar{ν} (μ, X) + |\underline{ν} (μ, X)| .

(5)

The total variational distance of the probability measures P and Q on the same

(Ω, F)

measurable space is defined as:

d_{T V} (P, Q) = {∥ P - Q ∥}_{T V} = 2 sup \{| P (A) - Q (A) | : A \in F\} .

(6)

The factor 2 above is usually dropped. Informally, this is the largest possible difference between the probabilities that two probability measures can assign to the same event. For discrete probability distributions, it is possible to write the

T V

distance as follows, where the

\frac{1}{2}

factor is applied to normalize

d_{T V} (P, Q)

to the range

[0, 1]

:

d_{T V} (P, Q) = \frac{1}{2} \sum_{x} | P (x) - Q (x) | .

(7)
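Equation (7) translates directly into code. In the following sketch, a discrete distribution is represented as a dictionary mapping outcomes to probabilities; this representation and the name `tv_distance` are assumptions made for illustration.

```python
def tv_distance(P, Q):
    """Total variation distance of two discrete distributions, Equation (7).

    P and Q are dicts {outcome: probability}; outcomes missing from one
    dict are treated as having probability 0 there.
    """
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(x, 0.0) - Q.get(x, 0.0)) for x in support)

# Two coins with different bias.
d = tv_distance({0: 0.5, 1: 0.5}, {0: 0.25, 1: 0.75})
```

With the 1/2 factor, identical distributions have distance 0 and distributions with disjoint supports have distance 1.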

We continue with the definition of p-norm and p-distance. For any

p \geq 1

integer, the p-norm [24] of the

f \in L^{p}

function is defined as:

{∥ f ∥}_{p} : = {(\int_{- \infty}^{\infty} {|f (x)|}^{p} d x)}^{\frac{1}{p}} .

(8)

The

L^{p}

distance [24] of the functions

f, g \in L^{p}

induced by the p-norm is given by:

d_{p} (f, g) = {∥ f - g ∥}_{p} : = {(\int_{- \infty}^{\infty} {|f (x) - g (x)|}^{p} d x)}^{\frac{1}{p}} .

(9)

We will use the notation

L (X)

to refer to the distribution of a random variable X.

2.2. Computing the Poisson Binomial Distribution: The DFT-CF Algorithm

Yili Hong showed in [19] that the probability mass function and the cumulative distribution function of the PB distribution can be computed directly using DFT. In particular, if

N \sim P B (p_{1}, \dots, p_{n})

, then for all

k = 0, 1, \dots, n

:

ξ_{k} = P (N = k) = \frac{1}{n + 1} \sum_{l = 0}^{n} exp (- i ω l k) x_{l},

(10)

where

x_{l} = \prod_{j = 1}^{n} [1 - p_{j} + p_{j} exp (i ω l)]

and

ω = 2 π / (n + 1)

. In other words:

{ξ_{0}, ξ_{1}, \dots, ξ_{n}} = \frac{1}{n + 1} D F T {x_{0}, x_{1}, \dots, x_{n}} .

(11)

Hong also provided an effective implementation of (11) in [19]. Let

x_{l} = a_{l} + i b_{l}

for all

l \in {0, 1, \dots, n}

, where

a_{l}

and

b_{l}

are the real and imaginary parts of

x_{l}

, respectively, and

i = \sqrt{- 1}

. It can be shown that

x_{0} = \sum_{k = 0}^{n} ξ_{k} = 1

, and for

l > 0

, the complex conjugate of

x_{l}

can be expressed as:

\bar{x_{l}} = x_{n + 1 - l} = a_{n + 1 - l} - i b_{n + 1 - l}, l = 1, \dots, n .

(12)

Thus,

a_{l} = a_{n + 1 - l}

and

b_{l} = - b_{n + 1 - l}

. Let

z_{j} (l) = 1 - p_{j} + p_{j} cos (ω l) + i p_{j} sin (ω l)

, and denote the modulus and the argument of

z_{j} (l)

by

| z_{j} (l) |

and

A r g [z_{j} (l)]

, respectively. Then,

a_{l}

and

b_{l}

can be explicitly expressed by

z_{j} (l)

. For all

l = 1, \dots, n

:

a_{l} = d_{l} cos \{\sum_{j = 1}^{n} A r g [z_{j} (l)]\},

(13)

b_{l} = d_{l} sin \{\sum_{j = 1}^{n} A r g [z_{j} (l)]\},

(14)

d_{l} = exp \{\sum_{j = 1}^{n} log [| z_{j} (l) |]\} .

(15)

Here,

| z_{j} (l) | = {({[1 - p_{j} + p_{j} \cos (ω l)]}^{2} + {[p_{j} \sin (ω l)]}^{2})}^{1 / 2}

and

\operatorname{Arg} [z_{j} (l)] = \operatorname{atan2} [p_{j} \sin (ω l), 1 - p_{j} + p_{j} \cos (ω l)]

. The function

\operatorname{atan2} (y, x)

is defined as:

\operatorname{atan2} (y, x) = \begin{cases} \arctan (\frac{y}{x}) & x > 0 \\ π + \arctan (\frac{y}{x}) & y \geq 0, x < 0 \\ - π + \arctan (\frac{y}{x}) & y < 0, x < 0 \\ \frac{π}{2} & y > 0, x = 0 \\ - \frac{π}{2} & y < 0, x = 0 \\ 0 & y = 0, x = 0 \end{cases}

(16)

According to this, we can use Algorithm 1 to compute the pmf of N, where

\lceil \cdot \rceil

denotes the ceiling function.

Algorithm 1 Computing the Poisson binomial pmf

1. Let $x_{0} = 1$ .
2. For $l = 1, \dots, \lceil n / 2 \rceil$ , compute $a_{l}$ and $b_{l}$ using Equations (13) and (14), and let $x_{l} = a_{l} + i b_{l}$ .
3. For $l = \lceil n / 2 \rceil + 1, \dots, n$ , let $a_{l} = a_{n + 1 - l}$ and $b_{l} = - b_{n + 1 - l}$ , and let $x_{l} = a_{l} + i b_{l}$ .
4. Compute $\{ξ_{0}, ξ_{1}, \dots, ξ_{n}\} = \frac{1}{n + 1} FFT \{x_{0}, x_{1}, \dots, x_{n}\}$ .
5. Return $\{ξ_{0}, ξ_{1}, \dots, ξ_{n}\}$ .

The derivation of (10) is based on the characteristic function and DFT; therefore, the method is called the DFT-CF algorithm.
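The steps of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the reference implementation of [19]: the function name `pb_pmf_dftcf` is hypothetical, the final transform is a naive quadratic-time DFT instead of an FFT, and a guard for the degenerate case z_j(l) = 0 (where the logarithm is undefined) has been added.

```python
import cmath
import math

def pb_pmf_dftcf(p):
    """Poisson binomial pmf via Equations (10) and (13)-(15)."""
    n = len(p)
    m = n + 1
    omega = 2.0 * math.pi / m
    half = math.ceil(n / 2)
    x = [0j] * m
    x[0] = 1.0 + 0j
    for l in range(1, half + 1):
        sum_arg = 0.0       # sum of Arg[z_j(l)]
        sum_log_mod = 0.0   # sum of log|z_j(l)|
        for pj in p:
            re = 1.0 - pj + pj * math.cos(omega * l)
            im = pj * math.sin(omega * l)
            mod = math.hypot(re, im)
            if mod == 0.0:  # z_j(l) = 0, so the whole product vanishes
                sum_log_mod = -math.inf
                break
            sum_arg += math.atan2(im, re)
            sum_log_mod += math.log(mod)
        d = math.exp(sum_log_mod)
        x[l] = complex(d * math.cos(sum_arg), d * math.sin(sum_arg))
    for l in range(half + 1, m):
        x[l] = x[m - l].conjugate()          # Equation (12)
    # Equation (11); a library FFT with the same sign convention would
    # normally replace this naive O(n^2) transform.
    return [sum(x[l] * cmath.exp(-1j * omega * l * k) for l in range(m)).real / m
            for k in range(m)]

pmf = pb_pmf_dftcf([0.1, 0.5, 0.9])
```

Working with the sums of arguments and log-moduli, rather than multiplying the complex factors directly, follows Equations (13)-(15) and avoids underflow of the product for large n.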

2.3. Approximation of the Poisson Binomial Distribution

In this section, we discuss various approximations of the Poisson binomial distribution. We will use these results in Section 3.3 to derive upper bounds for the approximation error of the estimated IRG degree distribution. More information about approximating the PB distribution, as well as other results, can be found in the paper [20] by Wenpin Tang and Fengmin Tang.

2.3.1. Poisson Approximation

First, we consider the use of the Poisson distribution as an approximation of the PB distribution. We use the notation

P o i (μ)

for the Poisson distribution with parameter

μ

. If X follows the PB distribution with parameters

p_{1}, \dots, p_{n}

, then we can approximate the distribution of X by the Poisson distribution with the parameter

μ = p_{1} + \dots + p_{n}

. The following theorem shows us how well the Poisson distribution approximates the Poisson binomial distribution.

Theorem 1

([20,25]). Let

X \sim P B (p_{1}, \dots, p_{n})

and

μ = \sum_{i = 1}^{n} p_{i}

. Then

\frac{1}{32} m i n (1, \frac{1}{μ}) \sum_{i = 1}^{n} p_{i}^{2} \leq d_{T V} (L (X), P o i (μ)) \leq \frac{1 - e^{- μ}}{2 μ} \sum_{i = 1}^{n} p_{i}^{2}

(17)

We see in [20] from (17) that the Poisson approximation of the PB distribution is good if

μ - σ^{2} = \sum_{i = 1}^{n} p_{i}^{2} ≪ \sum_{i = 1}^{n} p_{i}

, or equivalently, if

μ - σ^{2} ≪ μ

. There are two cases:

For small $μ$ , the upper bound in (17) is sharp.
For large $μ$ , the approximation error is on the order of $\sum_{i = 1}^{n} p_{i}^{2} / \sum_{i = 1}^{n} p_{i}$ .

The constant

1 / 32

in the lower bound can be improved to

1 / 14

[26]. The Poisson approximation can be viewed as a mean-matching procedure.
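The rule of thumb above (the Poisson approximation is good when the sum of the squared parameters is much smaller than the sum of the parameters) can be checked numerically. The sketch below computes the exact total variation distance between a PB distribution and the mean-matched Poisson distribution. The PB pmf is obtained here by a simple sequential convolution rather than by the DFT-CF method, and all function names and parameter choices are illustrative assumptions.

```python
import math

def pb_pmf_dp(p):
    """PB pmf by sequential convolution (simple O(n^2) alternative)."""
    pmf = [1.0]
    for pj in p:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - pj)
            nxt[k + 1] += mass * pj
        pmf = nxt
    return pmf

def poisson_pmf(mu, k):
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def tv_pb_poisson(p):
    """d_TV(PB(p), Poi(mu)) with mu = sum(p), using Equation (7)."""
    mu = sum(p)
    pb = pb_pmf_dp(p)
    head = sum(abs(pb[k] - poisson_pmf(mu, k)) for k in range(len(pb)))
    # Beyond k = n the PB pmf is zero, so only the Poisson tail remains.
    tail = max(0.0, 1.0 - sum(poisson_pmf(mu, k) for k in range(len(pb))))
    return 0.5 * (head + tail)

sparse = [0.01] * 20   # sum p_i^2 = 0.002, sum p_i = 0.2
dense = [0.5] * 20     # sum p_i^2 = 5,     sum p_i = 10
tv_sparse = tv_pb_poisson(sparse)
tv_dense = tv_pb_poisson(dense)
```

In this example the sparse parameter set gives a much smaller distance than the dense one, in line with the condition stated above.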

In Section 3.3.1, we will use the following theorem to compute the total variation distance of two differently parametrized Poisson distribution functions:

Theorem 2

([27]). For any

t > 0

and

x \geq 0

:

d_{T V} (P o i (t + x), P o i (t)) \leq m i n (x, \sqrt{\frac{2}{e}} (\sqrt{t + x} - \sqrt{t})) .

(18)
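As a numerical illustration of Theorem 2, the following sketch evaluates the total variation distance between two Poisson distributions by truncated summation and compares it with the right-hand side of (18). The truncation point `kmax` and the function names are assumptions of the sketch; the truncation error is negligible for the small means used here.

```python
import math

def poisson_pmf(mu, k):
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def tv_poisson(mu1, mu2, kmax=200):
    """d_TV(Poi(mu1), Poi(mu2)), summed over k = 0..kmax."""
    return 0.5 * sum(abs(poisson_pmf(mu1, k) - poisson_pmf(mu2, k))
                     for k in range(kmax + 1))

def theorem2_bound(t, x):
    """Right-hand side of inequality (18)."""
    return min(x, math.sqrt(2.0 / math.e) * (math.sqrt(t + x) - math.sqrt(t)))

# Hypothetical (t, x) pairs: (distance, bound) for each.
checks = [(tv_poisson(t + x, t), theorem2_bound(t, x))
          for t, x in [(4.0, 1.0), (0.5, 0.1), (10.0, 5.0)]]
```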

2.3.2. Binomial Approximation

We denote the binomial distribution with parameters n and p by

B i n (n, p)

. Suppose that

X \sim P B (p_{1}, \dots, p_{n})

and

μ = \sum_{i = 1}^{n} p_{i}

. Then, we can use

B i n (n, μ / n)

as an approximation of the distribution of X. The first result on the approximation precision of the Poission binomial distribution using the binomial distribution is due to Ehm [20,28]. The advantage of the binomial approximation over the Poisson approximation is justified by Theorem 3 from Choi and Xia:

Theorem 3

([20,29]). Let

X \sim P B (p_{1}, \dots, p_{n})

and

μ : = \sum_{i = 1}^{n} p_{i}

. For

m \geq 1

, let

d_{m} : = d_{T V} (L (X), B i n (m, μ / m))

. Then, for an m sufficiently large,

d_{m} < d_{m + 1} < \dots < d_{T V} (L (X), P o i (μ)) .

(19)
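The following sketch compares the binomial and the Poisson approximation on one small example. It only illustrates the qualitative statement; Theorem 3 itself is an asymptotic result in m and is not verified by the code. The PB pmf is computed by sequential convolution, and the function names and the parameter vector are illustrative assumptions.

```python
import math

def pb_pmf_dp(p):
    """PB pmf by sequential convolution."""
    pmf = [1.0]
    for pj in p:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - pj)
            nxt[k + 1] += mass * pj
        pmf = nxt
    return pmf

def tv_binomial(p):
    """d_TV(PB(p), Bin(n, mu/n)) via Equation (7)."""
    n, q = len(p), sum(p) / len(p)
    pb = pb_pmf_dp(p)
    return 0.5 * sum(abs(pb[k] - math.comb(n, k) * q ** k * (1 - q) ** (n - k))
                     for k in range(n + 1))

def tv_poisson(p):
    """d_TV(PB(p), Poi(mu)) via Equation (7), including the Poisson tail."""
    n, mu = len(p), sum(p)
    pb = pb_pmf_dp(p)
    poi = [math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
           for k in range(n + 1)]
    head = sum(abs(a - b) for a, b in zip(pb, poi))
    return 0.5 * (head + max(0.0, 1.0 - sum(poi)))

p = [0.2, 0.3, 0.4, 0.5, 0.6]
d_bin = tv_binomial(p)
d_poi = tv_poisson(p)
```

For this parameter vector the binomial approximation is several times closer to the PB distribution than the Poisson approximation.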

2.3.3. Gaussian Approximation

We denote the Gaussian distribution with expected value

μ

and variance

σ^{2}

by

N (μ, σ^{2})

. The Gaussian approximation of the Poisson binomial distribution follows from the Lyapunov or Lindeberg central limit theorem [30]. If

X \sim P B (p_{1}, \dots, p_{n})

,

μ = \sum_{i = 1}^{n} p_{i}

and

σ^{2} : = \sum_{i = 1}^{n} p_{i} (1 - p_{i})

, then we can use the Gaussian distribution with the parameters

μ

and

σ^{2}

to approximate the distribution of X. The following theorem gives an upper bound for the error of Gaussian approximation in terms of p-distance:

Theorem 4

([20,31]). Let

X \sim P B (p_{1}, \dots, p_{n})

,

μ : = \sum_{i = 1}^{n} p_{i}

and

σ^{2} : = \sum_{i = 1}^{n} p_{i} (1 - p_{i})

. Then there exists a universal constant

C > 0

such that

d_{p} (L (X), N (μ, σ^{2})) \leq \frac{C}{σ} for all p \geq 1 .

(20)

3. Materials and Methods

3.1. Problem Formulation

Suppose we are given an

I R G_{n} (P)

inhomogeneous random graph model (see Definition 1) with edge probabilities

P = {p_{i j}}

,

1 \leq i < j \leq n

. We aim to compute the degree distribution of

I R G_{n} (P)

. In particular, suppose that the node set of

I R G_{n} (P)

is

V = [n]

, and U is a uniformly distributed random variable on the integers V. We are looking for the probabilities

λ_{k} = P (d (U) = k)

for all

k \in {0, \dots, n - 1}

, where

d (a)

is the degree of node

a \in V

. In Section 3.2, we present an exact algorithm to compute

{λ_{0}, \dots, λ_{n - 1}}

, while in Section 3.3, we discuss an approximation method to estimate the values of

{λ_{0}, \dots, λ_{n - 1}}

.

3.2. Computing the Degree Distribution of Inhomogeneous Random Graph

We can apply (10) directly to compute the degree distribution of

I R G_{n} (P)

. First, let us express

λ_{k}

as:

λ_{k} = \sum_{a \in V} P [U = a] P [d_{U} = k | U = a] = \frac{1}{n} \sum_{a \in V} P [d_{a} = k] .

(21)

Since

d_{a}

has a PB distribution with parameters

{p_{a b} : b \in V ∖ {a}}

, we can use Equation (10):

P [d_{a} = k] = \frac{1}{n} \sum_{l = 0}^{n - 1} exp (- i ω l k) x_{a}^{l},

(22)

where

ω = 2 π / n

and

x_{a}^{l} = \prod_{b \in V ∖ {a}} (1 - p_{a b} + p_{a b} exp (i ω l))

. After substituting Equation (22) to the right side of Equation (21), we have:

λ_{k} = \frac{1}{n^{2}} \sum_{a \in V} \sum_{l = 0}^{n - 1} exp (- i ω l k) x_{a}^{l} = \frac{1}{n^{2}} \sum_{l = 0}^{n - 1} exp (- i ω l k) \sum_{a \in V} x_{a}^{l} = \frac{1}{n^{2}} \sum_{l = 0}^{n - 1} exp (- i ω l k) α_{l},

(23)

where

α_{l} = \sum_{a \in V} x_{a}^{l}

. In other words,

{λ_{0}, λ_{1}, \dots, λ_{n - 1}}

values can be expressed using the discrete Fourier transform:

{λ_{0}, λ_{1}, \dots, λ_{n - 1}} = \frac{1}{n^{2}} D F T {α_{0}, α_{1}, \dots, α_{n - 1}} .

(24)
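Equations (22)-(24) can be turned into a short program. The sketch below is a minimal illustration without any clustering: it accumulates the values α_l over all nodes and applies one transform at the end. It multiplies the complex factors directly instead of using the log-modulus and argument form, and it uses a naive DFT in place of an FFT; the function name `irg_degree_distribution` and the 0-based vertex indexing are assumptions.

```python
import cmath
import math

def irg_degree_distribution(P):
    """Exact degree distribution (lambda_0, ..., lambda_{n-1}) of IRG_n(P).

    P is an n x n symmetric matrix (nested lists) with zero diagonal.
    """
    n = len(P)
    omega = 2.0 * math.pi / n
    alpha = [0j] * n
    for a in range(n):
        for l in range(n):
            z = cmath.exp(1j * omega * l)
            x = 1.0 + 0j
            for b in range(n):
                if b != a:
                    x *= 1.0 - P[a][b] + P[a][b] * z   # factor of x_a^l
            alpha[l] += x                               # alpha_l = sum_a x_a^l
    # Equation (24): lambda = DFT(alpha) / n^2
    return [sum(alpha[l] * cmath.exp(-1j * omega * l * k) for l in range(n)).real
            / n ** 2 for k in range(n)]

# Erdős–Rényi graph on 4 vertices with p = 0.5: the degree is Bin(3, 0.5).
er = [[0.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
lam = irg_degree_distribution(er)
```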

We often have additional information about the structure of the

I R G

(for example, when the IRG is used to represent a SBM or static edge voting model). Suppose that a partition

V = \cup_{i = 1}^{m} M_{i}

of V is given where, for any

i, j \in {1, \dots, m}

,

i \neq j, M_{i} \cap M_{j} = \emptyset

, and the degree distribution of the nodes within the same

M_{i}

group is the same: for any

a, b \in M_{i} : L (d_{a}) = L (d_{b})

. We choose a representative element of each

M_{i}

set, which we denote by

r (M_{i})

. Since the degree distribution of the nodes in

M_{i}

is the same,

r (M_{i})

can be chosen arbitrarily. Given this partition of V, we can rewrite Equation (21) as:

λ_{k} = \sum_{i = 1}^{m} \sum_{a \in M_{i}} P [U = a] P [d_{a} = k]

(25)

Since nodes belonging to the same

M_{i}

cluster have the same degree distribution,

λ_{k} = \frac{1}{n} \sum_{i = 1}^{m} | M_{i} | \cdot P [d_{r (M_{i})} = k]

. Hence,

α_{l}

can be computed as:

α_{l} = \sum_{i = 1}^{m} | M_{i} | \cdot x_{r (M_{i})}^{l} .

(26)

In a similar way, we can use the partitions of V to rewrite Equations (13)–(15):

a_{l} = d_{l} cos \{\sum_{j = 1}^{m} | M_{j} | A r g [z_{r (M_{j})} (l)]\},

(27)

b_{l} = d_{l} sin \{\sum_{j = 1}^{m} | M_{j} | A r g [z_{r (M_{j})} (l)]\},

(28)

d_{l} = exp \{\sum_{j = 1}^{m} | M_{j} | log [| z_{r (M_{j})} (l) |]\} .

(29)

Based on this analysis, we give the pseudo-code of the algorithm to compute the exact degree distribution of the IRG model in Algorithm 2. Algorithm 2 uses Algorithm 3 to compute the

x_{a}^{l}

values. Algorithm 3 is the modified version of the first part of Algorithm 1, which computes

x_{a}^{0}, \dots, x_{a}^{n - 1}

for a fixed a node, taking advantage of the

M_{1}, \dots, M_{m}

clusters of V according to (27)–(29). The inputs of Algorithm 3 are parray and the

M = {M_{1}, \dots, M_{m}}

partition of V, where parray contains the parameters of the PB distribution. The vector parray is a slice of the edge probability matrix: for any

b \in V ∖ {a}

, parray[b] =

p_{a b}

, which is the probability of the

{a, b}

undirected link being created in the IRG model. Note that parray[a] =

p_{a a} = 0

by definition. Algorithm 2 calculates the degree distribution of the input IRG model by invoking Algorithm 3 and utilizing (26). Its inputs are irg_mx and the

M = {M_{1}, \dots, M_{m}}

clusters of V. The parameter irg_mx is the matrix representation of the IRG model. For any

a, b \in V

, irg_mx[a, b] =

p_{a b}

. Because of the undirected nature of the IRG model, the matrix irg_mx is symmetric.

3.3. Approximation of the Degree Distribution of Inhomogeneous Random Graph

In this section, we present an approximation method to estimate the degree distribution of the IRG model. Suppose that we are given a

V = \cup_{i = 1}^{m} M_{i}

partition of V, where for any

i, j \in {1, \dots, m}, i \neq j : M_{i} \cap M_{j} = \emptyset

. We consider the

M_{i}

sets as node clusters, where within a given cluster, the degree distribution of the nodes is similar but not necessarily the same. Let us denote the cumulative distribution function of

d (U)

by

Λ (t)

:

Λ (t) = P [d (U) \leq t] = \sum_{a \in V} P [d (U) \leq t | U = a] P [U = a] = \frac{1}{n} \sum_{a \in V} F_{a} (t) = \frac{1}{n} \sum_{i = 1}^{m} \sum_{a \in M_{i}} F_{a} (t),

(30)

where

F_{a} (t)

is the CDF of

d (a)

,

F_{a} (t) = P [d (a) \leq t]

for all

a \in V

. Similarly, we can express

Λ (t)

as:

Λ (t) = \sum_{i = 1}^{m} P [d (U) \leq t | U \in M_{i}] P [U \in M_{i}] = \frac{1}{n} \sum_{i = 1}^{m} | M_{i} | P [d (U) \leq t | U \in M_{i}] .

(31)

Algorithm 2 compute_irg_pdf(irg_mx, $\mathcal{M} = \{M_{1}, \dots, M_{m}\}$)

N = number of rows in irg_mx
Initialize $α_{0}, \dots, α_{N - 1}$ to be 0
for each $M \in \mathcal{M}$ do
    representative_node = $r (M)$
    parray = irg_mx[representative_node]
    remove representative_node from M
    x = compute_x_vector(parray, $\mathcal{M}$)
    add representative_node to M
    for i = 0 to N − 1 do
        $α [i] = α [i] + | M | \cdot x [i]$
    end for
end for
for i = 0 to N − 1 do
    $α [i] = α [i] / N^{2}$
end for
$\{λ_{0}, \dots, λ_{N - 1}\} = FFT \{α_{0}, \dots, α_{N - 1}\}$
return $λ$

Algorithm 3 compute_x_vector(parray, $\mathcal{M} = \{M_{1}, \dots, M_{m}\}$)

n = size(parray) − 1
$ω = 2 π / (n + 1)$
for l = 0 to n do
    if l = 0 then
        x[l] = 1
    else if $l \leq \lceil n / 2 \rceil$ then
        sumArg = 0
        sumLnMod = 0
        for each $M \in \mathcal{M}$ do
            if $| M | < 1$ then
                continue
            end if
            node = any node from cluster M
            p = parray[node]
            $q1 = 1 - p + p \cos (ω l)$
            $q2 = p \sin (ω l)$
            $sumArg = sumArg + | M | \cdot atan2 (q2, q1)$
            $sumLnMod = sumLnMod + | M | \cdot \ln (\sqrt{q1^{2} + q2^{2}})$
        end for
        $d = \exp (sumLnMod)$
        $a = d \cos (sumArg)$
        $b = d \sin (sumArg)$
        x[l] = a + b i
    else
        idx = n − l + 1
        a = Re(x[idx])
        b = Im(x[idx])
        x[l] = a − b i
    end if
end for
return x
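A possible Python rendering of Algorithms 2 and 3 is sketched below. It is an illustration under explicit assumptions rather than the authors' code: clusters are given as lists of 0-based node indices, the first element of each list serves as the representative r(M), the edge probability between two nodes is assumed to depend only on their clusters (as in an SBM), and the final transform is a naive DFT standing in for an FFT.

```python
import cmath
import math

def compute_x_vector(parray, clusters):
    """Algorithm 3: x_a^0..x_a^n for one node, using Equations (27)-(29).

    clusters must already exclude the node a whose row parray is.
    """
    n = len(parray) - 1                      # number of PB parameters
    omega = 2.0 * math.pi / (n + 1)
    half = math.ceil(n / 2)
    x = [0j] * (n + 1)
    x[0] = 1.0 + 0j
    for l in range(1, half + 1):
        sum_arg, sum_ln_mod = 0.0, 0.0
        for M in clusters:
            if not M:                        # cluster emptied by removing a
                continue
            p = parray[M[0]]                 # same value for every node in M
            q1 = 1.0 - p + p * math.cos(omega * l)
            q2 = p * math.sin(omega * l)
            mod = math.hypot(q1, q2)
            if mod == 0.0:                   # a zero factor kills the product
                sum_ln_mod = -math.inf
                break
            sum_arg += len(M) * math.atan2(q2, q1)
            sum_ln_mod += len(M) * math.log(mod)
        d = math.exp(sum_ln_mod)
        x[l] = complex(d * math.cos(sum_arg), d * math.sin(sum_arg))
    for l in range(half + 1, n + 1):
        x[l] = x[n + 1 - l].conjugate()
    return x

def compute_irg_pdf(irg_mx, clusters):
    """Algorithm 2: exact degree distribution using Equation (26)."""
    N = len(irg_mx)
    alpha = [0j] * N
    for M in clusters:
        rep = M[0]                           # representative node r(M)
        reduced = [[v for v in C if v != rep] for C in clusters]
        x = compute_x_vector(irg_mx[rep], reduced)
        for i in range(N):
            alpha[i] += len(M) * x[i]
    omega = 2.0 * math.pi / N
    return [sum(alpha[l] * cmath.exp(-1j * omega * l * k) for l in range(N)).real
            / N ** 2 for k in range(N)]
```

Building the reduced partition as a copy, instead of removing and re-adding the representative in place, keeps the input unchanged; this is a design choice of the sketch.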

We approximate the

P [d (U) \leq t | U \in M_{i}]

conditional distribution by

F_{M_{i}} (t)

:

F_{M_{i}} (t) \approx P [d (U) \leq t | U \in M_{i}] .

(32)

We denote the approximation of

Λ (t)

by

F (t)

. We express

F (t)

as the linear combination of the functions

F_{M_{i}} (t)

:

Λ (t) \approx F (t) = \frac{1}{n} \sum_{i = 1}^{m} \sum_{a \in M_{i}} F_{M_{i}} (t) = \frac{1}{n} \sum_{i = 1}^{m} | M_{i} | F_{M_{i}} (t) .

(33)

Based on this analysis, we present the scheme of the proposed approximation method in Algorithm 4. The input parameters of this algorithm are x, the

i r g_m x

edge probability matrix, and the

M

partition of the nodes. Algorithm 4 returns

F (x)

, the approximated value of

Λ (x)

. Note that we have not specified the computation of

F_{M_{i}}

in Algorithm 4. There are several possible ways to compute

F_{M_{i}}

. In the subsequent sections, we will describe some possible implementations. In Section 3.3.1, we will use the Poisson distribution; in Section 3.3.2, we will use the binomial distribution; and in Section 3.3.3, we will use the Gaussian distribution to calculate

F_{M_{i}}

.

Algorithm 4 approximate_irg_CDF(x, irg_mx, $\mathcal{M} = \{M_{1}, \dots, M_{m}\}$)

n = number of rows in irg_mx
Initialize ret to be 0
for each $M \in \mathcal{M}$ do
    y = compute_cluster_CDF_approximation(x, irg_mx, M)
    $ret = ret + \frac{| M |}{n} \cdot y$
end for
return ret

We will use the results of the following analysis to derive an upper bound for the approximation error in the special cases when the approximator distributions are Poisson, binomial or Gaussian. Suppose we are given an

(X, ∥ . ∥)

-normed space, and the function

Λ (t)

and its approximation

F (t)

are in X. Consider the distance function generated by the norm, defined as

d (x, y) = ∥ x - y ∥

for all

x, y \in X

. We also suppose that the functions

{G_{a} : a \in V}

and

{F_{a} : a \in V}

are in X. We can express the distance of

Λ (t)

and its approximation

F (t)

as:

d (Λ, F) = ∥ Λ - F ∥ = \frac{1}{n} ∥ \sum_{i = 1}^{m} \sum_{a \in M_{i}} (F_{a} - F_{M_{i}}) ∥ = \frac{1}{n} ∥ \sum_{i = 1}^{m} \sum_{a \in M_{i}} (F_{a} - G_{a} + G_{a} - F_{M_{i}}) ∥ .

(34)

We can think of

F_{M_{i}}

as the following: for all a nodes in

M_{i}

,

d_{a}

has its own

F_{a}

Poisson binomial distribution. We can approximate the distribution of

d_{a}

by the

G_{a}

local approximation function. We would like to aggregate these

{G_{a} : a \in M_{i}}

local approximation functions, and the aggregated approximator is

F_{M_{i}}

. From the norm triangle inequality:

d (Λ, F) \leq \frac{1}{n} \sum_{i = 1}^{m} \sum_{a \in M_{i}} ∥ F_{a} - G_{a} ∥ + \frac{1}{n} \sum_{i = 1}^{m} \sum_{a \in M_{i}} ∥ G_{a} - F_{M_{i}} ∥ .

(35)

If

∥ . ∥

is the total variation norm,

F_{M_{i}}

,

F_{a}

,

G_{a}

are discrete probability distributions for all

i \in {1, \dots, m}

and

a \in V

; then,

d (., .)

is the total variation distance. The 2 multiplicative factor comes from the connection between the total variation norm and the total variation distance given in (7).

d_{T V} (Λ, F) \leq \frac{2}{n} \sum_{i = 1}^{m} \sum_{a \in M_{i}} d_{T V} (F_{a}, G_{a}) + \frac{2}{n} \sum_{i = 1}^{m} \sum_{a \in M_{i}} d_{T V} (G_{a}, F_{M_{i}}) .

(36)

On the other hand, if

∥ . ∥

is the p-norm (

p \geq 1

integer), then

d (., .)

is the p-distance, and

d_{p} (Λ, F) \leq \frac{1}{n} \sum_{i = 1}^{m} \sum_{a \in M_{i}} d_{p} (F_{a}, G_{a}) + \frac{1}{n} \sum_{i = 1}^{m} \sum_{a \in M_{i}} d_{p} (G_{a}, F_{M_{i}}) .

(37)

We will use the notations:

μ_{a} = E [d_{a}]

and

σ_{a}^{2} = V a r [d_{a}]

. For all

M_{i}

node clusters, we denote

μ_{M_{i}}

and

σ_{M_{i}}^{2}

as the common mean and variance used for the cluster

M_{i}

. Furthermore, we suppose that there exist real numbers

a_{i}, A_{i}, b_{i}, B_{i} \geq 0

such that, for all

a \in M_{i}

:

μ_{a} \in [μ_{M_{i}} - a_{i}, μ_{M_{i}} + A_{i}],

(38)

and

σ_{a} \in [σ_{M_{i}} - b_{i}, σ_{M_{i}} + B_{i}] .

(39)

We also suppose that

μ_{M_{i}} - a_{i} > 0

and

σ_{M_{i}} - b_{i} > 0

for all

i = 1, \dots, m

.

3.3.1. Approximation Using the Poisson Distribution

Consider the case when we use the Poisson distribution for the approximation of the degree distribution of an IRG. We discuss first how to compute the

F_{M_{i}}

cluster approximation functions using the Poisson distribution for all

M_{i}

node clusters (how to implement the compute_cluster_CDF_approximation function in Algorithm 4). We specify

F_{M_{i}}

as the CDF of the Poisson distribution function with parameter

μ_{M_{i}}

, where

μ_{M_{i}}

is the average of the expected degrees in

M_{i}

:

μ_{M_{i}} = Average \{μ_{a} : μ_{a} = \sum_{b \in V ∖ {a}} p_{a b}, a \in M_{i}\} .

(40)
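The Poisson cluster approximation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge probabilities are stored in a symmetric nested list `P` whose diagonal is ignored, a cluster is a list of node indices, and the helper names `expected_degree` and `poisson_cluster_cdf` are hypothetical (they are not part of Algorithm 4 or of the authors' code). The probability mass is evaluated in log-space so that large means do not underflow; the mean is assumed to be strictly positive.

```python
import math

def expected_degree(P, a):
    """mu_a: sum of p_ab over all nodes b != a."""
    return sum(p for b, p in enumerate(P[a]) if b != a)

def poisson_cluster_cdf(P, cluster, k_max):
    """Return mu_M from Eq. (40) and the Poisson(mu_M) CDF at 0..k_max."""
    mu = sum(expected_degree(P, a) for a in cluster) / len(cluster)
    total, cdf = 0.0, []
    for k in range(k_max + 1):
        # Poisson pmf computed as exp(k*log(mu) - mu - log(k!))
        total += math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
        cdf.append(min(total, 1.0))
    return mu, cdf

# Toy edge-probability matrix (illustrative values only)
P = [[0.0, 0.2, 0.4],
     [0.2, 0.0, 0.6],
     [0.4, 0.6, 0.0]]
mu, cdf = poisson_cluster_cdf(P, cluster=[0, 1], k_max=2)
```

In this toy example the expected degrees of nodes 0 and 1 are 0.6 and 0.8, so the cluster mean is 0.7.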

We now derive an upper bound for the approximation error in the case when the

F_{M_{i}}

cluster approximation functions are defined by the Poisson distribution. For each

a \in V

node, the

G_{a}

local approximation function is specified as the CDF of the Poisson distribution with the parameter

μ_{a}

. We apply Theorem 1 to give an upper bound on the TV distance between the

F_{a}

actual degree distribution of node a and its local approximation function

G_{a}

:

\sum_{a \in M_{i}} d_{T V} (F_{a}, G_{a}) \leq \sum_{a \in M_{i}} \frac{1 - e^{- μ_{a}}}{2 μ_{a}} (μ_{a} - σ_{a}^{2}) .

(41)

Since we are restricted to the cluster

M_{i}

, we can use (38) and (39) to give an upper bound to the right-hand side of the inequality (41). For all

a \in M_{i}

:

\frac{1 - e^{- μ_{a}}}{2 μ_{a}} (μ_{a} - σ_{a}^{2}) \leq \frac{1 - e^{- (μ_{M_{i}} + A_{i})}}{2 (μ_{M_{i}} - a_{i})} ((μ_{M_{i}} + A_{i}) - (σ_{M_{i}}^{2} - b_{i}^{2})) .

(42)

Therefore:

\sum_{a \in M_{i}} d_{T V} (F_{a}, G_{a}) \leq | M_{i} | \frac{1 - e^{- μ_{M_{i}} - A_{i}}}{2 (μ_{M_{i}} - a_{i})} (μ_{M_{i}} - σ_{M_{i}}^{2} + A_{i} + b_{i}^{2}) .

(43)

To compute an upper bound for the total variation distance between two Poisson distributions, we can use Theorem 2. Suppose that

a \in M_{i}

and

μ_{a} \leq μ_{M_{i}}

. Then, from Theorem 2:

d_{T V} (G_{a}, F_{M_{i}}) \leq \min (μ_{M_{i}} - μ_{a}, \sqrt{\frac{2}{e}} (\sqrt{μ_{M_{i}}} - \sqrt{μ_{a}})) .

(44)

Similarly, if

μ_{a} > μ_{M_{i}}

, then from Theorem 2:

d_{T V} (G_{a}, F_{M_{i}}) \leq \min (μ_{a} - μ_{M_{i}}, \sqrt{\frac{2}{e}} (\sqrt{μ_{a}} - \sqrt{μ_{M_{i}}})) .

(45)

The right side of both (44) and (45) can be expressed as

\min (|μ_{M_{i}} - μ_{a}|, \sqrt{\frac{2}{e}} |\sqrt{μ_{M_{i}}} - \sqrt{μ_{a}}|)

, for which the following upper bound can be given:

d_{T V} (G_{a}, F_{M_{i}}) \leq \min (|μ_{M_{i}} - μ_{a}|, \sqrt{\frac{2}{e}} |\sqrt{μ_{M_{i}}} - \sqrt{μ_{a}}|) \leq l (μ_{M_{i}}, a_{i}, A_{i}),

(46)

where:

l (μ_{M_{i}}, a_{i}, A_{i}) = \min (\max (a_{i}, A_{i}), \sqrt{\frac{2}{e}} \max (\sqrt{μ_{M_{i}}} - \sqrt{μ_{M_{i}} - a_{i}}, \sqrt{μ_{M_{i}} + A_{i}} - \sqrt{μ_{M_{i}}})) .

(47)

Finally, after substituting (43) and (46) into (36):

d_{T V} (Λ, F) \leq \frac{1}{n} \sum_{i = 1}^{m} | M_{i} | (\frac{1 - e^{- μ_{M_{i}} - A_{i}}}{μ_{M_{i}} - a_{i}} (μ_{M_{i}} - σ_{M_{i}}^{2} + A_{i} + b_{i}^{2}) + 2 \cdot l (μ_{M_{i}}, a_{i}, A_{i})) .

(48)
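To make the bound concrete, the right-hand side of (48) can be evaluated numerically once the cluster statistics are known. The sketch below is an illustration only: the function names and the dictionary layout of the cluster statistics are assumptions made for this example, and the formula is implemented exactly as stated in (47) and (48).

```python
import math

def l_bound(mu, a, A):
    """l(mu_M, a_i, A_i) from Eq. (47); requires mu - a >= 0."""
    scale = math.sqrt(2.0 / math.e)
    root_gap = max(math.sqrt(mu) - math.sqrt(mu - a),
                   math.sqrt(mu + A) - math.sqrt(mu))
    return min(max(a, A), scale * root_gap)

def poisson_tv_bound(clusters):
    """Right-hand side of Eq. (48).

    clusters: list of dicts with keys size, mu, var, a, A, b, standing for
    |M_i|, mu_{M_i}, sigma_{M_i}^2, a_i, A_i, b_i (hypothetical layout).
    """
    n = sum(c["size"] for c in clusters)
    total = 0.0
    for c in clusters:
        mu, var, a, A, b = c["mu"], c["var"], c["a"], c["A"], c["b"]
        first = (1 - math.exp(-mu - A)) / (mu - a) * (mu - var + A + b * b)
        total += c["size"] * (first + 2 * l_bound(mu, a, A))
    return total / n

# With a_i = A_i = b_i = 0 the value reduces to the right-hand side of (49)
bound = poisson_tv_bound([{"size": 10, "mu": 4.0, "var": 3.0,
                           "a": 0.0, "A": 0.0, "b": 0.0}])
```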

It is easy to see that if

a_{i}, A_{i}, b_{i}, B_{i} \to 0

, then (48) goes to:

d_{T V} (Λ, F) \leq \frac{1}{n} \sum_{i = 1}^{m} | M_{i} | (\frac{1 - e^{- μ_{M_{i}}}}{μ_{M_{i}}} (μ_{M_{i}} - σ_{M_{i}}^{2})) .

(49)

The inequality in (49) also holds when the distribution of the node degrees within the same node cluster is the same.

3.3.2. Approximation Using the Binomial Distribution

We discuss now the calculation of

F_{M_{i}}

using the binomial distribution (the implementation of the compute_cluster_CDF_approximation function in Algorithm 4). In this case, we specify

F_{M_{i}}

as the CDF of the binomial distribution, with parameters

n - 1

and

μ_{M_{i}} / (n - 1)

, where n is the number of nodes and

μ_{M_{i}}

is given in (40). Similarly, for all

a \in V

, the

G_{a}

local approximation function is defined by the CDF of the binomial distribution with parameters

n - 1

, and

μ_{a} / (n - 1)

. From Theorem 3, we conclude that the upper bounds (48) and (49) derived for the Poisson approximation are applicable for the binomial approximation as well.
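A minimal sketch of the binomial cluster approximation, assuming the cluster mean has already been computed via (40) and satisfies 0 < μ < n − 1. The function name is hypothetical, and the log-space evaluation through `math.lgamma` is an implementation choice made here to avoid overflow of the binomial coefficient for large n; it is not taken from the paper.

```python
import math

def binomial_cluster_cdf(mu_cluster, n):
    """CDF of Binomial(n - 1, mu_cluster / (n - 1)) at k = 0..n-1."""
    trials = n - 1
    q = mu_cluster / trials          # success probability, must lie in (0, 1)
    log_q, log_1q = math.log(q), math.log(1.0 - q)
    total, cdf = 0.0, []
    for k in range(trials + 1):
        log_pmf = (math.lgamma(trials + 1) - math.lgamma(k + 1)
                   - math.lgamma(trials - k + 1)
                   + k * log_q + (trials - k) * log_1q)
        total += math.exp(log_pmf)
        cdf.append(min(total, 1.0))
    return cdf

cdf = binomial_cluster_cdf(mu_cluster=0.7, n=3)
```

For n = 3 and a cluster mean of 0.7, the success probability is 0.35 and the first CDF value is 0.65² = 0.4225.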

3.3.3. Approximation Using the Gaussian Distribution

We discuss the use of the Gaussian distribution to approximate the degree distribution of the IRG model. For each

i \in {1, \dots, m}

, we define the

F_{M_{i}}

function (the compute_cluster_CDF_approximation in Algorithm 4) as the CDF of the Gaussian distribution with parameters

μ_{M_{i}}

and

σ_{M_{i}}^{2}

, where

μ_{M_{i}}

is given in (40) and

σ_{M_{i}}^{2}

is computed as:

σ_{M_{i}}^{2} = Average \{σ_{a}^{2} : σ_{a}^{2} = \sum_{a \neq b} p_{a b} (1 - p_{a b}), a \in M_{i}\} .

(50)
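The two cluster parameters in (40) and (50) can be computed together. The following Python sketch assumes the same symmetric nested-list representation `P` of the edge probabilities as before; the helper names are hypothetical, and the Gaussian CDF is expressed through the standard-library error function.

```python
import math

def degree_mean_var(P, a):
    """Mean and variance of d_a: sums of p_ab and p_ab * (1 - p_ab) over b != a."""
    probs = [p for b, p in enumerate(P[a]) if b != a]
    return sum(probs), sum(p * (1 - p) for p in probs)

def gaussian_cluster_cdf(P, cluster):
    """Return mu_M (Eq. 40), sigma_M^2 (Eq. 50) and the Gaussian CDF F_M."""
    stats = [degree_mean_var(P, a) for a in cluster]
    mu = sum(m for m, _ in stats) / len(stats)
    var = sum(v for _, v in stats) / len(stats)
    sigma = math.sqrt(var)

    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    return mu, var, cdf

# Toy edge-probability matrix (illustrative values only)
P = [[0.0, 0.2, 0.4],
     [0.2, 0.0, 0.6],
     [0.4, 0.6, 0.0]]
mu, var, F = gaussian_cluster_cdf(P, cluster=[0, 1])
```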

We now derive an upper bound for the approximation error in terms of 1-distance (p-distance with

p = 1

) in the case when the

F_{M_{i}}

cluster approximation functions are defined by the Gaussian distribution. For each

a \in V

, the

G_{a}

local approximation function is also defined by the Gaussian distribution:

G_{a}

is the CDF of the Gaussian distribution with parameters

μ_{a}

and

σ_{a}^{2}

. Let us apply now (37) for the Gaussian approximation. We denote the CDF of the standard normal distribution by

Φ (x)

. If

a \in M_{i}

, the 1-distance of

F_{M_{i}}

and

G_{a}

can be expressed as:

d_{1} (F_{M_{i}}, G_{a}) = (\int_{- \infty}^{\infty} |Φ (\frac{x - μ_{a}}{σ_{a}}) - Φ (\frac{x - μ_{M_{i}}}{σ_{M_{i}}})| d x) .

(51)

Introducing the notations

l_{i} (x) = \frac{x - μ_{M_{i}} - A_{i}}{σ_{M_{i}} + B_{i}}

and

u_{i} (x) = \frac{x - μ_{M_{i}} + a_{i}}{σ_{M_{i}} - b_{i}}

, it is easy to see that for all

a \in M_{i}

:

l_{i} (x) \leq \frac{x - μ_{a}}{σ_{a}} \leq u_{i} (x)

and

l_{i} (x) \leq \frac{x - μ_{M_{i}}}{σ_{M_{i}}} \leq u_{i} (x)

. Therefore, since

Φ (x)

is increasing in x, for any

a \in M_{i}

, we can apply the following upper bound:

|Φ (\frac{x - μ_{a}}{σ_{a}}) - Φ (\frac{x - μ_{M_{i}}}{σ_{M_{i}}})| \leq Φ (u_{i} (x)) - Φ (l_{i} (x)) .

(52)

We bound the difference

Φ (u_{i} (x)) - Φ (l_{i} (x))

from above as:

Φ (u_{i} (x)) - Φ (l_{i} (x)) = \frac{1}{\sqrt{2 π}} \int_{l_{i} (x)}^{u_{i} (x)} exp (- \frac{t^{2}}{2}) d t \leq \frac{1}{\sqrt{2 π}} \int_{l_{i} (x)}^{u_{i} (x)} \frac{1}{1 + \frac{t^{2}}{2}} d t .

(53)

Substituting

x = \frac{t^{2}}{2}

into the inequality

e^{- x} \leq \frac{1}{1 + x}

(

x \geq 0

) [32], we obtain

e^{- \frac{t^{2}}{2}} \leq \frac{1}{1 + \frac{t^{2}}{2}}

, which proves (53). Since the antiderivative of

\frac{1}{1 + \frac{t^{2}}{2}}

is

\sqrt{2} arctan (t / \sqrt{2}) + c

:

Φ (u_{i} (x)) - Φ (l_{i} (x)) \leq \frac{1}{\sqrt{π}} (arctan \frac{u_{i} (x)}{\sqrt{2}} - arctan \frac{l_{i} (x)}{\sqrt{2}}) .

(54)

Therefore:

d_{1} (F_{M_{i}}, G_{a}) \leq \frac{1}{\sqrt{π}} \int_{- \infty}^{\infty} (arctan \frac{u_{i} (x)}{\sqrt{2}} - arctan \frac{l_{i} (x)}{\sqrt{2}}) d x .

(55)

After substituting the definition of

u_{i} (x)

and

l_{i} (x)

into the right side of (55) and using the identity

\int_{- \infty}^{\infty} (arctan (a x + b) - arctan (c x + d)) d x

=

π (\frac{b}{a} - \frac{d}{c})

if

a, c > 0

(for derivation, see Appendix A):

\frac{1}{\sqrt{π}} \int_{- \infty}^{\infty} (arctan (\frac{x + a_{i} - μ_{M_{i}}}{\sqrt{2} (σ_{M_{i}} - b_{i})}) - arctan (\frac{x - (A_{i} + μ_{M_{i}})}{\sqrt{2} (σ_{M_{i}} + B_{i})})) d x = \sqrt{π} (a_{i} + A_{i}) .

(56)

As a result, we have:

d_{1} (F_{M_{i}}, G_{a}) \leq \sqrt{π} (a_{i} + A_{i}) .

(57)
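The substitution behind (56) and (57) can be written out explicitly. In the block below, a, b, c, d denote the coefficients of the arctan identity from Appendix A (they are not node labels), and the identity itself is taken as stated there.

```latex
\begin{aligned}
a &= \frac{1}{\sqrt{2}\,(\sigma_{M_i} - b_i)}, &
b &= \frac{a_i - \mu_{M_i}}{\sqrt{2}\,(\sigma_{M_i} - b_i)}, &
\frac{b}{a} &= a_i - \mu_{M_i}, \\
c &= \frac{1}{\sqrt{2}\,(\sigma_{M_i} + B_i)}, &
d &= -\frac{A_i + \mu_{M_i}}{\sqrt{2}\,(\sigma_{M_i} + B_i)}, &
\frac{d}{c} &= -(A_i + \mu_{M_i}), \\
\frac{1}{\sqrt{\pi}} \cdot \pi \left( \frac{b}{a} - \frac{d}{c} \right)
  &= \sqrt{\pi}\,(a_i + A_i).
\end{aligned}
```

The cluster mean cancels in the difference b/a − d/c, which is why the bound depends only on the widths a_i and A_i.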

It is clear that

d_{1} (F_{M_{i}}, G_{a})

goes to zero as

a_{i}, A_{i} \to 0

. Furthermore, for any

M_{i}

node cluster:

\sum_{a \in M_{i}} d_{1} (G_{a}, F_{M_{i}}) \leq | M_{i} | \sqrt{π} (a_{i} + A_{i}) .

(58)

Applying Theorem 4 with

p = 1

:

\sum_{a \in M_{i}} d_{1} (G_{a}, F_{a}) \leq \sum_{a \in M_{i}} \frac{C}{σ_{a}} \leq C \sum_{a \in M_{i}} \frac{1}{σ_{M_{i}} - b_{i}} \leq | M_{i} | \frac{C}{σ_{M_{i}} - b_{i}},

(59)

where C is a universal constant from Theorem 4. After substituting (58) and (59) into (37):

d_{1} (Λ, F) \leq \frac{1}{n} \sum_{i = 1}^{m} | M_{i} | (\sqrt{π} (a_{i} + A_{i}) + \frac{C}{σ_{M_{i}} - b_{i}}) .

(60)

If

a_{i}, A_{i}, b_{i} \to 0

, then the right side of (60) goes to

d_{1} (Λ, F) \leq \frac{1}{n} \sum_{i = 1}^{m} | M_{i} | \frac{C}{σ_{M_{i}}} .

(61)

4. Numerical Experiments

We demonstrate the developed methods with numerical experiments. In Section 4.1, we compute the degree distribution of the ER random graph using Algorithm 2. In Section 4.3 and Section 4.4, we experimentally test the precision of the approximation method. In Section 4.3, we observe how the approximation error changes as the network size changes. For this test, we use two IRG types: in the first type, the network has a block structure, and within one block, the degree distribution of the nodes is the same. In the second type, there is no such structure; for any two distinct nodes a and b,

d_{a}

and

d_{b}

follow different distributions. We assign nodes a and b to the same cluster M only if

d_{a}

and

d_{b}

have the same distribution; therefore, for the second IRG type, every group contains only a single node. In Section 4.4, we fix an IRG of the second type: for any two distinct nodes a and b,

d_{a}

and

d_{b}

have different distributions. For this experiment, we create partitions

C_{S} (V)

of the node set V, where S denotes the common cluster size in the

C_{S} (V)

partition. Therefore, for a fixed

C_{S} (V)

partition and any

M \in C_{S} (V)

node cluster, the distributions of

d_{a}

and

d_{b}

are different if

a, b \in M

and

a \neq b

. We observe how approximation precision changes as the cluster size changes. We use the biased static edge voting model [14] to generate the test IRGs with appropriate structures. Hence, in Section 4.2, we briefly discuss the biased static edge voting model.

4.1. The ER Test

In the special case when all the

p_{a b}

edge probabilities of an

I R G_{n} (P)

are equal to the same p value, we obtain the ER random graph with the parameter p. We know that the degree distribution of the ER random graph model with a fixed n node number and p link probability is binomial [15] with parameters

n - 1

and p. Therefore, the degree distribution of the ER random graph with parameters n and p can be expressed as:

p_{n, p}^{E R} (k) = P (d (U_{n}) = k) = \binom{n - 1}{k} p^{k} {(1 - p)}^{n - 1 - k} \quad \text{for each } k \in {0, \dots, n - 1},

(62)

where

U_{n}

is a uniform random variable on the

V_{n} = [n]

node set. Let us denote the IRG model on the node set

V_{n} = [n]

by

I R G_{n} (p)

, where all the

p_{a b}

edge probabilities are set to the same p probability. It is clear that

E R_{n} (p)

and

I R G_{n} (p)

denote the same random graph model; therefore, we expect that if we compute the degree distribution of

I R G_{n} (p)

using Algorithm 2, we obtain exactly the degree distribution of the

E R_{n} (p)

model given in (62). We tested this statement experimentally: we set the network size to

n = 1000

and computed the degree distribution of

I R G_{n} (p)

using Algorithm 2 for each

p \in {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}

. Let us denote the degree distribution of

I R G_{n} (p)

computed by Algorithm 2 with

{p_{n, p}^{I R G} (k) : k \in {0, \dots, n - 1}}

, and calculate the total variation distance between the degree distributions

{p_{n, p}^{E R} (k) : k \in {0, \dots, n - 1}}

and

{p_{n, p}^{I R G} (k) : k \in {0, \dots, n - 1}}

:

T V_{n, p} = d_{T V} (p_{n, p}^{E R}, p_{n, p}^{I R G}) .

(63)

For each parameter value p = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, the

T V_{n, p}

total variation distance was of the order of

10^{- 13}

, which means that we can consider the distributions

p_{n, p}^{E R}

and

p_{n, p}^{I R G}

to be the same.
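The same sanity check can be reproduced on a small scale. The sketch below does not use the DFT-CF method of Algorithm 2; instead it builds the Poisson binomial distribution by plain repeated convolution, which is a simpler, quadratic-time substitute chosen for illustration. It also assumes that the total variation distance of two discrete distributions is half of the sum of the absolute differences of their probabilities. Because all nodes of the ER graph have the same degree distribution, the degree distribution of the graph equals that of a single node.

```python
import math

def poisson_binomial_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli(p) variables,
    built by repeated convolution (not the DFT-CF method)."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, w in enumerate(pmf):
            nxt[k] += w * (1.0 - p)   # edge absent
            nxt[k + 1] += w * p       # edge present
        pmf = nxt
    return pmf

def binomial_pmf(trials, p):
    return [math.comb(trials, k) * p**k * (1.0 - p)**(trials - k)
            for k in range(trials + 1)]

def tv_distance(f, g):
    """Assumed convention: half of the l1 distance of the pmfs."""
    return 0.5 * sum(abs(x - y) for x, y in zip(f, g))

n, p = 200, 0.3   # smaller than the n = 1000 used in the paper
tv = tv_distance(poisson_binomial_pmf([p] * (n - 1)), binomial_pmf(n - 1, p))
```

The resulting distance is expected to be at the level of floating-point rounding error.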

4.2. The Biased Static Edge Voting Model

We used the biased static edge voting model [14] to generate the appropriate IRG parameterizations for the experiments in Section 4.3 and Section 4.4. The model with N nodes is defined by the parameter set

{D_{1}, \dots, D_{N}}

and a single positive real value

η

. For any

a \in V

, the parameter

D_{a}

controls the local behaviour of node a, while

η

is a model-level control parameter. We can group the nodes based on their

D_{a}

parameter values: nodes a and b belong to the same S group if and only if

D_{a} = D_{b}

. This naturally defines a partition of the nodes. We suppose that the

D_{a}

parameters are in the set

{0, 1, \dots, N - 1}

, and we index a cluster with the common parameter of the nodes within the cluster:

S_{i} = {a : D_{a} = i, a \in V}

. Denote as

V_{a b}

a random variable that represents the vote of the a node for the

{a, b}

edge candidate. We assume that for all

a \in S_{i}

and

b \in S_{j}

,

V_{a b}

follows the same probability distribution. For any pair of different nodes a and b, the probability that there will be an edge between the nodes a and b depends on the incoming votes, and it is given by

s (V_{a b}, V_{b a})

, where s is the edge probability function. The biased edge voting model specifies this definition in the following way: for any

a \in S_{i}

and

b \neq a

nodes, the

V_{a b}

random variable is Bernoulli distributed with a parameter of

\frac{i}{N - 1}

, i.e.,

V_{a b} \sim B e r n o u l l i (\frac{i}{N - 1})

. The edge probability function is given as:

s (V_{a b}, V_{b a}) = 1 - e^{- η v (V_{a b}, V_{b a})},

(64)

where

η > 0

is the control parameter of the model, and

v (V_{a b}, V_{b a}) = \frac{D_{a}}{N - 1} V_{a b} + \frac{D_{b}}{N - 1} V_{b a} .

(65)

The

p_{a b}

probability that the

{a, b}

link is drawn is given by the formula [14]:

p_{a b} = p_{a}^{*} + p_{b}^{*} - p_{a}^{*} p_{b}^{*},

(66)

where:

p_{z}^{*} = \frac{D_{z}}{N - 1} (1 - e^{- η \frac{D_{z}}{N - 1}}) for all z \in V .

(67)
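Equations (66) and (67) translate directly into code. The following Python sketch builds the matrix of edge probabilities from a parameter sequence and the control parameter; the function name and the nested-list output format are assumptions made for this example.

```python
import math

def edge_probability_matrix(D, eta):
    """p_ab from Eqs. (66)-(67) for a parameter sequence D with values in 0..N-1."""
    N = len(D)
    # p*_z = D_z / (N - 1) * (1 - exp(-eta * D_z / (N - 1))), Eq. (67)
    p_star = [d / (N - 1) * (1.0 - math.exp(-eta * d / (N - 1))) for d in D]
    # p_ab = p*_a + p*_b - p*_a p*_b for a != b, Eq. (66); the diagonal is unused
    return [[0.0 if a == b else p_star[a] + p_star[b] - p_star[a] * p_star[b]
             for b in range(N)] for a in range(N)]

# Range(4) parametrization with eta = 2.0
P = edge_probability_matrix(D=[0, 1, 2, 3], eta=2.0)
```

For node 0 the parameter is zero, so the probability of the edge between nodes 0 and 3 equals p*_3 = 1 − e^{−2}.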

The model is defined by the parameters

η

and

D_{1}, \dots, D_{N}

, or equivalently by the

η

number and the partition

V = [N] = S_{0} \cup \dots \cup S_{N - 1}

of the nodes, where for each

a \in S_{i}

,

D_{a} = i

(in some cases,

S_{i}

can be empty). Given these parameters, links are drawn independently, and the probability that the edge between nodes a and b is drawn is given by (66). We use this model to generate different IRG parametrizations by applying Equation (66). It is clear that different parametrizations of the voting model lead to different

I R G

models. As we have seen, parametrization means to fix the value of the control parameter

η

and the sequence

D_{1}, \dots, D_{N}

. To generate the

D_{1}, \dots, D_{N}

sequence, we used two methods: range and lognormal. Range is defined as the first N non-negative integers:

R a n g e (N) = (0, 1, \dots, N - 1)

. We denote the lognormal sequence generator by

L o g n o r m a l S e q (μ, σ, N)

, where N is the length of the sequence, and

μ

and

σ

are the parameters of the lognormal distribution. A positive random variable X is log-normally distributed with parameters

μ

and

σ

if

l n (X)

is normally distributed with mean

μ

and standard deviation

σ

. For the lognormal sequence generator algorithm, we suppose that the proportion of nodes with parameter k is approximately

r_{k} = (F (k + 0.5) - F (k - 0.5)) / F (N - 0.5)

, where

F (x)

is the cumulative distribution function of the lognormal distribution with parameters

μ

and

σ

. Therefore, the number of nodes with parameter k is approximately

N r_{k}

. The algorithm is given in Algorithm 5. Its input parameter

c d f

can be any cumulative distribution function. We plotted the empirical density function of the sequence

L o g n o r m a l S e q (5, 0.6, 3000)

in Figure 1. The

η

parameter controls the global behaviour of the model. In the rest of this paper, we fix the value of

η

to 2.0.

Algorithm 5 parameter_sequence_generator(node_nr, cdf)

parameter_sequence = empty list
not_finished_nodes = node_nr
max_param = node_nr − 1
normalizer = cdf(max_param + 0.5)
for param = 0 to max_param do
      m = param − 0.5
      M = param + 0.5
      p = (cdf(M) − cdf(m))/normalizer
      nr = min(round(p · node_nr), not_finished_nodes)
      Add param to the parameter_sequence list nr times
      not_finished_nodes = not_finished_nodes − nr
      if not_finished_nodes ≤ 0 then
           Break
      end if
end for
return parameter_sequence
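A direct Python transcription of Algorithm 5 might look as follows. It is a sketch, not the authors' implementation: the helper `lognormal_cdf` is a hypothetical name, the lognormal CDF is written with the standard-library error function, and Python's built-in `round` (which rounds halves to the nearest even integer) stands in for the unspecified rounding rule of the pseudocode. Because of the rounding, the returned sequence can be shorter than `node_nr`.

```python
import math

def lognormal_cdf(mu, sigma):
    """Return the CDF of the lognormal distribution with parameters mu, sigma."""
    def cdf(x):
        if x <= 0:
            return 0.0
        z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
        return 0.5 * (1.0 + math.erf(z))
    return cdf

def parameter_sequence_generator(node_nr, cdf):
    parameter_sequence = []
    not_finished_nodes = node_nr
    max_param = node_nr - 1
    normalizer = cdf(max_param + 0.5)
    for param in range(max_param + 1):
        # share of nodes whose parameter rounds to `param`
        p = (cdf(param + 0.5) - cdf(param - 0.5)) / normalizer
        nr = min(round(p * node_nr), not_finished_nodes)
        parameter_sequence.extend([param] * nr)
        not_finished_nodes -= nr
        if not_finished_nodes <= 0:
            break
    return parameter_sequence

# Parameters of the sequence plotted in Figure 1
seq = parameter_sequence_generator(3000, lognormal_cdf(5.0, 0.6))
```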

4.3. The Effect of Network Size on Approximation Accuracy

In this test, we experimentally observe the effect of network size on approximation accuracy. For a sequence of networks with increasing network size, we compare the degree distributions returned by the approximation method (Algorithm 4) to the exact degree distributions computed by Algorithm 2. We use the total variation distance for comparison. When the approximator uses the Poisson or the binomial distributions, we can directly use the total variation distance; however, when the Gaussian distribution is used for approximation, we use the following discretization to obtain a discrete probability distribution: if X is a continuous random variable with

F_{X}

cdf, then its discretized version

\bar{X}

has the probability mass function:

P [\bar{X} = k] = F_{X} (k + 0.5) - F_{X} (k - 0.5) for all integers k .

(68)
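The discretization in (68) can be sketched as follows for a Gaussian distribution. The function names are hypothetical, and restricting k to a finite range is an assumption of this example: probability mass outside the range is simply dropped, so the values sum to slightly less than one.

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def discretized_gaussian_pmf(mu, sigma, k_min, k_max):
    """P[X_bar = k] = F_X(k + 0.5) - F_X(k - 0.5) for k in k_min..k_max, Eq. (68)."""
    return {k: normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)
            for k in range(k_min, k_max + 1)}

pmf = discretized_gaussian_pmf(mu=10.0, sigma=2.0, k_min=0, k_max=20)
```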

To create the test IRG parametrizations, we used the biased static edge voting model using lognormal and range parametrization methods. For both cases, the network sizes are 50, 100, 300, 500, 1000, 1500, 2000, 2500, and 3000.

We denote the biased static edge voting model with parametrization

R a n g e (n)

and

η = 2

by

E V_{n}^{R}

, and the IRG generated from

E V_{n}^{R}

using (66) by

I R G_{n}^{R} (P_{n}^{R})

. It is clear that in

I R G_{n}^{R} (P_{n}^{R})

, for all different nodes a and b,

L (a) \neq L (b)

. Similarly, we denote the static edge voting model with parametrization

L o g n o r m a l S e q (μ, σ, n)

by

E V_{n}^{L N}

, and the IRG generated from

E V_{n}^{L N}

using (66) by

I R G_{n}^{L N} (P_{n}^{L N})

. The values of

μ

and

σ

for the

L o g n o r m a l S e q

parametrization for each n can be found in Table 1. Because of the construction,

I R G_{n}^{L N} (P_{n}^{L N})

has a block structure. It contains clusters, and within a cluster, the nodes have the same degree distribution. In Table 1, we collected basic statistics about the clusters for the used

L o g n o r m a l S e q

parametrizations.

Let us denote the exact degree distribution of

I R G_{n}^{T}

computed with Algorithm 2 by

{p_{n, T} (k) : k \in {0, \dots, n - 1}}

, where T can be R or

L N

. Similarly, we denote the approximated degree distribution of

I R G_{n}^{T}

computed with Algorithm 4 by

{p_{n, T}^{D} (k) : k \in {0, \dots, n - 1}}

, where T can be R or

L N

and D stands for the distribution used for the approximation: P (Poisson), B (binomial), or G (Gaussian). For example, we denote the approximated degree distribution of

I R G_{n}^{R}

using the Poisson distribution by

{p_{n, R}^{P}}

. We calculated the total variation distance between the approximated and the exact degree distribution:

T V_{n, T}^{D} = d_{T V} ({p_{n, T}}, {p_{n, T}^{D}}) .

(69)

The calculated

T V_{n, T}^{D}

values are collected in Table 2 and Table 3. We also plotted the total variation distances as a function of the network size in Figure 2 and Figure 3. We can observe that in the case of the

R a n g e

parametrization, the approximation error decreases monotonically with the network size, and the Gaussian distribution gives the best approximation. For the

L o g n o r m a l S e q

parametrization, we can observe an initial fluctuation in the approximation error, after which the error decreases monotonically as a function of the network size. In this case, we obtain the smallest approximation error when we use the binomial distribution.

4.4. The Effect of Cluster Size on Approximation Accuracy

In this experiment, we test how the accuracy of the approximation method depends on the cluster size.

I R G (P)

denotes the IRG model generated from the biased edge voting model with parametrization

R a n g e (3000)

using (66). This IRG is “very inhomogeneous” in the sense that all edge probabilities are different; therefore, the degree distribution of each node is different. We compute the exact degree distribution of

I R G (P)

using Algorithm 2 and denote the result by

{p (k) : k \in {0, 1, \dots, 2999}}

. We plotted

{p (k)}

in Figure 4.

We denote the number of nodes by n (

n = 3000

in the current setting) and identify the nodes by their parameter in the static edge voting model used to generate the

I R G (P)

. This means that for any

a \in V

node, the

D_{a}

parameter of the node in the generator biased edge voting model was a. Let us fix a cluster size

S > 0

and suppose that n is divisible by S. We define a

C_{S} (V)

partition of

V = [n]

as:

C_{S} (V) = {{0, \dots, S - 1}, {S, \dots, 2 S - 1}, \dots, {n - S, \dots, n - 1}} .

(70)
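The partition in (70) is straightforward to construct. A minimal sketch, assuming nodes are labelled 0, …, n − 1 and using a hypothetical function name:

```python
def equal_size_partition(n, S):
    """C_S(V): consecutive blocks of S nodes each, as in Eq. (70)."""
    if S <= 0 or n % S != 0:
        raise ValueError("n must be divisible by a positive cluster size S")
    return [list(range(start, start + S)) for start in range(0, n, S)]

clusters = equal_size_partition(3000, 1000)
```

With n = 3000 and S = 1000 this yields three clusters that start at nodes 0, 1000, and 2000.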

The degree distributions of all nodes within a cluster of partition

C_{S} (V)

are different. We tested the approximation method described in Section 3.3, where the node clusters are given by

C_{S} (V)

, and the cluster size S is taken from the set

\mathcal{S} = {1500, 1000, 750, 600, 500, 300, 200, 100, 75, 60, 50, 30, 20, 10, 5, 1}

. For example, if the cluster size S is 1000, then we have 3 node clusters:

C_{1000} (V) = {M_{1} = {0, \dots, 999}, M_{2} = {1000, \dots, 1999}, M_{3} = {2000, \dots, 2999}}

. We computed the approximated degree distribution of

I R G (P)

using Algorithm 4 and denoted the resulting distributions by

{p_{S}^{D} (k) : k \in {0, 1, \dots, 2999}}

, where S denotes the cluster size and D represents the type of distribution used in the approximation, which can be P (Poisson), B (binomial), or G (Gaussian). For all cluster sizes S in

\mathcal{S}

, we calculated the total variation distance between the exact degree distribution (plotted in Figure 4) and the approximated degree distributions:

T V_{S}^{D} = d_{T V} ({p (k)}, {p_{S}^{D} (k)}) .

(71)

The results are collected in Table 4 and plotted in Figure 5. We can observe that the approximation error decreases as the clusters shrink (that is, as the number of clusters increases). If the clusters are large, the Poisson approximation gives the smallest approximation error, while if the clusters are small, the Gaussian approximation gives the best results, although in the case of small clusters, the difference between the approximators is small.

5. Discussion

The inhomogeneous random graph is a random graph model where the links of the graph are drawn independently and the link probabilities can be different. It can be seen as a generalization of the Erdős–Rényi random graph [15], where the edges are drawn independently, but the probability that any two distinct nodes are linked is a fixed value p. The degree distribution of a deterministic or random graph with n nodes is defined as the probability that the degree of a uniformly chosen node equals k for all

k = 0, \dots, n - 1

. The degree distribution has a central role in network science, not only because it is needed to compute several other network properties [1], but also because the shape of the degree distribution largely determines the outcome of many important network processes, such as the spread of viruses [5], diffusion of innovations [6,7], or attacks against critical infrastructure [2,3,4]. The degree distributions of many real-world networks have fat tails; therefore, the ER random graph model is not suitable for modelling real-world networks, because its degree distribution is binomial [15]. For this reason, many alternative random graph models have been proposed to model real-world networks, such as the stochastic block model [8], generalized random graphs [12], Chung–Lu random graphs [16], the Norros–Reittu model [17], and the static edge voting model [14] (Section 4.2), which we used to generate the appropriate IRG parametrizations to test our algorithms in Section 4. Using different parametrizations of the IRG model, one can obtain random network models with very different degree distributions. The IRG is interesting not only as a generalization of the ER random graph but also as a tool to analyse other random graph models, such as the stochastic block model, generalized random graphs, or the static edge voting model.

In this paper, we focused on the calculation of the degree distribution of the IRG model. In Section 3.2, we discussed an algorithm to compute the exact degree distribution utilizing the DFT-CF method [19] developed by Yili Hong. The proposed algorithm (Algorithm 2) is highly parallelizable since the sub-step given in Algorithm 3 can be called independently. Furthermore, if the IRG model has a block structure, Algorithm 2 can take advantage of it. In Section 3.3, we presented a method to approximate the degree distribution of the IRG model. There are several reasons why one would apply approximation even if an exact computational method is available. One reason is that approximation is computationally cheaper than the exact method. A second reason is that the approximation method may also be used in the case when the IRG is not fully defined. At the beginning of Section 3.3, we discussed the general scheme of the proposed approximation method, which is presented in Algorithm 4. The idea of the approximation method is simple: we group the nodes of the network according to their statistical behaviour. For each node group, we approximate the common behaviour of the nodes within the group, and finally, we aggregate these group approximations. As a result, we obtain a mixture model, which we use as an approximation of the exact degree distribution. Similarly to the exact algorithm, it can be implemented effectively in a multi-thread environment. In Algorithm 4, we did not specify how to approximate the common behaviour of a node group, because it can be done in many ways, but in the subsequent subsections, we analysed three possible ways: using the Poisson distribution (Section 3.3.1), the binomial distribution (Section 3.3.2), and the Gaussian distribution (Section 3.3.3). Furthermore, we derived an upper bound for the approximation error for all three cases: Equation (48) for the Poisson and the binomial approximations, and Equation (60) for the Gaussian approximation.

Determining which distribution to use for optimal results is a natural question. Unfortunately, we do not have a clear answer to this. During the numerical experiments in Section 4, we found that the structure of the IRG and the granularity of the clustering influence which distribution leads to the most accurate approximation. In Section 4.3, we tested the approximation method on IRG models having a block structure (see Figure 2 and Table 2), where within each block, the nodes obey the same degree distribution. In this case, we could observe that using the binomial distribution gave the most accurate results. However, in the case where the degree distribution of each node was different and we did not apply grouping, using the Gaussian distribution gave the best results (see Figure 3 and Table 3). In Section 4.4, we tested the effect of the clustering granularity on the approximation precision. We fixed an IRG model where the degree distribution of each node is different and applied the approximation with clustering, where the cluster size was different for each test case. We found that for larger cluster sizes, the Poisson distribution gave the most accurate estimate, and for smaller cluster sizes, the Gaussian distribution gave the best results (see Figure 5 and Table 4).

The approximation method can be extended or improved in several ways. In this study, we analysed the usage of Poisson, binomial and Gaussian distributions. However, there are other distributions that we could use in a similar way. One obvious possibility is using the PB distribution itself. Other candidates are the translated Poisson distribution [20,33] and the Pólya approximation of the PB distribution [34]. Another direction is the optimal selection of the group approximation method. We have seen in Section 4.3 and Section 4.4 and in the previous paragraph that the structure of the IRG and the granularity of the clustering influence which distribution leads to the most accurate approximation. It is an open question whether we can implement a selection method to find the optimal approximator distribution.

Author Contributions

Conceptualization, R.P.; methodology, R.P.; software, R.P.; validation, R.P.; investigation, R.P.; writing—original draft preparation, R.P.; writing—review and editing, L.K. and R.P.; supervision, L.K.; project administration, L.K.; funding acquisition, L.K. All authors have read and agreed to the published version of the manuscript.

Funding

Project no. 2019-1.3.1-KK-2019-00007 has been implemented with the support provided from the National Research, Development and Innovation Fund of Hungary, financed under the 2019-1.3.1-KK funding scheme. This project has been supported by the Hungarian National Research, Development and Innovation Fund of Hungary, financed under the TKP2021-NKTA-36 funding scheme.

Data Availability Statement

The source code of the numerical experiments is available at https://github.com/rpethes/IRG (accessed on 11 February 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

We calculate

\int_{- \infty}^{\infty} (arctan (a x + b) - arctan (c x + d)) d x

if

a, c > 0

. First, derive the antiderivative of

arctan (a x + b) - arctan (c x + d)

. By the linearity of the integral:

Q (x) = \int (arctan (a x + b) - arctan (c x + d)) d x = \int arctan (a x + b) d x - \int arctan (c x + d) d x .

(A1)

We continue with the indefinite integral

\int arctan (a x + b) d x

. Applying the substitution

u = a x + b

:

\int arctan (a x + b) d x = \frac{1}{a} \int arctan (u) d u .

(A2)

The antiderivative of

arctan (u)

is

u arctan (u) - \frac{ln (u^{2} + 1)}{2} + C

[35], therefore:

R (x; a, b) : = \int arctan (a x + b) d x = \frac{(a x + b) arctan ((a x + b))}{a} - \frac{ln ({(a x + b)}^{2} + 1)}{2 a} + C .

(A3)

Therefore:

Q (x) = R (x; a, b) - R (x; c, d) .

(A4)

Since

{lim}_{x \to \infty} arctan (x) = \frac{π}{2}

and

{lim}_{x \to - \infty} arctan (x) = - \frac{π}{2}

:

\int_{- \infty}^{\infty} (arctan (a x + b) - arctan (c x + d)) d x = {lim}_{x \to \infty} (Q (x) - Q (- x)) = π (\frac{b}{a} - \frac{d}{c}) .

(A5)
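The symmetric limit in (A5) can be checked numerically from the antiderivative (A3). The sketch below is an illustration under assumptions: the coefficient values are arbitrary test values with positive a and c, the function names are hypothetical, and a large finite X replaces the limit.

```python
import math

def R(x, a, b):
    """Antiderivative of arctan(a*x + b) from (A3), with C = 0."""
    u = a * x + b
    return (u * math.atan(u) - 0.5 * math.log(u * u + 1.0)) / a

def symmetric_integral(a, b, c, d, X=1.0e6):
    """Q(X) - Q(-X) from (A4), approximating the limit in (A5)."""
    def Q(x):
        return R(x, a, b) - R(x, c, d)
    return Q(X) - Q(-X)

a, b, c, d = 0.5, -1.0, 0.25, -2.0      # arbitrary test values, a, c > 0
approx = symmetric_integral(a, b, c, d)
exact = math.pi * (b / a - d / c)       # right-hand side of (A5)
```

For these values b/a − d/c = 6, so both quantities should be close to 6π.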

References

1. Barabási, A.-L. Network Science; Cambridge University Press: Cambridge, UK, 2016.
2. Albert, R.; Jeong, H.; Barabási, A.-L. Attack and error tolerance of complex networks. Nature 2000, 406, 378.
3. Cohen, R.; Erez, K.; ben-Avraham, D.; Havlin, S. Resilience of the Internet to random breakdowns. Phys. Rev. Lett. 2000, 85, 4626.
4. Cohen, R.; Erez, K.; ben-Avraham, D.; Havlin, S. Breakdown of the Internet under intentional attack. Phys. Rev. Lett. 2001, 86, 3682.
5. Pastor-Satorras, R.; Vespignani, A. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 2001, 86, 3200–3203.
6. Valente, T.W. Network Models of the Diffusion of Innovations; Hampton Press: Cresskill, NJ, USA, 1995.
7. Rogers, E.M. Diffusion of Innovations; Simon and Schuster: New York, NY, USA, 2010.
8. Holland, P.W.; Laskey, K.B.; Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. 1983, 5, 109–137.
9. Karrer, B.; Newman, M.E.J. Stochastic blockmodels and community structure in networks. Phys. Rev. E 2011, 83, 016107.
10. Barabási, A.-L.; Oltvai, Z.N. Network biology: Understanding the cell’s functional organization. Nat. Rev. Genet. 2004, 5, 101–113.
11. Buhl, J.; Gautrais, J.; Reeves, N.; Solé, R.V.; Valverde, S.; Kuntz, P.; Theraulaz, G. Topological patterns in street networks of self-organized urban settlements. Eur. Phys. J.-Condens. Matter Complex Syst. 2006, 49, 513–522.
12. Britton, T.; Deijfen, M.; Martin-Löf, A. Generating simple random graphs with prescribed degree distribution. J. Stat. Phys. 2006, 124, 1377–1397.
13. Van Der Hofstad, R. Random Graphs and Complex Networks; Cambridge University Press: Cambridge, UK, 2016.
14. Pethes, R.; Kovács, L. Voting to the link: A static network formation model. Acta Polytech. Hung. 2020, 17, 207–228.
15. Erdős, P.; Rényi, A. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 1960, 5, 17–60.
16. Chung, F.; Lu, L. Complex Graphs and Networks; No. 107; American Mathematical Soc.: Providence, RI, USA, 2006.
17. Norros, I.; Reittu, H. On a conditionally Poissonian graph process. Adv. Appl. Probab. 2006, 38, 59–75.
18. Barabási, A.-L.; Albert, R. Emergence of scaling in random networks. Science 1999, 286, 509–512.
19. Hong, Y. On computing the distribution function for the Poisson binomial distribution. Comput. Stat. Data Anal. 2013, 59, 41–51.
20. Tang, W.; Tang, F. The Poisson binomial distribution—Old & New. Stat. Sci. 2022, 1, 1–12.
21. Cooley, J.W.; Tukey, J.W. An algorithm for the machine calculation of complex Fourier series. Math. Comput. 1965, 19, 297–301.
22. Saks, S. Theory of the Integral; Warszawa–Lwów: G.E. Stechert & Co.: New York, NY, USA, 1937.
23. Total Variation. Available online: https://handwiki.org/wiki/Total_variation (accessed on 21 January 2023).
24. Rudin, W. Functional Analysis, 2nd ed.; McGraw-Hill: New York, NY, USA, 1991.
25. Barbour, A.D.; Hall, P. On the rate of Poisson convergence. In Mathematical Proceedings of the Cambridge Philosophical Society; Cambridge University Press: Cambridge, UK, 1984; Volume 95.
26. Janson, S. Coupling and Poisson approximation. Acta Appl. Math. 1994, 34, 7–15.
27. Adell, J.A.; Jodrá, P. Exact Kolmogorov and total variation distances between some familiar discrete distributions. J. Inequalities Appl. 2006, 2006, 1–8.
28. Ehm, W. Binomial approximation to the Poisson binomial distribution. Stat. Probab. Lett. 1991, 11, 7–16.
29. Choi, K.P.; Xia, A. Approximating the number of successes in independent trials: Binomial versus Poisson. Ann. Appl. Probab. 2002, 12, 1139–1148.
30. Billingsley, P. Probability and Measure; John Wiley & Sons: Hoboken, NJ, USA, 2008.
31. Petrov, V.V. Sums of Independent Random Variables; De Gruyter: Berlin, Germany, 2022.
32. Spivak, M. Calculus, 4th ed.; Cambridge University Press: Cambridge, UK, 2008.
33. Röllin, A. Translated Poisson approximation using exchangeable pair couplings. Ann. Appl. Probab. 2007, 17, 1596–1614.
34. Skipper, M. A Pólya approximation to the Poisson-binomial law. J. Appl. Probab. 2012, 49, 745–757.
35. Inverse Trigonometric Functions. Available online: https://en.wikipedia.org/wiki/Inverse_trigonometric_functions (accessed on 21 January 2023).

Figure 1. The empirical density function of the sequence $\mathit{LognormalSeq}(5, 0.6, 3000)$.

Figure 2. Total variational distance between the approximated and the exact degree distribution as a function of network size when the IRGs are generated by the $\mathit{LognormalSeq}$ parametrization of the biased static edge voting model.

Figure 3. Total variational distance between the approximated and the exact degree distribution as a function of network size when the IRGs are generated by the $\mathit{Range}$ parametrization of the biased static edge voting model.

Figure 4. The exact degree distribution of the IRG generated by the biased static edge voting model with the $\mathit{Range}(3000)$ parametrization.

Figure 5. Total variational distance between the approximated and the exact degree distribution as a function of the number of clusters, where the IRG is generated using the $\mathit{Range}(3000)$ parametrization and the clusters are given by $C_{S}(V)$.

Table 1. Node cluster size statistics of different $\mathit{LognormalSeq}$ parametrizations. Size: number of nodes. Parametrization: parameters of the lognormal sequence parametrization. Nr of clusters: number of node clusters. Mean: mean size of the node clusters. Sd: standard deviation of the node cluster size. Min; max: minimum and maximum node cluster size.

Size	Parametrization	Nr of Clusters	Mean	Sd	Min; Max
50	logN(1.5, 0.6, 50)	12	4.2	2.8	1; 9
100	logN(2, 0.6, 100)	21	4.6	3.4	1; 11
300	logN(2.3, 0.6, 300)	35	8.5	7.8	1; 24
500	logN(2.7, 0.6, 500)	55	9.1	8.8	1; 27
800	logN(3, 0.6, 800)	76	10.4	10.3	1; 32
1000	logN(4, 0.6, 1000)	172	5.7	4.7	1; 15
1500	logN(4.1, 0.6, 1500)	206	7.2	6.5	1; 20
2000	logN(4.3, 0.6, 2000)	257	7.7	7.0	1; 22
2500	logN(4.5, 0.6, 2500)	315	7.8	7.2	1; 22
3000	logN(5, 0.6, 3000)	482	6.1	5.2	1; 16

Table 2. Total variational distance between the approximated and the exact degree distributions when the IRG is created using the $\mathit{LognormalSeq}$ parametrization.

Parametrization	Gauss ( ${TV}_{n, LN}^{G}$ )	Poisson ( ${TV}_{n, LN}^{P}$ )	Binomial ( ${TV}_{n, LN}^{B}$ )
logN(1.5, 0.6, 50)	0.058032	0.009510	0.002926
logN(2, 0.6, 100)	0.039668	0.006708	0.002313
logN(2.3, 0.6, 300)	0.062106	0.001709	0.000731
logN(2.7, 0.6, 500)	0.045169	0.001290	0.000548
logN(3, 0.6, 800)	0.041433	0.000954	0.000425
logN(4, 0.6, 1000)	0.011296	0.002092	0.000905
logN(4.1, 0.6, 1500)	0.013723	0.001317	0.000617
logN(4.3, 0.6, 2000)	0.012329	0.001079	0.000518
logN(4.5, 0.6, 2500)	0.010431	0.000977	0.000472
logN(5, 0.6, 3000)	0.004713	0.001222	0.000565
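To make the quantity reported in these tables concrete, the following minimal sketch computes a total variation distance between the exact distribution of a single Poisson binomial variable and its Poisson approximation with the same mean. It is an illustration under stated assumptions, not the authors' implementation: the exact distribution is obtained by direct convolution instead of the DFT-CF method, the convention TV = ½ Σ |p(k) − q(k)| is assumed, and the function names and the probability vector are hypothetical.

```python
import math


def poisson_binomial_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli(p_i) variables,
    built by adding one trial at a time (direct convolution, O(n^2))."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # trial fails: count stays at k
            nxt[k + 1] += mass * p       # trial succeeds: count moves to k + 1
        pmf = nxt
    return pmf


def poisson_pmf(lam, k_max):
    """Poisson(lam) probabilities for k = 0, ..., k_max."""
    pmf = [math.exp(-lam)]
    for k in range(1, k_max + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def tv_poisson_binomial_vs_poisson(probs):
    """Total variation distance (1/2 * sum of absolute differences)
    between the Poisson binomial law and Poisson(sum(probs))."""
    exact = poisson_binomial_pmf(probs)
    approx = poisson_pmf(sum(probs), len(probs))
    # The Poisson law has mass beyond n, where the exact pmf is zero.
    tail = max(0.0, 1.0 - sum(approx))
    return 0.5 * (sum(abs(p - q) for p, q in zip(exact, approx)) + tail)


if __name__ == "__main__":
    link_probs = [0.05, 0.1, 0.02, 0.2]  # hypothetical link probabilities of one node
    print(tv_poisson_binomial_vs_poisson(link_probs))
```

For large graphs the quadratic convolution becomes slow, which is one reason an FFT-based method such as DFT-CF is attractive for the exact computation.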

Table 3. Total variational distance between the approximated and the exact degree distributions when the IRG is created using the $\mathit{Range}$ parametrization.

Parametrization	Gauss ( ${TV}_{n, R}^{G}$ )	Poisson ( ${TV}_{n, R}^{P}$ )	Binomial ( ${TV}_{n, R}^{B}$ )
range(50)	0.002910	0.074889	0.021244
range(100)	0.001472	0.063056	0.016119
range(300)	0.000487	0.044869	0.011275
range(500)	0.000301	0.037659	0.009719
range(800)	0.000195	0.031871	0.008521
range(1000)	0.000159	0.029427	0.008014
range(1500)	0.000111	0.025494	0.007185
range(2000)	0.000086	0.023055	0.006656
range(2500)	0.000070	0.021346	0.006274
range(3000)	0.000060	0.020053	0.005980

Table 4. Total variational distance between the approximated and the exact degree distribution when the IRG is created using the $\mathit{Range}(3000)$ parametrization and the clusters are given by $C_{S}(V)$.

Nr of Clusters	Cluster Size (S)	Gaussian ( ${p_{S}^{G} (k)}$ )	Poisson ( ${p_{S}^{P} (k)}$ )	Binomial ( ${p_{S}^{B} (k)}$ )
2	1500	0.855596	0.768407	0.840941
3	1000	0.782809	0.658750	0.762074
4	750	0.710952	0.555995	0.684635
5	600	0.646058	0.468330	0.616409
6	500	0.590810	0.395641	0.558709
10	300	0.422019	0.208654	0.397337
15	200	0.274159	0.071488	0.251296
30	100	0.054490	0.019309	0.052811
40	75	0.015959	0.019647	0.019708
50	60	0.004814	0.019797	0.010066
60	50	0.001701	0.019877	0.007355
100	30	0.000329	0.019990	0.006188
150	20	0.000158	0.020025	0.006037
300	10	0.000072	0.020046	0.005958
600	5	0.000061	0.020051	0.005974
3000	1	0.000060	0.020053	0.005980

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
