Tail Index Estimation of PageRanks in Evolving Random Graphs

Markovich, Natalia; Ryzhov, Maksim; Vaičiulis, Marijus

doi:10.3390/math10163026


by Natalia Markovich ¹,*, Maksim Ryzhov ¹ and Marijus Vaičiulis ²

¹ V.A. Trapeznikov Institute of Control Sciences of Russian Academy of Sciences, 117997 Moscow, Russia

² Institute of Data Science and Digital Technologies, Vilnius University, Akademijos St. 4, LT-08663 Vilnius, Lithuania

* Author to whom correspondence should be addressed.

Mathematics 2022, 10(16), 3026; https://doi.org/10.3390/math10163026

Submission received: 21 June 2022 / Revised: 14 August 2022 / Accepted: 15 August 2022 / Published: 22 August 2022

(This article belongs to the Special Issue New Advances and Applications of Extreme Value Theory)


Abstract

Random graphs are subject to heterogeneity in the distributions of node indices and in their dependence structures. We consider superstar nodes, to which a large proportion of nodes attach in evolving graphs, and present a statistical analysis of the extremal part of random graphs. We use extreme value theory for sums and maxima of non-stationary random length sequences to evaluate the tail index of the PageRanks and max-linear models of superstar nodes in evolving graphs, with or without the deletion of existing nodes or edges. The evolution is driven by linear preferential attachment. Our approach is based on the analysis of maxima and sums of the node PageRanks over communities (block maxima and block sums), which can be independent or weakly dependent random variables. An empirical study shows that the tail indices of the block maxima and block sums are close to the minimum tail index of representative series extracted from the communities. The tail indices are estimated from simulated graphs.

Keywords: random graph; evolution; PageRank; max-linear model; tail index; block maximum; block sum; community; preferential attachment

MSC: 62G32

1. Introduction

The evolution of random graphs has attracted the attention of researchers due to the necessity of modeling dynamic networks in numerous applications. The randomness of such graphs consists of random numbers of incoming and outgoing links of the nodes that are called in- and out-degrees, respectively. The evolution of random graphs reflects their dynamics in time and space. The evolution can be modeled by means of preferential or clustering attachment tools, see [1,2,3,4,5,6] among others.

In [6], a superstar node is defined as a node to which a large proportion of nodes attach. We assume that a superstar node within the community has incoming links from all nodes of the community. Such nodes may not exist in the network.

Random graphs may be divided into communities that constitute weakly dependent subgraphs. A community is a subset of nodes that interact with each other more than with the nodes outside that particular community. The nodes within a community are more interconnected, with more edges between themselves and fewer edges to nodes outside the community [7,8]. A community can be characterized by the conductance [7] and the modularity [9,10].

There is literature regarding community detection in undirected graphs. These important detection algorithms are based on the theory of random graph coloring and on stochastic decomposition methods of the symmetric adjacency matrix derived from the underlying undirected graph, see [11,12,13] among others. Here, we study directed graphs with possibly non-symmetric adjacency matrices. A survey of polynomial time algorithms that optimally color random k-colorable graphs (including the sparse ones) for a fixed $k \geq 3$ with high probability is given in [12]. The sparsity is governed by a parameter p that specifies the edge probability. The latter probability may be determined by the preferential attachment (PA) schemes (see Appendix D). Another important problem is to determine or estimate the chromatic number $\chi(G)$ of a graph G, that is, the minimum number of colors in a proper vertex coloring [12,13]. The probable value of $\chi(G)$ is roughly comparable with the average degree of the random graph and it may therefore be much higher than a fixed number of expected colors [13]. Moreover, $\chi(G)$ may also be related to the selection of the number of communities in the random graph. Another approach that is widely employed as a model to study community detection is the stochastic block model, which is a random graph model with different groups of vertices connecting differently, see [11]. A theoretical study of polynomial time algorithms for community detection is beyond the scope of this paper (evolving random directed graphs are our focus). The directed Louvain algorithm is used here to detect the community structures of directed graphs in a fast way [10].

The node indices in the communities may also be heterogeneous and non-stationary distributed. Since the nodes of the graph cannot be definitely enumerated, the definition and testing of the stationarity in the graphs remain an open problem. One can define a graph as stationary if, for all finite sets of vertices with the same adjacency matrices, the joint distributions of their in- and out-degrees are the same.

The node’s importance has a philosophical implication and is quantified by various centrality properties. What all of these properties have in common is that they reveal the nodes with significant impacts on the information flow of the network, the so-called central nodes [8]. The in-degree, out-degree, and PageRank (PR) are well known as node influence indices. The max-linear model (MLM) can also be used as the influence measure [14].

We generate evolving directed graphs by PA schemes proposed in [6] (see Appendix D). In [15], the number of nodes of graph $G(t)$ at time t with the in-degree i and the number of nodes with the out-degree i are assumed to be linearly increasing. The slopes of the latter lines are determined by tail indices (TIs) of the in- and out-degrees, respectively, see Theorem 3.1 in [15]. The formulae for the TIs of the in- and out-degrees are specified in [6], see (4).

To the best of our knowledge, there are no theoretical results for the TI of the PR and the MLM, including the case when node and edge deletion may occur during the evolution. We will investigate the latter TI by an empirical study.

The purpose of adding/deleting edges/nodes is to simulate the evolution of networks with the intention of implementing simple control policies. Examples are different COVID restriction policies or the blocking of information spreading in networks. The uniform deletion at each step of the evolution proposed in the paper is the simplest strategy. This implies that a node (or an edge) is deleted with a probability equal to $1/n$, where n denotes the number of nodes (or edges) currently existing in the network. The rules for these deletions can be broader, but they are beyond the scope of this paper. The evolution with and without node and edge deletion is studied, which may model the behavior of real systems. The node deletion corresponds, for example, to the removal of a document from the WWW. Similarly, the edge deletion corresponds to the removal of URLs between web pages [3].

A well-known feature of random graphs, such as web graphs, is that the in- and out-degrees are power law distributed [16,17]. Despite the in- and out-degrees being discrete random variables (r.v.s), their distributions can be approximated by regularly varying distributions [18]. The distribution tail of a non-negative r.v. X is called regularly varying at ∞ with the TI $\alpha > 0$ if it holds

$$P\{X > x\} = x^{-\alpha} \ell(x), \tag{1}$$

where the function $\ell(x)$ is called slowly varying at ∞, i.e., $\lim_{x \to \infty} \ell(tx) / \ell(x) = 1$ holds for any $t > 0$. The TI $\alpha$ indicates the heaviness of the distribution tail.

The PR is accepted as a more general measure of the node influence than the in-degree [19], see more in Appendix B. The PR of a random web page is derived to be regularly varying distributed [18,20]. It is proved in [2] that the tail of the PR distribution in a directed graph is heavier than the tail of the limiting in-degree distribution. Numerous empirical studies stated that the TIs of the in-degrees and PRs of web pages have values $\alpha$ smaller than 2. Due to the properties of regularly varying distributions, this means that the variances of the node indices are infinite [21].

Our objective was to answer the following question. How could the TIs of the distribution of superstar node influence indices, such as the PRs and the MLMs, be changed during the evolution with or without node and edge deletion, depending on the parameters of the evolution schemes? To the best of our knowledge, this is still an open problem. Indeed, the deletion of an influential node at an evolution step may destroy links between nodes. The edge deletion at each step of the evolution, which preserves the total number of edges while the number of nodes increases, leads to an increasing number of isolated nodes. Both deletion strategies may cause a change of the TI of the node influence measure. In [6], a proportion of edges has been deleted only at the final stage of the evolution. The change of the TI of the PR and the MLM with respect to the PA evolution steps has not been studied to date.

To evaluate the TIs of the PRs and MLMs of superstar nodes in the evolving graphs, we use the block maxima and block sums of the node PRs built over communities.

To this end, we draw on results of the extreme value theory regarding sums and maxima of the non-stationary random length sequences obtained in [22,23]. We estimate the TIs by sequences of the block maxima and block sums calculated over each community and find a minimal estimate of the TI among the so-called representative series. The latter series contain nodes taken in the communities as their representatives. By the theory [22,23], the TIs of the block maxima and block sums have to be equal to the minimum TI mentioned above. Our next objective is to check this result for random graphs by simulation.

To estimate the TI, we use the well-known QQ–plot estimator proposed in [24] and the Hill estimator coupled with tools to estimate their sample fractions, see Appendix A.
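As a point of orientation, the Hill estimator in its standard textbook form can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `hill_tail_index` and `pareto_sample` are hypothetical, the number k of largest order statistics is fixed by hand rather than selected by the methods of Appendix A, and the data are exact Pareto draws rather than PageRanks.

```python
import math
import random


def hill_tail_index(sample, k):
    """Hill estimate of the tail index alpha from the k largest observations."""
    x = sorted(sample, reverse=True)  # x[0] >= x[1] >= ...
    if not 1 <= k < len(x) or x[k] <= 0:
        raise ValueError("need 1 <= k < n and a positive threshold x[k]")
    # The mean log-excess over the threshold x[k] estimates 1/alpha.
    gamma = sum(math.log(x[i] / x[k]) for i in range(k)) / k
    return 1.0 / gamma


def pareto_sample(alpha, n, rng):
    """Exact Pareto draws on [1, inf) with P{X > x} = x**(-alpha)."""
    # 1 - rng.random() lies in (0, 1], so the power is always finite.
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
data = pareto_sample(alpha=1.5, n=20000, rng=rng)
alpha_hat = hill_tail_index(data, k=2000)  # k = 2000 is an arbitrary choice
print(round(alpha_hat, 2))
```

For exact Pareto data the estimate should land near the true value 1.5. For real PageRank samples the result depends strongly on k, which is why a data-driven choice of the sample fraction matters.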

Another important item involves how to delete old nodes and edges. One can delete an existing node at each step of the evolution together with appending a new edge in order to fix the number of nodes in the network. This can be done uniformly, so that each existing node is deleted with equal probability. When a new edge is created between two existing nodes, no old node is deleted. In the same way, one can delete an old edge, but then the number of nodes increases. We also consider the evolved graphs without node and edge deletions; that is quite unrealistic in practice since “old” nodes may lose their popularity and, hence, their incoming links from newly appended nodes.

The paper is organized as follows. In Section 2, we present methods and algorithms related to our results. In Section 3, the main results are presented. The network evolution by the PA schemes and the selection of representative series are discussed. The TIs of the block maxima and block sums of the PRs of graphs evolved by the linear PA schemes of [6] are estimated. In Section 3.3, special attention is devoted to three evolution models, namely, without node and edge deletion, with a uniform node deletion, or with a uniform edge deletion during the evolution. We finalize with our conclusions in Section 4. Supplementary materials are presented in Appendix A, Appendix B, Appendix C, Appendix D and Appendix E.

2. Related Theoretical Work: Extremal Properties of Sums and Maxima of Random Length Sequences

In [18,20], the PR R of a randomly chosen web page (i.e., a vertex of a web graph) is presented as the solution of a fixed-point problem

$$R \overset{d}{=} \sum_{j=1}^{N} A_j R_j + Q, \tag{2}$$

assuming that $\{R_j\}$ are independent identically distributed (IID) copies of R. N models a number of incoming links to a page (the in-degree). Q determines the user preference with the expectation $E(Q) < 1$. The $A_j$s are assumed to be independent and distributed as some r.v. A with $E(A) < 1$. The symbol $\overset{d}{=}$ denotes equality in distribution. It is stated in [18,20] that the stationary distribution of R is regularly varying and its TI is determined by the most heavy-tailed distributed term in the regularly varying distributed pair $(N, Q)$. By replacing the sum by the maximum, one can obtain similar results with regard to the MLM that is the solution of the following equation

$$R \overset{d}{=} \Big(\bigvee_{j=1}^{N} A_j R_j\Big) \vee Q. \tag{3}$$

In [22,23], the TIs of the sums and maxima of random length weighted non-stationary sequences were found. The latter sequences can be considered as the rows of a doubly-indexed array of r.v.s $(Y_{n,i} : n, i \geq 1)$ in which the “row index” n corresponds to time, and the “column index” i enumerates the series. The “column” series are assumed to be stationary distributed with regularly varying tails at ∞ and the TIs $\{k_i\}_{i \geq 1}$. A unique “column” series is assumed to have a minimum TI in [22]. An arbitrary (but pair-wise identical) dependence between the “column” series is allowed. It was found that the TI of both sums and maxima over rows is equal to the minimum TI [22]. The same is true if there is a random number of independent or weakly dependent most heavy-tailed “column” series with the minimum TI [23]. The results remain true if the TIs of elements in the “columns” are different, apart from those “columns” with the minimum TI [22,23].

In [23], the terms $A_j R_j$, $j \in \{1, \dots, N_i\}$, in (2) and (3) are denoted as $z_j Y_{i,j}$ with $z_j = c$. By the definition of Google’s PR, it follows that $A_j = c / D_j$ holds, where $D_j$ is the out-degree of the node j and $c > 0$ is a damping factor, the only parameter of the personalized PR [18]. By Lemma A.1 (iii) in [18], $Y_{i,j} = R_j / D_j$ has the same tail index as $R_j$ since $R_j$ and $D_j$ are assumed to be mutually independent and $E(1 / D_j) < 1$ holds. Hence, the tail of $Y_{i,j}$ is dominated by the tail of $R_j$.

The latter results can be applied to the sums and maxima in the right-hand sides of (2) and (3) to predict the TIs of the PR and the MLM of superstar nodes in heterogeneous evolving random graphs both of newly appending nodes and existing ones due to newly created edges between them. In terms of graphs, communities of nodes can be considered as the random length “row” sequences. The “column” series consist of nodes that are taken from each community as their representatives, see Figure 1.

We aim to check whether the TI of sums and maxima of PRs over communities is close to the minimum TI of the series of representatives. Node characteristics in the communities may be non-stationary distributed and arbitrarily dependent, which is natural in practice. The representative series with the minimum TI must be stationary distributed, whereas the representative series with larger TIs may be non-stationary distributed. The stationarity may not be fulfilled in heterogeneous random graphs. One has to test the stationarity of only the representative series with the minimum TI. Another problem is how to form the representative series since nodes in the graph are not enumerated.

Let us recall the constraints used in [22,23], which are important for graphs and discuss how to form the representative series.

The community (block) size is random.
Representative series may be formed as indicated in Figure 1.
Elements of the representative series with a minimum TI may be arbitrarily mutually dependent, but they have to be stationary distributed.
The number of representative series with a minimum TI is random.
The node indices in the communities may be arbitrarily dependent and non-stationary distributed.
The block maxima and block sums calculated over the communities are weakly dependent r.v.s (due to a few links between the communities), which may be non-stationary distributed.
Since the results in [22,23] are asymptotic, the number of communities has to be large enough.

The representative series (the “column” sequences) with the minimum TI determine the TI of both block maxima and block sums built over the communities and, hence, the TI of the PRs and MLMs of the superstar nodes by theorems proved in [22,23]. The elements of the representative series may be dependent since they may contain order statistics of the node PRs within the communities (Figure 1a) or the common roots (Figure 1b). The stationarity of representative series is checked in the present paper by the Mann–Whitney U test.

3. Main Results

3.1. Evolution of Random Graphs

We consider evolving graphs that are created by the linear PA schemes developed in [6] and recalled in Appendix D. The PA schemes are used since they generate directed, fully connected graphs with self-loops and multiple edges. The TIs of the in- and out-degrees of these graphs may be calculated by formula (2.9) in [6]:

$$\alpha_{in} = \frac{1 + \delta_{in} (\alpha + \gamma)}{\alpha + \beta}, \qquad \alpha_{out} = \frac{1 + \delta_{out} (\alpha + \gamma)}{\beta + \gamma}. \tag{4}$$

Here, $(\alpha, \beta, \gamma, \delta_{in}, \delta_{out})$ is a set of nonnegative parameters of the PA schemes, where $\alpha + \beta + \gamma = 1$ and $\delta_{in}, \delta_{out} > 0$. To the best of our knowledge, the TI of the PR has not yet been obtained theoretically. We estimate it semi-parametrically, e.g., as a slope of the linear approximation of the QQ–plot by (A4) or by the Hill estimator (A1).
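Formula (4) is easy to evaluate directly. The following sketch only transcribes it; the function name `degree_tail_indices` and the input checks are illustrative assumptions and not part of [6].

```python
def degree_tail_indices(alpha, beta, gamma, delta_in, delta_out):
    """Tail indices (alpha_in, alpha_out) of the in- and out-degrees by (4)."""
    if min(alpha, beta, gamma) < 0 or abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha, beta, gamma must be nonnegative and sum to 1")
    if delta_in <= 0 or delta_out <= 0:
        raise ValueError("delta_in and delta_out must be positive")
    if alpha + beta == 0 or beta + gamma == 0:
        raise ValueError("denominators alpha + beta and beta + gamma must be nonzero")
    alpha_in = (1.0 + delta_in * (alpha + gamma)) / (alpha + beta)
    alpha_out = (1.0 + delta_out * (alpha + gamma)) / (beta + gamma)
    return alpha_in, alpha_out


# Parameter triple (0.5, 0.3, 0.2) with delta_in = delta_out = 1.
a_in, a_out = degree_tail_indices(0.5, 0.3, 0.2, 1.0, 1.0)
print(a_in, a_out)
```

With these parameters, (4) gives $\alpha_{in} = 1.7 / 0.8 = 2.125$ and $\alpha_{out} = 1.7 / 0.5 = 3.4$. Swapping $\alpha$ and $\gamma$ swaps the two values.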

The evolution begins from a seed graph $G^{(1)} = (V^{(1)}, E^{(1)})$ that is large enough to be divided into a sufficiently large number $n_c(1)$ of communities $\{\sigma_k^{(1)}\}$, i.e., $V^{(1)} = \cup_{k=1}^{n_c(1)} \sigma_k^{(1)}$. The communities are used as blocks of random lengths. To build samples of the block maxima and block sums over communities, the PRs of all nodes of the seed graph are calculated. The latter samples contain weakly dependent r.v.s since the communities are mutually weakly connected due to a few links between them. This allows evaluating TIs of the block maxima and block sums better.

We study how these TIs may be changed by the evolution of the graph and also by a deletion of nodes and edges. To this end, we continue the evolution of the graph and divide the graph into communities at each evolution step up to the Kth one. To fix the number of nodes in the graph, we remove one of the existing nodes, chosen uniformly, at each step of the evolution when a new edge is appended together with a new node. When the PA $\beta$-scheme is used, a new edge is appended between two existing nodes. In this case, we do not remove an existing node. One may delete one of the existing edges at each step of the evolution when a new edge is appended. Then the number of edges is fixed but the number of nodes increases. This leads to an increasing number of isolated nodes. Without node and edge deletion, the numbers of edges and nodes both grow when the $\alpha$- or $\gamma$-PA schemes are used. The procedure is described by Algorithm 1.

Algorithm 1 Estimation of the TI of the maxima and sums over communities

1: Create a seed graph $G^{(1)}$ with $|V^{(1)}| = n$ nodes and $|E^{(1)}| > n$ edges (e.g., one can take $n = 10^4$) by a PA tool.
2: Evaluate the PRs of nodes of the seed graph by the iteration formula (A6).
3: Divide the seed graph into a random number $n_c$ of communities.
4: Obtain samples of the $n_c$ maxima and $n_c$ sums of the PRs over communities.
5: Estimate the TIs of the obtained block maxima and block sums.
6: To evolve the graph further, create a new edge and possibly a node at the ith step of evolution ($2 \leq i \leq K$) by the PA schemes where

(i): no node or edge is deleted:

$$|E^{(i)}| = |E^{(i-1)}| + 1, \qquad |V^{(i)}| = |V^{(i-1)}| + \begin{cases} 1, & \alpha \text{ or } \gamma \text{ schemes are used}, \\ 0, & \beta \text{ scheme is used}; \end{cases}$$

(ii): one of the existing nodes is uniformly deleted at each step of the evolution when a new edge is appended according to the $\alpha$- or $\gamma$-PA schemes:

$$|E^{(i)}| = |E^{(i-1)}| + 1 - \begin{cases} O + I, & \alpha \text{ or } \gamma \text{ schemes are used}, \\ 0, & \beta \text{ scheme is used}, \end{cases} \qquad |V^{(i)}| = |V^{(i-1)}|,$$

where O and I denote out- and in-degrees of the deleted node, respectively, $[\cdot]$ denotes an integer part; or

(iii): one of the existing edges is uniformly deleted irrespective of the applied PA scheme:

$$|E^{(i)}| = |E^{(i-1)}|, \qquad |V^{(i)}| = |V^{(i-1)}| + \begin{cases} 1, & \alpha \text{ or } \gamma \text{ schemes are used}, \\ 0, & \beta \text{ scheme is used}. \end{cases}$$

7: Evaluate the PRs of the newly appended nodes by (A6).
8: Divide the consolidated graph $G^{(i)} = (V^{(i)}, E^{(i)})$ at evolution step i into the random number $n_c(i)$ of communities by the directed Louvain algorithm (see Appendix C), i.e., $V^{(i)} = \cup_{k=1}^{n_c(i)} \sigma_k^{(i)}$, and repeat steps 4–7.
9: The procedure stops after K such repeats chosen by the researcher, e.g., $K = 3 \cdot 10^4$.
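Step 4 of Algorithm 1 amounts to a group-by over community labels. The sketch below assumes that the PRs and the community assignment are already available as plain dictionaries (in practice they would come from the PR iteration and the directed Louvain algorithm); the function name `block_maxima_and_sums` and the toy values are hypothetical.

```python
from collections import defaultdict


def block_maxima_and_sums(pagerank, community):
    """Return the block maxima and block sums of the PRs over communities.

    pagerank:  dict mapping node -> PR value
    community: dict mapping node -> community label
    """
    blocks = defaultdict(list)
    for node, pr in pagerank.items():
        blocks[community[node]].append(pr)
    labels = sorted(blocks)  # fixed order so both samples are aligned
    maxima = [max(blocks[label]) for label in labels]
    sums = [sum(blocks[label]) for label in labels]
    return maxima, sums


# Toy data: four nodes in two communities (values are made up).
toy_pr = {"a": 0.4, "b": 0.1, "c": 0.3, "d": 0.2}
toy_comm = {"a": 0, "b": 0, "c": 1, "d": 1}
toy_maxima, toy_sums = block_maxima_and_sums(toy_pr, toy_comm)
print(toy_maxima, toy_sums)
```

Each of the two returned samples has one element per community, so its size equals the number of communities $n_c$. A tail index estimator would then be applied to each sample separately, as in step 5.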

3.2. Representative Series Selection

The formation of the representative series as all possible trees rooted at nodes of the communities as shown in Figure 1b leads to a large number of possible combinations. We propose a more appropriate choice of the representative series by Algorithm 2. Indeed, the choice of the ith maxima within the communities as the representative series leads to the selection of the series with a larger local minimum TI. Despite such representative series being mutually dependent as order statistics, the arbitrary dependence between “column series” is allowed by the theory in [22,23].

Algorithm 2 Generation of representative series

1: Sort the node PRs in the communities in descending order.
2: Select the ith representative series by the ith maxima of the communities, $i \geq 2$.
3: Estimate TIs of the representative series and find a minimum estimate among them.
4: Check the stationarity of the representative series with the minimum TI, e.g., by the Mann–Whitney test (see Appendix E), and in case the null hypothesis of the stationarity cannot be accepted, check a series with the second minimum TI, etc.
5: Compare the obtained minimum TI with the TIs of the block maxima and block sums.
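Steps 1 and 2 of Algorithm 2 can be sketched as follows. The source does not state how communities with fewer than i nodes are handled; skipping them is an assumption made here for illustration. The function name `representative_series` and the toy values are hypothetical.

```python
def representative_series(blocks, i):
    """i-th largest PR (1-based) of every community that has at least i nodes.

    blocks: list of lists, one list of node PRs per community.
    Communities with fewer than i nodes are skipped (an assumption).
    """
    if i < 1:
        raise ValueError("i must be a positive integer")
    return [sorted(block, reverse=True)[i - 1] for block in blocks if len(block) >= i]


# Toy communities with made-up PR values.
toy_blocks = [[0.5, 0.1, 0.3], [0.2, 0.4], [0.05]]
second_maxima = representative_series(toy_blocks, 2)
third_maxima = representative_series(toy_blocks, 3)
print(second_maxima, third_maxima)
```

Because each series is built from order statistics of the same communities, the series are mutually dependent, which the theory in [22,23] permits. The length of the series shrinks as i grows, so the TI estimates for large i rest on fewer observations.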

3.3. Tail Index Estimation in Evolving Graphs

3.3.1. The Description of the Experiment

In our experiment, the PA schemes with sets of parameters $(\alpha, \beta, \gamma)$ and $\delta_{in} = \delta_{out} = 1$ shown in Table 1 were applied. The evolving graphs used as seed networks were generated by $n = 3 \cdot 10^4$ PA evolution steps starting from one node. For each set of the PA parameters, we repeatedly simulated 100 graphs.

The TIs of the in- and out-degrees and the PR are estimated by the Hill and QQ–plot estimators (see Appendix A.1 and Appendix A.2) in all tables. The parameter k in (A1) is selected by the bootstrap method, where the number of bootstrap resamples was taken equal to 100, by the eyeball method, and by the KS distance method, see Appendix A.1.1, Appendix A.1.2 and Appendix A.1.3.

To check the theory recalled in Section 2, we compare the TI estimates of the block sums, block maxima, and representative series. By this theory, the representative series with the minimal TI are to be stationary distributed. Then the block sums and block maxima and, hence, the PR and MLM have the same minimum TI. Since the theoretical results are asymptotic, one can expect such results to be valid for samples of moderate sizes only in an approximate sense.

3.3.2. Evolution without a Node and Edge Deletion

Let us consider the evolving graphs without node and edge deletion. The number of nodes grows with each step of the evolution when the PA $\alpha$- or $\gamma$-schemes are applied. In Figure 2, examples of the PA evolved graphs with two sets of the PA parameters are shown. The largest parameter $\beta$ provides a few nodes with large PRs due to the increase of multiple links between existing nodes, whereas the smallest $\beta$ and the largest $\alpha$ lead to a large number of old nodes with sufficiently large PRs due to the increasing in-degree of the latter nodes.

The empirical bias, empirical standard deviation, and the root mean squared error (RMSE) of the TI estimates of the in- and out-degrees and PRs calculated over $N = 100$ simulated graphs after $3 \cdot 10^4$ evolution steps, as well as the TIs $(\alpha_{in}, \alpha_{out})$ of the in- and out-degrees calculated by (4), are presented in Table 1, Table 2, Table 3 and Table 4. Since $(\alpha_{in}, \alpha_{out})$ are known, one may calculate the RMSEs of their estimates. For $(\alpha_{in}, \alpha_{out})$, we use the following formulae

$$\begin{aligned} bias(\hat{\alpha}) &= \bar{\hat{\alpha}} - \alpha, \qquad std(\hat{\alpha}) = \Big(\frac{1}{N} \sum_{i=1}^{N} (\hat{\alpha}_i - \bar{\hat{\alpha}})^2\Big)^{1/2}, \qquad \bar{\hat{\alpha}} = \frac{1}{N} \sum_{i=1}^{N} \hat{\alpha}_i, \\ RMSE(\hat{\alpha}) &= \Big(\frac{1}{N} \sum_{i=1}^{N} (\hat{\alpha}_i - \alpha)^2\Big)^{1/2} \end{aligned} \tag{5}$$

for the empirical bias, empirical standard deviation, and the RMSE, respectively. The TI of the PRs is unknown, though it is close to the TI of in-degrees [25]. Thus, the bias and RMSE bootstrap estimates of the PRs are calculated. To this end, the bootstrap algorithm (see Algorithm A1) is applied to select the number of largest order statistics k in the Hill and QQ–plot estimators for each simulated sample. The latter estimates are then used in place of the unknown $\alpha$ in (5). Both RMSEs are shown in Table 1, Table 2, Table 3 and Table 4.
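The three quantities in (5) can be computed in a few lines. The sketch below is a direct transcription with the 1/N normalization used in (5); the function name `bias_std_rmse` and the sample estimates are made up for illustration.

```python
import math


def bias_std_rmse(estimates, alpha):
    """Empirical bias, standard deviation and RMSE of TI estimates, as in (5)."""
    n = len(estimates)
    if n == 0:
        raise ValueError("need at least one estimate")
    mean = sum(estimates) / n
    bias = mean - alpha
    std = math.sqrt(sum((a - mean) ** 2 for a in estimates) / n)
    rmse = math.sqrt(sum((a - alpha) ** 2 for a in estimates) / n)
    return bias, std, rmse


# Made-up estimates from three simulated graphs and a reference value 2.125.
toy_bias, toy_std, toy_rmse = bias_std_rmse([2.0, 2.2, 2.4], 2.125)
print(toy_bias, toy_std, toy_rmse)
```

With this normalization the identity $RMSE^2 = bias^2 + std^2$ holds exactly, which is a convenient consistency check on reported tables.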

By Table 1, one can conclude the following:

The QQ–plot method outperforms other methods in most of the investigated cases.
It cannot be said that the empirical bias and empirical variance dominate each other for the QQ–plot method: the impacts of these empirical characteristics on the RMSE are in close correspondence.
The Hill estimator coupled with the bootstrap method also demonstrates good empirical results, but does not outperform the QQ–plot method.
The TI estimates of the PRs are close to those of the in-degrees.

TI estimates of representative series for some examples of the evolving graphs are shown in Figure 3. The graphs evolved by the PA schemes with different triples $(\alpha, \beta, \gamma)$. The representative series were formed by Algorithm 2 as the first, second, third, etc., maxima of the communities. To check the stationarity, the representative series enumerated in three different ways were divided into two parts with $m = n$ (or $m = n + 1$) in (A8) for an even (or odd) number of communities. The three Mann–Whitney U statistics with the interval (A8) for each plot are shown in Figure 3. One can see that there is a value of i starting from which the representative series may be considered as stationary. The stability of the U statistics for large i can be explained by similar relatively small values of PRs in the corresponding representative series. We select the representative series that has the minimum TI estimate and whose U test statistic falls into the interval (A8).

Comparing Table 2, Table 3 and Table 4, one may conclude that the block maxima and block sums have the TI estimates that are close to the minimum TI estimate of the representative series.

The TIs of the in-degrees and PRs are close. The results of Table 1, Table 2, Table 3 and Table 4 are visualized in Figure 4 and Figure 5. In Figure 4, the TI estimates of the block sums and block maxima against the minimum TI estimate of the representative series provide a diagonal trend.

To find the degree to which the two r.v.s in Figure 4 are linearly dependent, we calculated the empirical correlations of n observed pairs $(X_1, Y_1), \dots, (X_n, Y_n)$ by the formula [26]

$$\rho(X, Y) = \frac{m_{xy}}{s_x s_y}, \tag{6}$$

where

$$s_x^2 = \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \Big(\frac{1}{n} \sum_{i=1}^{n} X_i\Big)^2, \qquad s_y^2 = \frac{1}{n} \sum_{i=1}^{n} Y_i^2 - \Big(\frac{1}{n} \sum_{i=1}^{n} Y_i\Big)^2$$

are the empirical variances and

$$m_{xy} = \frac{1}{n} \Big(\sum_{i=1}^{n} X_i Y_i - \frac{1}{n} \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i\Big)$$

is the empirical covariance, see Table 5. If $\rho(X, Y) = \pm 1$, then the r.v.s X and Y are perfectly linearly dependent and vice versa. If X and Y are independent, then $\rho(X, Y) = 0$, but the opposite statement is not necessarily true.
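Formula (6) can be transcribed term by term. The sketch below follows the definitions of the empirical variances and covariance given with (6); the function name `empirical_correlation` and the toy pairs are hypothetical.

```python
import math


def empirical_correlation(xs, ys):
    """Empirical correlation of paired observations, following (6)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Empirical variances: mean of squares minus squared mean.
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    if var_x <= 0 or var_y <= 0:
        raise ValueError("correlation is undefined for a constant sample")
    # Empirical covariance m_xy.
    cov = (sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n) / n
    return cov / math.sqrt(var_x * var_y)


# Toy pairs lying exactly on an increasing and on a decreasing line.
rho_up = empirical_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
rho_down = empirical_correlation([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
print(rho_up, rho_down)
```

The two toy samples illustrate the boundary cases: points on an increasing line give a correlation of 1 and points on a decreasing line give −1, up to rounding.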

By Table 5, one can conclude the following. The correlations between the TI estimates of both pairs (“mx”, “rep”) and (“sm”, “rep”) are similar and close to 1. The exception is the case $(\alpha, \beta, \gamma) = (0.2, 0.3, 0.5)$. Note that for the symmetrical case $(0.5, 0.3, 0.2)$, the dependence is nearly perfectly linear. It is notable that the correlations decrease as the parameter $\gamma$ increases.

In Figure 5, the results in the tables are shown by quantile intervals. The sample size of the maxima, sums, and representative samples is equal to the number of communities (about 100–250). That is much smaller than the number of nodes in the graph, $3 \cdot 10^4$. Hence, their MSEs may be larger than the MSEs of the PRs estimated from the whole graph. To decrease the MSE, additional communities taken from past evolution steps were used for the estimation. Thus, samples may contain about 1000 elements, which reduces the variance of the TI estimation.

In Figure 5 (left column), the maxima and sums have similar estimates of the TI for most of the estimators. In contrast, the eyeball estimates show large variations for all considered sets of the PA parameters. This can be seen by the MSE results in Figure 5 (right column).

Figure 6 confirms again that the TI estimates of the block sums, block maxima and the minimum TI estimate of the representative series are close. Note that larger $\gamma$ values lead to larger proportions of the outgoing links from existing nodes to new ones and to larger TI estimates. The case $\alpha_{in} = \alpha_{out}$ (see upper line in Figure 6) yields more stable TIs as the number of evolution steps increases.

3.3.3. Evolution with the Uniform Node Deletion

The evolution without node and edge deletion leads to different TIs and to a different number and sizes of communities than the evolution with node or edge deletion.

Let us consider the PA evolution in which a node is uniformly deleted from the network when a new node is appended to the network. This may occur if the $\alpha$- and $\gamma$-schemes are applied. If the $\beta$-scheme is in action, then a new edge is created between two existing nodes and no nodes are deleted.

In [3], a uniform node deletion was considered in undirected graphs. We consider directed graphs with possible self-loops and multiple edges where $1 - \beta$ denotes the node removal probability. We study the PRs instead of node degrees.

Here and in the next section, we start the evolution with the seed graph with $3 \cdot 10^5$ nodes. In Figure 7, the PA evolution with uniform node deletion for two sets of PA parameters is considered. There are many isolated nodes that have no links to the main part of the graph with highly connected nodes when the removal probability $1 - \beta$ is large, see Figure 7 (right). In contrast, the number of isolated nodes is relatively small when $1 - \beta$ is small, see Figure 7 (left). The isolated nodes are considered as communities containing single nodes. Their PRs are equal to a constant $(1 - c) / n$ by (A5), where $q_i = 1 / n$ for all i. Hence, the distribution of such PRs is discontinuous since it contains a probability mass at the point $(1 - c) / n$.

We exclude the isolated nodes before dividing the graphs into communities to estimate the TIs of the maxima, sums and representative series since the Hill and QQ–plot estimators require continuous distributions. In the same way as in Section 3.3.2, we compare the PR TIs of the sequences of the sums, maxima, and representative series over communities with excluded isolated nodes in Figure 8. The stationary representative series with the minimum TI estimates are selected. The sums and maxima have similar estimates of the TIs as the representative series.

By Table 6, it follows that the correlations are generally smaller than in Table 5. In the same way, larger values of

γ

lead to smaller correlations.

Figure 9 shows that the TIs of the block maxima and sums and the minimum TIs of the representative series are comparable apart from the case with

β = 0.2

. Due to the small

β

, not many additional edges between existing nodes are created. By an evolution with the node deletion, the TIs become larger and the distribution tails lighter.

The uniform node deletion impacts the number and sizes of the communities, namely, the number increases and the size decreases, see Figure 10 (middle). The node deletion leads to larger values of PR TIs and, thus, to lighter distribution tails.

3.3.4. Evolution with a Uniform Edge Deletion

In [6], a random proportion of edges is deleted at the final time of the evolution. It is shown that with the deletion of some parts of the edges, the maximum likelihood estimates of the TIs of in- and out-degrees remain approximately the same. In contrast, we consider the PA evolution with a permanent uniform edge deletion when a new edge is appended. This may dramatically destroy the graph structure if an influential node with a large degree is deleted. The total number of edges remains constant.

In Figure 10, one can see the difference between the three cases: the PA evolution without a node and edge deletion, with the uniform node deletion, and with the uniform edge deletion. The number of communities grows approximately linearly for the edge deletion. The smaller

β

becomes, the larger the slope of the line. The

β

value affects in this case the number of communities and their sizes. This outcome occurs since the number of isolated nodes grows linearly during the evolution, in contrast to the case of the node deletion, see Figure 11.

An isolated node is considered a community. Hence, the growth of the number of isolated nodes (see Figure 11) leads to a large number of communities, Figure 10 (right). The isolated nodes are not created by the evolution without node and edge deletion, and in this case, the number of communities increases logarithmically slowly. The node deletion leads to a stable number of communities and stable community sizes for a large enough number of evolution steps. This outcome is caused by the deletion of highly connected superstar nodes with a large number of edges. The edge deletion causes the TI estimates of the PRs to decrease to a stable value that does not depend on the evolution parameters, see Figure 10 (right). The latter value is close to the TI of the in-degrees, which is not shown here. The same result is valid for the PA evolution with the node deletion, Figure 10 (middle).

In Figure 12, the PA evolution with two sets of parameters is considered. As in Section 3.3.3, one can see that the larger the removal probability

1 - β

is, the more isolated nodes there are. Indeed, a small

β

leads to appending new nodes with a few edges because the total number of edges remains the same with the edge deletion.

Since the edge deletion leads to the degeneration of the graph into isolated nodes considered as communities, the TI estimates of the block sums and block maxima look similar in Figure 13. The TI estimates of the block sums, block maxima, and the representative series do not follow a clearly linear trend as in Figure 8.

The correlations in Table 7 are smaller than in Table 5 and Table 6. In contrast to the latter cases without node and edge deletion and with node deletion, the largest

α = 0.5

leads to the smallest correlations, i.e., to the weakest linear dependence measures.

In Figure 14, the plots of the TI estimates of the block sums and block maxima coincide due to a large number of degenerate communities consisting of isolated nodes, and they tend to the TI estimates of the representative series as the number of evolution steps increases. The in-degree TIs are comparable with the latter TI estimates on sufficiently large steps of the evolution if the value of

γ

is large, Figure 14a,d. For small

γ

, the TI estimates are close at the beginning of the evolution and they grow in a logarithmic manner, Figure 14b,c. This means that the distribution tails of the considered PRs tend to be lighter by the evolution.

4. Conclusions and Discussion

The dynamics of the TIs of the PR and MLM during an evolution was studied. An evaluation of the TIs of the PRs and MLMs of superstar nodes in random graphs using results of extreme value theory is proposed. The superstar nodes constitute an extremal part of the graph. Our conclusions are based on a simulation study.

To this end, the graph is divided into communities considered as random-length blocks. The block sums and block maxima of the PRs over communities may serve as influence indices of the superstar nodes. A superstar node within the community is assumed to have incoming links from all nodes of the community. Such a superstar node may be artificial.

It is known that the TIs of the PR and the in-degree are close in ’unevolved’ networks. We found that this is also true for the PA evolved network. It follows from the results of the extreme value theory obtained in [22,23] that the TIs of the PRs and MLMs of the superstars are similar and that they may be approximated by the minimum TI of the PRs among the series constructed from representative nodes of the communities. Generally, a representative series can be formed by taking one node from each community as a representative. To avoid numerous combinations, the ith representative series is formed by the ith largest PR within each community. Since communities have a random size, some nodes may fall into several representative series, leading to their dependence. The latter does not contradict the assumptions in [22,23]. A random number of the representative series may have the minimum TI or a value close to it.
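The construction of the block maxima, the block sums and the representative series may be illustrated by the following minimal Python sketch. It is not the code used in the study. The function name block_statistics, the toy PageRank values and the rule that a community with fewer than i nodes simply does not contribute to the ith representative series are assumptions made for the illustration.

```python
def block_statistics(pagerank, communities):
    """Return block maxima, block sums and representative series of PageRanks.

    pagerank    -- list of PageRank values indexed by node (toy data here)
    communities -- list of lists of node indices (non-overlapping blocks)
    """
    # Sort the PageRanks inside every community in descending order.
    blocks = [sorted((pagerank[v] for v in comm), reverse=True) for comm in communities]
    block_max = [b[0] for b in blocks]
    block_sum = [sum(b) for b in blocks]
    longest = max(len(b) for b in blocks)
    # The i-th series collects the i-th largest PageRank of each community.
    # Assumption: communities that are too small are skipped for that series.
    representative = [[b[i] for b in blocks if len(b) > i] for i in range(longest)]
    return block_max, block_sum, representative


# Toy example (hypothetical values): three communities of sizes 3, 2 and 1.
toy_pagerank = [0.1, 0.4, 0.2, 0.3, 0.05, 0.6]
toy_communities = [[0, 1, 2], [3, 4], [5]]
maxima, sums, series = block_statistics(toy_pagerank, toy_communities)
```

In the toy example the first representative series coincides with the sequence of block maxima, and the later series become shorter because small communities run out of nodes.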

The advantage of our approach is that the communities constitute weakly connected subgraphs and, hence, the r.v.s of the representative series are weakly dependent, which is mostly required for the TI estimation. The disadvantage is that the number of communities may be too small to estimate the TIs by the block maxima and sums precisely. By the theory [22,23], the representative series with minimum TIs have to be stationary distributed with regularly varying tails. The r.v.s of the rest of the representative series may be distributed with different TIs larger than the minimum one.

The evolution without node and edge deletion, with the uniform node deletion, and the uniform edge deletion was studied. By our empirical study, the following conclusions can be made:

(i)

Regarding the TI estimates of the PRs in the network:

(a): The TI estimates of the PRs tend to stable values (i.e., to the TI of the in-degree) during the evolution, with rates depending on the deletion: (1) the TI estimates increase at a slow rate for an evolution without node and edge deletion; (2) the TI estimates increase at a faster rate for an evolution with a uniform node deletion; (3) the TI estimates decrease for an evolution with a uniform edge deletion.
(b): The edge deletion constitutes a specific case. The TI estimates of the PRs reach the same stable value by an evolution irrespective of the PA parameters; the tail of the PRs becomes heavier in contrast to the other cases without deletion and with a uniform node deletion.

(ii)

Regarding the TI estimates of the block maxima, block sums and representative series:

(a): The TI estimates of the block maxima and block sums are close to the minimum TI estimate of the representative series for the PA evolution without node and edge deletion. This is in agreement with the theory [22,23]. This closeness weakens when nodes or (especially) edges are deleted.
(b): The TI estimates of the block maxima, block sums, and the minimum TI estimate of the representative series tend to decrease during the PA evolution without node and edge deletion. This implies that the distribution tails of the PRs and MLMs of superstar nodes become heavier due to the “rich-get-richer” principle.
(c): The TI estimates of the block maxima, block sums, and the minimum TI estimate of the representative series tend to increase faster for the PA evolution with uniform node deletion than for the evolution with uniform edge deletion. For both cases, the distribution tails of the PRs and MLMs of superstar nodes become lighter by the evolution.
(d): The TI estimates of the block maxima and block sums of the PRs are smaller than, larger than, and similar to the TIs of the in-degrees for the PA evolution without node and edge deletion, with node deletion, and with edge deletion, respectively. This implies that the distribution tail of the superstar nodes becomes heavier than the tail of the in-degree of the ’unevolved’ graph if nodes and edges are not deleted. The superstars become “richer” and rarer. When nodes and (especially) edges are deleted, the extremal part of the graph degrades since the evolution does not impact its tail. The superstars do not become “richer” and rarer.

The simulation is a first step towards revealing new tendencies of the extremal properties in graphs that evolve with node or edge deletions. This is an innovative aspect that requires a development of theory, which is not available yet. This is a topic for future mathematical research.

Author Contributions

Conceptualization, methodology, formal analysis, writing—original draft preparation N.M.; software, validation, visualization M.R.; writing—review and editing M.V. All authors have read and agreed to the published version of the manuscript.

Funding

The reported study was funded by the Russian Science Foundation RSF, project number 22-21-00177 (recipient N.M. Markovich, conceptualization, methodology development, formal analysis, writing–original draft preparation; recipient M.S. Ryzhov, software, data validation).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

IID	independent identically distributed
TI	tail index
PR	PageRank
MLM	max-linear model
PA	preferential attachment
MSE	mean squared error
RMSE	square root of MSE
KS	Kolmogorov–Smirnov
WWW	world wide web
URL	uniform resource locator

Appendix A. Tail Index Estimators

Appendix A.1. The Hill Estimator

Let

X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(n)}

denote the order statistics of the sample

X_{1}, X_{2}, \dots, X_{n}

. One of the first estimators (and the most popular one) for the TI

α > 0

is the Hill estimator (see [27])

{\hat{α}}_{n, k} = \left( \frac{1}{k} \sum_{i = 1}^{k} \ln X_{(n - i + 1)} - \ln X_{(n - k)} \right)^{- 1},

(A1)

where a so-called sample fraction k is a natural number satisfying

1 \leq k \leq n - 1

.
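As an illustration, the estimator (A1) may be sketched in Python as follows. This is a minimal sketch and not the code used in the study. The function name hill_estimator, the simulated exact Pareto sample with tail index 2 and the choice k = 2000 are assumptions made for the example.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate (A1) of the tail index from the k largest observations."""
    n = len(sample)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    x = sorted(sample)                       # ascending: x[0] = X_(1), x[n-1] = X_(n)
    threshold = math.log(x[n - k - 1])       # ln X_(n-k)
    # (1/k) * sum_{i=1}^{k} ln X_(n-i+1) - ln X_(n-k)
    mean_log_excess = sum(math.log(x[n - i]) - threshold for i in range(1, k + 1)) / k
    return 1.0 / mean_log_excess


# Illustrative data (an assumption): P(X > x) = x^(-2) for x >= 1.
rng = random.Random(0)
alpha_true = 2.0
data = [rng.paretovariate(alpha_true) for _ in range(20000)]
alpha_hat = hill_estimator(data, k=2000)
```

The sketch requires positive observations without ties among the k + 1 largest values, since otherwise the mean log-excess may be equal to zero.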

It is well known that the Hill estimator, as well as other semi-parametric estimators, is sensitive to the choice of the sample fraction. As noted in [28], the variance of the estimate

{\hat{α}}_{n, k}

is large when k is small and the use of a large k leads to an asymptotic bias in the estimation of the parameter

α

. Thus, a balance between the variance and bias components is important in the applications of the TI estimation, because this may minimize the MSE. Next, we briefly recall some methods for choosing k, namely, a bootstrap method that minimizes the MSE [21], an eyeball method [29] and the KS method [6].

The Hill estimator is biased. As a modification to reduce the bias, the generalized Jackknife estimator is proposed in [30]. In [31], k is selected by minimizing a smooth asymptotic MSE (SAMSEE), where the latter bias-reduced estimator is used as a preliminary estimate assuming that the observations are IID.

The consistency of the Hill estimator for the node influence indices is not evident due to a possible complex dependence between the nodes. The Hill estimator is consistent for in- and out-degrees generated by the linear PA schemes [32,33]. However, its consistency for the PR TI and generally for non-IID network data is not justified rigorously.

Appendix A.1.1. Bootstrapping

Let us recall the bootstrap method to select k for (A1) in Algorithm A1 [21]. The idea of the bootstrap method is to replace the MSE

E {(\hat{α} - α)}^{2}

with an unknown

α

by its bootstrap analogue with the estimation of

α

by an empirical average of the TI estimates built over bootstrap resamples. In Algorithm A1, we use

c_{1} = 1 / 2

and

c_{2} = 2 / 3

, as it is recommended in [34].

Algorithm A1 Bootstrap method

1: Generate resamples $\{X_{1}^{(b)}, \dots, X_{n_{1}}^{(b)}\}$, $b = 1, 2, \dots, B$, of size $n_{1} < n$ with replacement from the original observations $X_{1}, \dots, X_{n}$, where $n_{1}$ is defined as

$n_{1} = [n^{c_{1}}], \quad 0 < c_{1} < 1.$

The number of the largest order statistics $1 \leq k_{1} \leq n_{1} - 1$ corresponding to any resample relates to k and n as

$k = \left[ k_{1} \left( \frac{n}{n_{1}} \right)^{c_{2}} \right], \quad 0 < c_{2} < 1,$

(A2)

where $[\cdot]$ denotes the integer part.
2: By using the Hill estimator ${\hat{α}}_{n, k}$, obtain the estimates ${\hat{α}}_{n_{1}, k_{1}}^{(b)}$, $k_{1} = 1, 2, \dots, n_{1} - 1$, for each of the B resamples.
3: Calculate the MSEs by resamples,

$\mathrm{MSE}(n_{1}, k_{1}) = (\mathrm{bias}(n_{1}, k_{1}))^{2} + \mathrm{var}(n_{1}, k_{1}), \quad k_{1} = 1, 2, \dots, n_{1} - 1,$

where the bias and variance are the following

$\mathrm{bias}(n_{1}, k_{1}) = \frac{1}{B} \sum_{b = 1}^{B} {\hat{α}}_{n_{1}, k_{1}}^{(b)} - {\hat{α}}_{n, k},$

$\mathrm{var}(n_{1}, k_{1}) = \frac{1}{B - 1} \sum_{b_{1} = 1}^{B} \left( \frac{1}{B} \sum_{b_{2} = 1}^{B} {\hat{α}}_{n_{1}, k_{1}}^{(b_{2})} - {\hat{α}}_{n_{1}, k_{1}}^{(b_{1})} \right)^{2},$

and find $k_{1}^{*} = \operatorname{argmin}_{1 \leq k_{1} \leq n_{1} - 1} \mathrm{MSE}(n_{1}, k_{1}).$
4: Using the obtained $k_{1}^{*}$, find the optimal $k^{*}$ by (A2) and then the corresponding estimate ${\hat{α}}_{n, k^{*}}$ by (A1).
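The steps of Algorithm A1 may be sketched in Python as follows. The sketch is illustrative and is not the implementation used in the study. The function names hill_sorted and bootstrap_sample_fraction, the number of resamples B = 200, the simulated Pareto sample, and the rule that a value of k_1 is skipped when ties in a resample make the Hill estimate infinite are assumptions.

```python
import math
import random


def hill_sorted(ascending, k):
    """Hill estimate (A1) for a sample that is already sorted in ascending order."""
    n = len(ascending)
    threshold = math.log(ascending[n - k - 1])
    excess = sum(math.log(ascending[n - i]) - threshold for i in range(1, k + 1)) / k
    # Ties among the largest values of a resample give a zero excess (assumption: skip them).
    return 1.0 / excess if excess > 0 else math.inf


def bootstrap_sample_fraction(sample, n_boot=200, c1=0.5, c2=2.0 / 3.0, seed=0):
    """Return (k_star, Hill estimate at k_star) following the steps of Algorithm A1."""
    rng = random.Random(seed)
    n = len(sample)
    n1 = int(n ** c1)                                    # step 1: resample size
    full = sorted(sample)
    resamples = [sorted(rng.choices(sample, k=n1)) for _ in range(n_boot)]

    def to_full_scale(k1):                               # relation (A2)
        return min(n - 1, int(k1 * (n / n1) ** c2))

    best_k1, best_mse = None, math.inf
    for k1 in range(1, n1):
        boot = [hill_sorted(r, k1) for r in resamples]   # step 2
        target = hill_sorted(full, to_full_scale(k1))
        if not all(math.isfinite(a) for a in boot + [target]):
            continue
        mean_boot = sum(boot) / n_boot                   # step 3: bias, variance, MSE
        bias = mean_boot - target
        var = sum((mean_boot - a) ** 2 for a in boot) / (n_boot - 1)
        mse = bias ** 2 + var
        if mse < best_mse:
            best_k1, best_mse = k1, mse
    if best_k1 is None:
        raise ValueError("no admissible k1 found")
    k_star = to_full_scale(best_k1)                      # step 4
    return k_star, hill_sorted(full, k_star)


# Illustrative data (an assumption): exact Pareto sample with tail index 2.
data_rng = random.Random(1)
data = [data_rng.paretovariate(2.0) for _ in range(2000)]
k_star, alpha_hat = bootstrap_sample_fraction(data)
```

The default values c1 = 1/2 and c2 = 2/3 follow the choice stated before the algorithm; all other defaults are arbitrary.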

Appendix A.1.2. Eyeball Method

The Hill-plot

{\hat{α}}_{n, k}

against k constitutes the simplest visual method to select the estimate of the TI

α

such that it corresponds to a stability interval of the plot. By the eyeball method [29], the first stability interval is found using a moving window. Then the following number of the largest order statistics is selected:

k_{eye}^{*} = \min \left\{ k \in \{2, \dots, n^{+} - ω\} \mid h < \frac{1}{ω} \sum_{i = 1}^{ω} \mathbf{1}\{ | {\hat{α}}_{n, k + i} - {\hat{α}}_{n, k} | < ε \} \right\},

where

\mathbf{1}\{A\}

denotes the indicator function of the event A, which takes the value 1 when A holds. Here,

ω

is the size of a moving window, e.g., this can be

1 %

of the full sample. Furthermore,

n^{+}

is generally the number of positive observations in the data. In our context, n is the sample size. No less than

h %

of the estimates should be within the bounds

{\hat{α}}_{n, k} \pm ε

. One can take

h = 90 %

and

ε = 0.3

.
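The eyeball rule may be sketched in Python as follows. This is a minimal illustration and not the code used in the study. The function names hill_curve and eyeball_k and the simulated Pareto sample are assumptions. Since the Hill estimate (A1) is defined for k up to n - 1 only, the sketch restricts k so that k + ω does not exceed n - 1.

```python
import math
import random


def hill_curve(sample):
    """Return a list h with h[k] equal to the Hill estimate (A1), k = 1..n-1 (h[0] is unused)."""
    logs = sorted((math.log(x) for x in sample), reverse=True)
    n = len(logs)
    curve = [math.nan] * n
    running = 0.0
    for k in range(1, n):
        running += logs[k - 1]                 # sum of the k largest log-observations
        excess = running / k - logs[k]
        curve[k] = 1.0 / excess if excess > 0 else math.inf
    return curve


def eyeball_k(sample, window_frac=0.01, h=0.9, eps=0.3):
    """First k for which more than a share h of the next w estimates stay within eps."""
    curve = hill_curve(sample)
    n = len(sample)
    w = max(1, int(window_frac * n))           # moving window size
    for k in range(2, n - w):                  # guarantees k + w <= n - 1
        inside = sum(1 for i in range(1, w + 1) if abs(curve[k + i] - curve[k]) < eps)
        if inside / w > h:
            return k, curve[k]
    return None                                # no stability interval was found


# Illustrative data (an assumption): exact Pareto sample with tail index 2.
rng = random.Random(0)
data = [rng.paretovariate(2.0) for _ in range(5000)]
result = eyeball_k(data)
```

The defaults correspond to the window of 1% of the sample, h = 90% and ε = 0.3 mentioned above.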

Appendix A.1.3. The KS Method

The KS method is based on minimizing the Kolmogorov–Smirnov (KS) distance between the empirical distribution function and a power law distribution with index

{\hat{α}}_{i n} (k)

[6]

D_{k} = \sup_{y \geq 1} \left| \frac{1}{k} \sum_{j = 1}^{k} \mathbf{1}\left( I_{(j)} / I_{(k + 1)} > y \right) - y^{- {\hat{α}}_{i n} (k)} \right|, \quad 1 \leq k \leq n - 1.

Here,

{\hat{α}}_{i n} (k)

is calculated by the Hill estimate using the in-degree data

\{I_{i}\}_{1 \leq i \leq n}

and their order statistics. The optimal

k^{*}

is the value that minimizes the KS distance

k^{*} = \arg\min_{1 \leq k \leq n - 1} D_{k}.

The KS method may be applied to the out-degree and PR data.
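A minimal Python sketch of the KS selection is given below. It is illustrative only. The function names ks_distance and ks_select and the simulated Pareto sample (used instead of in-degree data) are assumptions. The order statistics are taken in descending order, and the supremum is evaluated at the jump points of the empirical tail function.

```python
import math
import random


def ks_distance(desc, k):
    """Return (D_k, Hill estimate) for a sample sorted in descending order."""
    # ln(I_(j) / I_(k+1)), j = 1..k; the values are non-negative and descending.
    log_ratios = [math.log(desc[j] / desc[k]) for j in range(k)]
    mean_log = sum(log_ratios) / k
    if mean_log <= 0:                          # all of the k + 1 largest values are tied
        return math.inf, math.nan
    alpha = 1.0 / mean_log                     # Hill estimate based on the k largest values
    d = 0.0
    for m, lr in enumerate(reversed(log_ratios), start=1):   # ratios in ascending order
        model = math.exp(-alpha * lr)          # y^(-alpha) at y equal to the m-th smallest ratio
        # Empirical tail just after and just before the jump at this ratio.
        d = max(d, abs((k - m) / k - model), abs((k - m + 1) / k - model))
    return d, alpha


def ks_select(sample):
    """Return (D at k_star, k_star, Hill estimate at k_star), minimizing D_k over k."""
    desc = sorted(sample, reverse=True)
    best = (math.inf, None, math.nan)
    for k in range(1, len(desc)):
        d, alpha = ks_distance(desc, k)
        if d < best[0]:
            best = (d, k, alpha)
    return best


# Illustrative data (an assumption): exact Pareto sample with tail index 2.
rng = random.Random(0)
data = [rng.paretovariate(2.0) for _ in range(1000)]
d_star, k_star, alpha_hat = ks_select(data)
```

The loop over all k has quadratic cost in the sample size, so for large samples a coarser grid of k values may be preferable.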

Appendix A.2. QQ–Plots for the Tail Index Estimation

The classical QQ–plot is used to check the fit of a distribution model to the underlying data. It can also be used to estimate the TI from data distributed by (1) [24,35]. To this end, we use Proposition 4.1 in [24]. Suppose we have a random sample

X_{1}, X_{2}, \dots, X_{n}

from the distribution satisfying (1) and

X_{(1)} \geq X_{(2)} \geq \dots \geq X_{(n)}

are the corresponding order statistics. By using the notation introduced in Appendix A.1, we define the set of points

\begin{matrix} S_{n} & = & \{(- log \frac{j}{n + 1}, log X_{(j)}), j \in {1, 2, \dots, k}\} \end{matrix}

(A3)

such that

k = k (n) \to \infty

and

k / n \to 0

as

n \to \infty

. By Proposition 4.1 in [24],

S_{n}

converges in probability to

\begin{matrix} T & = & \{(x, \frac{x}{α}); 0 \leq x < \infty\} \end{matrix}

as

n \to \infty

. The slope

c = 1 / α

of the least squares line through the points

({x_{i}, y_{i}}, 1 \leq i \leq n)

is

c = \frac{n \sum_{i = 1}^{n} x_{i} y_{i} - \sum_{i = 1}^{n} x_{i} \sum_{j = 1}^{n} y_{j}}{n \sum_{i = 1}^{n} x_{i}^{2} - \left( \sum_{i = 1}^{n} x_{i} \right)^{2}}.

Hence, the QQ–plot estimator calculated through the points (A3) using the least squares regression line is

{\hat{α}}_{n, k} = \frac{k \sum_{i = 1}^{k} \left( \log \frac{i}{n + 1} \right)^{2} - \left( \sum_{i = 1}^{k} \log \frac{i}{n + 1} \right)^{2}}{\left( \sum_{i = 1}^{k} \log \frac{i}{n + 1} \right) \left( \sum_{j = 1}^{k} \log X_{(j)} \right) - k \sum_{i = 1}^{k} \log \frac{i}{n + 1} \, \log X_{(i)}}.

(A4)

To select k, we use Algorithm A1, where the estimate obtained by the QQ–plot is applied instead of the Hill estimate.
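The QQ–plot estimator may be sketched in Python as the inverse slope of the least squares line through the points (A3). The sketch is illustrative and not the code used in the study. The function name qq_estimator and the artificial sample of exact power-law quantiles are assumptions.

```python
import math


def qq_estimator(sample, k):
    """Inverse least squares slope through the points (A3), i.e., the estimate (A4)."""
    n = len(sample)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    desc = sorted(sample, reverse=True)                       # X_(1) >= ... >= X_(n)
    xs = [-math.log(j / (n + 1)) for j in range(1, k + 1)]
    ys = [math.log(desc[j - 1]) for j in range(1, k + 1)]
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (k * sxy - sx * sy) / (k * sxx - sx ** 2)         # estimates 1 / alpha
    return 1.0 / slope


# Artificial check (an assumption): exact quantiles of a power law with tail index 2.5,
# for which all points of the QQ-plot lie on the line with slope 1 / 2.5.
n_obs, alpha_true = 1000, 2.5
exact = [(j / (n_obs + 1)) ** (-1.0 / alpha_true) for j in range(1, n_obs + 1)]
alpha_hat = qq_estimator(exact, k=100)
```

For exact quantiles the estimate reproduces the tail index up to rounding errors; for random samples it fluctuates with the choice of k in the same way as the Hill estimate.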

Appendix B. Calculation of the PageRank

Let

G = (V, E)

be the directed graph of a scale-free web network with

n = | V |

vertices

v \in V

, i.e., web pages, and a growing number

| E |

of edges

e \in E

, i.e., hyperlinks among these pages.

The in-degree remains the simplest measure of the node influence due to its availability as gathered statistics. The PR is accepted as a popular basic measure to rank the web pages that are provided by a search engine after a request to the web. Google’s PR vector

R = {(R_{1}, \dots, R_{n})}^{T} \in {(0, \infty)}^{n}

[19] is the unique solution of the following system of linear equations

R_{i} = c \sum_{j : (j, i) \in E} \frac{R_{j}}{D_{j}} + (1 - c) q_{i}, \quad i \in \{1, \dots, n\}.

(A5)

The sum is taken over the pages j that link to page i (their number is the in-degree of page i),

D_{j}

is the number of outgoing links of page j (out-degree), and

c \in (0, 1)

is a damping factor, which was originally set equal to

0.85

by Google. Moreover,

q = (q_{1}, q_{2}, \dots, q_{n})

is a personalization probability vector, such that

0 \leq q_{i} \leq 1

and

\sum_{i = 1}^{n} q_{i} = 1

.

To calculate the PR

R_{i}

of a randomly chosen page

v = i \in V

in a web graph

G = (V, E)

one can use the following iteration

\begin{matrix} {\hat{R}}_{i}^{(n, 0)} = 1, {\hat{R}}_{i}^{(n, k)} = \sum_{j \to i} \frac{c}{D_{j}} {\hat{R}}_{j}^{(n, k - 1)} + (1 - c), k \in N, \end{matrix}

(A6)

proposed in [36] for a given uniform personalization vector

q_{i} = 1 / n, 1 \leq i \leq n = | V |

. The iteration (A6) proceeds until the difference between two consecutive iterations

| {\hat{R}}_{i}^{(n, k)} - {\hat{R}}_{i}^{(n, k - 1)} |

is small enough, which is typically achieved after a moderate number of iterations k. Here,

j \to i

implies that node j is linked to node i, i.e.,

(j, i) \in E

. We will further use this iteration method.
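The iteration (A6) may be sketched in Python as follows. This is a minimal illustration and not the code used in the study. The function name pagerank_iteration, the edge-list representation, the tolerance and the toy graph are assumptions. Multiple edges are counted with their multiplicity in the out-degree.

```python
def pagerank_iteration(n_nodes, edges, c=0.85, tol=1e-12, max_iter=10_000):
    """Iterate (A6) on a directed multigraph given as a list of (source, target) pairs."""
    out_deg = [0] * n_nodes
    for j, _ in edges:
        out_deg[j] += 1
    rank = [1.0] * n_nodes                        # initial value of the iteration
    for _ in range(max_iter):
        new_rank = [1.0 - c] * n_nodes            # the term (1 - c)
        for j, i in edges:                        # every edge j -> i adds c * R_j / D_j
            new_rank[i] += c * rank[j] / out_deg[j]
        change = max(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if change < tol:                          # stop when consecutive iterations agree
            break
    return rank


# Toy graph (an assumption): one edge 0 -> 1 and an isolated node 2.
ranks = pagerank_iteration(3, [(0, 1)])
```

In this scaled form an isolated node keeps the value 1 - c, which corresponds to (1 - c)/n in (A5) after division by n when the personalization vector is uniform.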

Another important method is the mean field approach [25]. Web pages can be aggregated in the degree classes according to their degrees

k \equiv (k_{i n}, k_{o u t})

, where

k_{i n}

and

k_{o u t}

denote in- and out-degrees, respectively, see [25]. By the mean field approach, the PR of the degree class is calculated as an average of PRs of the nodes belonging to this degree class.

Appendix C. Directed Louvain Algorithm

To divide a graph into non-overlapping and weakly connected communities, one can use a directed Louvain algorithm [10]. The latter is an improvement of the well-known Louvain algorithm. The method is based on the maximization of the modularity

\begin{matrix} Q = \frac{1}{m} \sum_{i j} (A_{i j} - \frac{k_{o u t} (i) k_{i n} (j)}{m}) δ (σ_{i}, σ_{j}) . \end{matrix}

Here,

m = | E |

is the number of edges in the directed graph,

k_{o u t} (i)

and

k_{i n} (j)

refer to the out-degree of node i and the in-degree of node j, respectively. Moreover,

(A)

denotes an adjacency matrix, where

A_{i j}

refers to an edge from node i to node j. Furthermore,

δ (σ_{i}, σ_{j})

is equal to 1 when nodes i and j belong to the same community, i.e.,

σ_{i} = σ_{j}

holds, and to 0 otherwise.

The algorithm is convenient for large directed networks. In the paper, we use this algorithm to divide graphs into communities.
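The modularity above may be evaluated for a given partition by the following Python sketch. It only computes Q and does not implement the directed Louvain optimization of [10]. The function name directed_modularity, the edge-list representation and the toy graph are assumptions.

```python
from collections import defaultdict


def directed_modularity(edges, community):
    """Directed modularity Q of a partition.

    edges     -- list of (i, j) pairs, each an edge from node i to node j
    community -- dict mapping every node to its community label
    """
    m = len(edges)
    out_by_comm = defaultdict(int)     # sum of out-degrees inside each community
    in_by_comm = defaultdict(int)      # sum of in-degrees inside each community
    intra = 0                          # number of edges with both ends in one community
    for i, j in edges:
        out_by_comm[community[i]] += 1
        in_by_comm[community[j]] += 1
        if community[i] == community[j]:
            intra += 1
    # Sum over same-community pairs of k_out(i) * k_in(j) / m, grouped by community.
    expected = sum(out_by_comm[c] * in_by_comm[c] for c in out_by_comm) / m
    return (intra - expected) / m


# Toy graph (an assumption): two directed 2-cycles that are not linked to each other.
toy_edges = [(0, 1), (1, 0), (2, 3), (3, 2)]
q_split = directed_modularity(toy_edges, {0: "a", 1: "a", 2: "b", 3: "b"})
q_merged = directed_modularity(toy_edges, {0: "a", 1: "a", 2: "a", 3: "a"})
```

For the toy graph the partition into the two cycles gives Q = 0.5, whereas merging all nodes into one community gives Q = 0.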

Appendix D. Linear Preferential Attachment

Let us recall the schemes of the linear PA developed in [6]. The schemes create new directed edges. An edge can be created as a link directed from a newly appended node to an existing node (the

α -

scheme), from an existing node to a newly appended node (the

γ -

scheme), or an edge between two existing nodes (the

β -

scheme).

The types of the schemes are selected by flipping a three-sided coin with probabilities

α

,

β

and

γ

, such that

α + β + γ = 1

. To this end, an IID sequence of trinomial r.v.s with values 1, 2, and

3,

and the corresponding probabilities

α

,

β

, and

γ

are generated.

Let

G (n) = (V (n), E (n))

be a graph at the nth step of the evolution. The edge

v \to w \equiv (v, w) \in E (n)

directed from the new node

v \in V (n) ∖ V (n - 1)

,

N (n) = | V (n) |

to an existing node

w \in V (n - 1)

is created with probability

α

. The existing node

w \in V (n - 1)

is chosen with probability

\begin{matrix} {(P_{α})}_{v, w} = P {v \to w} & = & \frac{I_{n - 1} (w) + δ_{i n}}{n - 1 + δ_{i n} N (n - 1)} \end{matrix}

by the

α -

scheme.

An edge

(v, w)

is added to

E (n - 1)

with probability

β

and the existing nodes

v \in V (n - 1) = V (n)

,

w \in V (n - 1)

are chosen independently from

G (n - 1)

with probability

\begin{matrix} {(P_{β})}_{v, w} & = & P {v \to w} = (\frac{I_{n - 1} (w) + δ_{i n}}{n - 1 + δ_{i n} N (n - 1)}) (\frac{O_{n - 1} (v) + δ_{o u t}}{n - 1 + δ_{o u t} N (n - 1)}) \end{matrix}

by the

β -

scheme.

An edge

(w, v)

from the existing node

w \in V (n - 1)

to

v \in V (n) ∖ V (n - 1)

is created with probability

γ

and the node w is chosen with probability

\begin{matrix} {(P_{γ})}_{w, v} = P {w \to v} & = & \frac{O_{n - 1} (w) + δ_{o u t}}{n - 1 + δ_{o u t} N (n - 1)} \end{matrix}

by the

γ -

scheme.

The TIs of both in- and out-degrees are given by formula (2.9) in [6] and recalled by (4). The latter formulae allow one to calculate the TIs precisely when the PA parameters

(α, β, γ, δ_{i n}, δ_{o u t})

are known. The parameters

δ_{i n}, δ_{o u t}

can be estimated by the semi-parametric extreme value method (EV) based on the maximum likelihood method or by the snapshot (SN) method proposed in [37].
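The three schemes may be simulated by the following minimal Python sketch. It is not the simulation code used in the study and does not include node or edge deletion. The function name simulate_linear_pa, the seed graph consisting of one node with a self-loop and the parameter values in the example are assumptions.

```python
import random


def simulate_linear_pa(steps, alpha, beta, gamma, delta_in, delta_out, seed=0):
    """Simulate the linear PA evolution; returns (edges, in_deg, out_deg)."""
    rng = random.Random(seed)
    in_deg, out_deg = [1], [1]          # assumed seed graph: one node with a self-loop
    edges = [(0, 0)]
    for _ in range(steps):
        scheme = rng.choices(("alpha", "beta", "gamma"), weights=(alpha, beta, gamma))[0]
        existing = range(len(in_deg))
        # Weights proportional to I(w) + delta_in and O(v) + delta_out;
        # random.choices normalizes them, which gives the denominators of the schemes.
        by_in = [d + delta_in for d in in_deg]
        by_out = [d + delta_out for d in out_deg]
        if scheme == "alpha":           # new node -> existing node chosen by in-degree
            target = rng.choices(existing, weights=by_in)[0]
            source = len(in_deg)
        elif scheme == "beta":          # existing node -> existing node
            source = rng.choices(existing, weights=by_out)[0]
            target = rng.choices(existing, weights=by_in)[0]
        else:                           # existing node chosen by out-degree -> new node
            source = rng.choices(existing, weights=by_out)[0]
            target = len(in_deg)
        if scheme != "beta":            # the alpha- and gamma-schemes append a node
            in_deg.append(0)
            out_deg.append(0)
        out_deg[source] += 1
        in_deg[target] += 1
        edges.append((source, target))
    return edges, in_deg, out_deg


# Example run with hypothetical parameter values.
edges, in_deg, out_deg = simulate_linear_pa(1000, 0.2, 0.6, 0.2, delta_in=1.0, delta_out=1.0)
```

Each step rebuilds the weight lists, so the cost per step is linear in the number of nodes; this is acceptable for small illustrations but not for graphs of the size considered in the paper.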

Appendix E. Stationarity Tests

To check the stationarity of an underlying sample, we apply the Mann–Whitney U test. The representative series is divided into two sub-samples, and the null hypothesis is that the r.v.s from the two parts have equal distributions. Note that the r.v.s in the representative series can be considered as weakly dependent (because so are the communities).

Let us recall that the Mann–Whitney U statistic is defined as

U = \sum_{i = 1}^{n} \sum_{j = 1}^{m} S (X_{i}, Y_{j}),

(A7)

with

S (X, Y) = \begin{cases} 1, & X > Y, \\ 1 / 2, & X = Y, \\ 0, & X < Y. \end{cases}

Here,

X_{1}, \dots, X_{n}

and

Y_{1}, \dots, Y_{m}

are assumed to be independent identically distributed r.v.s of the first and second samples, respectively. Since for large samples the U statistic is approximately normally distributed, the critical region for the null hypothesis is defined as

\begin{matrix} U & \leq & m_{U} - σ_{U} \cdot z, \\ U & \geq & m_{U} + σ_{U} \cdot z, \end{matrix}

where the mean and standard deviation of U are

\begin{matrix} m_{U} & = & \frac{m n}{2}, σ_{U} = \sqrt{\frac{m n (m + n + 1)}{12}}, \end{matrix}

respectively. For the

5 %

critical level, the normal quantile

z = 1.96

is taken. If the U-value falls into the interval,

\begin{matrix} (m_{U} - σ_{U} \cdot z, m_{U} + σ_{U} \cdot z), \end{matrix}

(A8)

then the null hypothesis is not rejected.
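The test may be sketched in Python as follows. The sketch is illustrative and not the code used in the study. The function names mann_whitney_u and looks_stationary, the split of the series into two halves and the toy series are assumptions.

```python
import math


def mann_whitney_u(xs, ys):
    """Return the statistic (A7) together with its mean and standard deviation."""
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)
    n, m = len(xs), len(ys)
    mean_u = m * n / 2.0
    sd_u = math.sqrt(m * n * (m + n + 1) / 12.0)
    return u, mean_u, sd_u


def looks_stationary(series, z=1.96):
    """True if U computed from the two halves of the series falls into the interval (A8)."""
    half = len(series) // 2            # assumption: the series is split into two halves
    u, mean_u, sd_u = mann_whitney_u(series[:half], series[half:])
    return mean_u - sd_u * z < u < mean_u + sd_u * z


# Toy series (assumptions): a strictly increasing trend and an alternating series.
trend_ok = looks_stationary(list(range(100)))
alternating_ok = looks_stationary([0, 1] * 50)
```

For the increasing trend every value of the first half is smaller than every value of the second half, so U = 0 and the null hypothesis is rejected. For the alternating series U equals its mean and the null hypothesis is not rejected.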

References

1. Bagrow, J.P.; Brockmann, D. Natural Emergence of Clusters and Bursts in Network Evolution. Phys. Rev. X 2013, 3, 021016.
2. Banerjee, S.; Olvera-Cravioto, M. Pagerank asymptotics on directed preferential attachment networks. arXiv 2021, arXiv:2102.0889.
3. Ghoshal, G.; Chi, L.; Barabási, A.L. Uncovering the role of elementary processes in network evolution. Sci. Rep. 2013, 3, 2920.
4. Krapivsky, P.L.; Redner, S. Organization of growing random networks. Phys. Rev. E 2001, 63, 066123.
5. Norros, I.; Reittu, H. On a conditionally poissonian graph process. Adv. Appl. Prob. (SGSA) 2006, 38, 59–75.
6. Wan, P.; Wang, T.; Davis, R.A.; Resnick, S.I. Are extreme value estimation methods useful for network data? Extremes 2020, 23, 171–195.
7. Leskovec, J.; Lang, K.J.; Dasgupta, A.; Mahoney, M.W. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Math. 2009, 6, 29–123.
8. Mester, A.; Pop, A.; Mursa, B.-E.-M.; Grebla, H.; Diosan, L.; Chira, C. Network Analysis Based on Important Node Selection and Community Detection. Mathematics 2021, 9, 2294.
9. Clauset, A.; Newman, M.E.; Moore, C. Finding community structure in very large networks. Phys. Rev. E 2004, 70, 066111.
10. Dugué, N.; Perez, A. Directed Louvain: Maximizing Modularity in Directed Networks. [Research Report] Université d’Orléans. hal-01231784. 2015. Available online: http://dx.doi.org/10.13140/RG.2.1.4497.0328 (accessed on 20 November 2021).
11. Abbe, E. Community detection and stochastic block models: Recent developments. J. Mach. Learn. Res. 2017, 18, 6446–6531.
12. Alon, N.; Kahale, N. A Spectral Technique for Coloring Random 3-Colorable Graphs. SIAM J. Comput. 1997, 26, 1733–1748.
13. Coja-Oghlan, A.; Krivelevich, M.; Vilenchik, D. Why Almost All k-Colorable Graphs Are Easy. In Annual Symposium on Theoretical Aspects of Computer Science; Thomas, W., Weil, P., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2007.
14. Markovich, N.M.; Ryzhov, M.; Krieger, U.R. Nonparametric analysis of extremes on web graphs: Pagerank versus max-linear model. Commun. Comput. Inf. Sci. 2017, 700, 13–26.
15. Bollobás, B.; Borgs, C.; Chayes, J.; Riordan, O. Directed Scale-Free Graphs. In SODA ’03: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2003; pp. 132–139.
16. Newman, M.E.J. Power laws, Pareto distributions and Zipf’s law. Contemp. Phys. 2005, 46, 323–351.
17. Newman, M.E.J. Networks: An Introduction, 2nd ed.; Oxford University Press: Oxford, UK, 2018.
18. Volkovich, Y.; Litvak, N. Asymptotic analysis for personalized web search. Adv. Appl. Probab. 2010, 42, 577–604.
19. Brin, S.; Page, L. The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. Isdn Syst. 1998, 30, 107–117.
20. Jelenkovic, P.R.; Olvera-Cravioto, M. Information ranking and power laws on trees. Adv. Appl. Probab. 2010, 42, 1057–1093.
21. Markovich, N. Nonparametric Analysis of Univariate Heavy-Tailed Data; Wiley: New York, NY, USA, 2007.
22. Markovich, N.M.; Rodionov, I.V. Maxima and sums of non-stationary random length sequences. Extremes 2020, 23, 451–464.
23. Markovich, N.M. Extremes of Sums and Maxima with Application to Random Networks. In Proceedings of the 5th International Conference on Stochastic Methods 2020 ICSM5, Moscow, Russia, 23–27 November 2020; pp. 107–120.
24. Das, B.; Resnick, S.I. QQ Plots, Random Sets and Data from a Heavy Tailed Distribution. Stoch. Model. 2008, 24, 103–132.
25. Fortunato, S.; Boguñá, M.; Flammini, A.; Menczer, F. On Local Estimations of PageRank: A Mean Field Approach. Internet Math. 2007, 4, 245–266.
26. Smirnov, N.V.; Dunin-Barkovsky, I.V. Course of Probability Theory and Mathematical Statistics for Technical Applications; Nauka: Moscow, Russia, 1965. (In Russian)
27. Hill, B.M. A simple general approach to inference about the tail of a distribution. Ann. Statist. 1975, 3, 1163–1174.
28. Danielsson, J.; de Haan, L.; Peng, L.; de Vries, C.G. Using a Bootstrap Method to Choose the Sample Fraction in Tail Index Estimation. J. Multivar. Anal. 2001, 76, 226–248.
29. Danielsson, J.; Ergun, L.M.; de Haan, L.; De Vries, C. Tail Index Estimation: Quantile Driven Threshold Selection. SSRN Electron. J. 2016.
30. Gomes, I.; Martins, J.; Neves, M. Alternatives to a Semi-Parametric Estimator of Parameters of Rare Events—The Jackknife Methodology. Extremes 2000, 3, 207–229.
31. Schneider, L.F.; Krajina, A.; Krivobokova, T. Threshold selection in univariate extreme value analysis. Extremes 2021, 24, 881–913.
32. Wang, T.; Resnick, S.I. Consistency of Hill estimators in a linear preferential attachment model. Extremes 2019, 22, 1–28.
33. Wang, T.; Resnick, S.I. Degree growth rates and index estimation in a directed preferential attachment model. Stoch. Process. Their Appl. 2020, 130, 878–906.
34. Hall, P. Using the Bootstrap to Estimate Mean Squared Error and Select Smoothing Parameter in Nonparametric Problems. J. Multivar. Anal. 1990, 32, 177–203.
35. Kratz, M.; Resnick, S.I. The qq-estimator and heavy tails. Stoch. Model. 1996, 12, 699–724.
36. Chen, N.; Litvak, N.; Olvera-Cravioto, M. PageRank in Scale-Free Random Graphs. In Proceedings of the 11th International Workshop, WAW 2014, Beijing, China, 17–18 December 2014; Bonato, A., Graham, F.C., Pralat, P., Eds.; Springer: Cham, Switzerland, 2014; pp. 120–131.
37. Wan, P.; Wang, T.; Davis, R.A.; Resnick, S.I. Fitting the linear preferential attachment model. Electron. J. Statist. 2017, 11, 3738–3780.

Figure 1. The scheme of communities as the “row” sequences and the representative node series as the “column” series where the maxima and sums are taken over node influence indices in the communities: representative series may be formed by a given order of nodes in the communities (a) or as all possible trees rooted at the nodes of the communities (b).

Figure 2. Examples of seed graphs generated by $3 \cdot 10^{5}$ evolution steps of the PA evolution without a node and edge deletion, where $(α, β, γ)$ are taken equal to $(0.2, 0.6, 0.2)$ (left), and to $(0.5, 0.3, 0.2)$ (right), the point sizes are proportional to the node PRs, the point colors relate to the evolution steps, i.e., the “older” the node, the darker the color.

Figure 3. The QQ–plot estimates of the PR TIs of the ith representative series formed by the ith largest order statistics of the PRs within the communities, $i = 1, 2, 3, \dots$, for examples of graphs evolved by the PA schemes without a node and edge deletion, and the Mann–Whitney U test statistics (A7) with intervals (A8).

Figure 4. The QQ–plot estimates of TIs of the block maxima and block sums over node PRs of the communities against the minimum QQ–plot estimates of the PR TIs of the representative series, where each point corresponds to one of the 100 graphs evolved by the PA $(α, β, γ)$ schemes without a node and edge deletion after $3 \cdot 10^{5}$ evolution steps.

Figure 5. Medians with $(5\%, 95\%)$ quantile intervals and outliers beyond this interval of the TI estimates (left) and their empirical MSEs (right) for the PR over all nodes (“all”), maxima (“mx”) and sums (“sm”) over communities, and representative series (“rep”) after $3 \cdot 10^{4}$ evolution steps of the PA evolution without a node and edge deletion. The in- and out-degree TIs $α_{in}$ and $α_{out}$ calculated by (4) are shown by horizontal lines.

Figure 6. The dynamics of the QQ–plot TI estimates of the block maxima, the block sums, and the minimum QQ–plot estimates of representative series of the PRs averaged over 100 PA-evolving networks without a node and edge deletion, where the in-degree TIs $α_{in}$ calculated by (4) are shown by horizontal lines.

Figure 7. Examples of the graphs after $3 \cdot 10^{5}$ steps of the PA evolution started from the seed graphs shown in Figure 2 with the uniform node deletion: the PA parameters are taken equal to $(0.2, 0.6, 0.2)$ (left), and to $(0.5, 0.3, 0.2)$ (right), the point sizes are proportional to the PRs, the point colors mean the same as in Figure 2.

Figure 8. The QQ–plot estimates of TIs of the block maxima and block sums over communities against the minimum QQ–plot estimates of TIs of the representative series of PRs where points correspond to 100 graphs evolved by the PA $(α, β, γ)$ schemes with the uniform node deletion after $3 \cdot 10^{5}$ evolution steps.

Figure 9. The dynamics of the QQ–plot estimates of the block maxima, the block sums, and the minimum QQ–plot estimates of representative samples of PRs calculated by the communities for 100 PA-evolving networks with the uniform node deletion after $3 \cdot 10^{5}$ evolution steps.

Figure 10. The dynamics of the number of communities, the community size, and the QQ–plot estimates of the PRs averaged over 100 PA-evolving networks without a node and edge deletion (left), with the uniform node deletion (middle), and with the uniform edge deletion (right).

Figure 11. The dynamics of the number of isolated nodes over 100 PA-evolving networks with the uniform node deletion (left) and with the uniform edge deletion (right).

Figure 12. Examples of the graphs after $3 \cdot 10^{5}$ steps of the PA evolution started from seed graphs in Figure 2 with the uniform edge deletion: the PA parameters are taken equal to $(0.2, 0.6, 0.2)$ (left), and to $(0.5, 0.3, 0.2)$ (right), the point sizes are proportional to the node PRs, the point colors mean the same as in Figure 2.

Figure 13. The QQ–plot estimates of TIs of the block maxima and block sums over communities against the minimum QQ–plot estimates of TIs of the representative series for PRs where the points correspond to 100 graphs evolved by the PA $(α, β, γ)$ schemes with the uniform edge deletion after $3 \cdot 10^{5}$ evolution steps.

Figure 14. The dynamics of the QQ–plot estimates of the block maxima, block sums, and the minimum QQ–plot estimates of representative samples of PRs calculated over the communities averaging over 100 graphs evolved by the PA $(α, β, γ)$ schemes with the uniform edge deletion, where the in-degree TIs $α_{in}$ calculated by (4) are shown by horizontal lines.

Table 1. The empirical bias (first row of each block), empirical standard deviation (second row), and the RMSE (third row, in brackets) of the Hill (A1) and QQ–plot (A4) estimates of the TI of the in- and out-degrees and the PR over 100 graphs generated after $3 \cdot 10^{4}$ steps of the evolution by the PA schemes without a node and edge deletion.

$(α, β, γ)$	Hill + Eyeball			Hill + KS			Hill + Bootstrap			QQ–Plot
$(α_{in}, α_{out})$	${\hat{α}}_{in}$	${\hat{α}}_{out}$	${\hat{α}}_{PR}$	${\hat{α}}_{in}$	${\hat{α}}_{out}$	${\hat{α}}_{PR}$	${\hat{α}}_{in}$	${\hat{α}}_{out}$	${\hat{α}}_{PR}$	${\hat{α}}_{in}$	${\hat{α}}_{out}$	${\hat{α}}_{PR}$
(0.4,0.2,0.4)	−0.078	−0.078	0.02	−0.075	−0.075	−0.014	−0.016	−0.018	0.012	−0.022	−0.023	0.002
(3, 3)	0.022	0.022	0.023	0.005	0.005	0.018	0.019	0.022	0.018	0.008	0.007	0.011
	[0.081]	[0.081]	[0.031]	[0.075]	[0.076]	[0.023]	[0.025]	[0.029]	[0.021]	[0.023]	[0.024]	[0.011]
(0.2,0.6,0.2)	−0.012	−0.012	−0.01	−0.016	−0.017	0.003	−0.002	−0.003	0.002	−0.008	−0.008	−0.002
(1.75, 1.75)	0.015	0.014	0.014	0.004	0.005	0.009	0.009	0.009	0.009	0.005	0.005	0.005
	[0.019]	[0.019]	[0.017]	[0.017]	[0.018]	[0.009]	[0.01]	[0.009]	[0.009]	[0.009]	[0.009]	[0.005]
(0.5,0.3,0.2)	0.015	−0.111	0.006	−0.025	−0.104	−0.015	0.008	−0.04	−0.002	0.005	−0.048	−0.007
(2.125, 3.4)	0.014	0.009	0.014	0.006	0.006	0.005	0.013	0.02	0.01	0.005	0.008	0.005
	[0.021]	[0.111]	[0.015]	[0.026]	[0.104]	[0.016]	[0.015]	[0.045]	[0.01]	[0.007]	[0.049]	[0.008]
(0.2,0.3,0.5)	−0.112	0.02	0.021	−0.102	−0.024	−0.025	−0.036	0.006	0.034	−0.046	0.005	0.023
(3.4, 2.125)	0.008	0.015	0.026	0.007	0.006	0.082	0.021	0.011	0.017	0.008	0.005	0.012
	[0.112]	[0.025]	[0.034]	[0.102]	[0.025]	[0.086]	[0.042]	[0.013]	[0.038]	[0.047]	[0.007]	[0.026]
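
The Hill and QQ–plot estimators compared in Table 1 are defined in (A1) and (A4) of Appendix A, which are not reproduced in this part of the text. The following is a minimal Python sketch under the assumption that they coincide with the textbook forms: the Hill estimator built on the $k$ largest order statistics, and the Kratz–Resnick QQ estimator, i.e., the least-squares slope of the log order statistics against exponential quantiles. The function names and the fixed choice of $k$ are illustrative assumptions; the threshold-selection rules (eyeball, KS, bootstrap) are not implemented.

```python
import math


def hill_tail_index(sample, k):
    """Hill estimate of the tail index from the k largest observations.

    Assumes positive data and 1 <= k < n (textbook form, assumed to match (A1)).
    """
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = xs[n - k - 1]  # the (k+1)-th largest observation
    gamma = sum(math.log(x / threshold) for x in xs[n - k:]) / k
    return 1.0 / gamma  # tail index = 1 / extreme value index


def qq_tail_index(sample, k):
    """QQ-plot estimate: inverse slope of the least-squares line through
    the points (-log(i / (k + 1)), log X_(n-i+1)), i = 1, ..., k.

    Assumes positive data and 2 <= k <= n (assumed to match (A4)).
    """
    xs = sorted(sample)
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    top_desc = xs[n - k:][::-1]  # largest observation first
    u = [-math.log(i / (k + 1)) for i in range(1, k + 1)]
    v = [math.log(x) for x in top_desc]
    mean_u = sum(u) / k
    mean_v = sum(v) / k
    slope = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / sum(
        (a - mean_u) ** 2 for a in u
    )
    return 1.0 / slope
```

Both functions return an estimate of the tail index itself rather than its reciprocal, in line with the $\hat{α}$ columns of Table 1. In practice $k$ would be chosen by one of the selection rules named in the table header.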

Table 2. The empirical bias (first row of each block), empirical standard deviation (second row), and the RMSE (third row, in brackets) of the Hill and QQ–plot estimates of the block maxima TIs of the in- and out-degrees and PR over 100 graphs generated after $3 \cdot 10^{4}$ steps of evolution by the PA schemes without a node and edge deletion.

$(α, β, γ)$	Hill + Eyeball			Hill + KS			Hill + Bootstrap			QQ–Plot
$(α_{in}, α_{out})$	${\hat{α}}_{in}$	${\hat{α}}_{out}$	${\hat{α}}_{PR}$	${\hat{α}}_{in}$	${\hat{α}}_{out}$	${\hat{α}}_{PR}$	${\hat{α}}_{in}$	${\hat{α}}_{out}$	${\hat{α}}_{PR}$	${\hat{α}}_{in}$	${\hat{α}}_{out}$	${\hat{α}}_{PR}$
(0.4,0.2,0.4)	−0.025	0.021	0.011	0.007	0.024	0.014	−0.008	0.002	0.017	0.007	0.009	0.003
(3, 3)	0.096	0.097	0.094	0.063	0.071	0.064	0.076	0.083	0.074	0.029	0.028	0.029
	[0.127]	[0.113]	[0.14]	[0.08]	[0.107]	[0.103]	[0.087]	[0.082]	[0.109]	[0.052]	[0.045]	[0.057]
(0.2,0.6,0.2)	−0.017	−0.016	−0.017	0.002	−0.001	0.003	−0.005	−0.011	0.003	0.002	0.002	0.002
(1.75, 1.75)	0.09	0.095	0.107	0.057	0.072	0.069	0.061	0.072	0.067	0.037	0.037	0.041
	[0.136]	[0.122]	[0.125]	[0.089]	[0.096]	[0.095]	[0.108]	[0.108]	[0.109]	[0.041]	[0.042]	[0.042]
(0.5,0.3,0.2)	−0.052	−0.029	−0.044	−0.005	0.007	−0.004	0.006	−0.011	−0.009	−0.001	0.006	0.001
(2.125, 3.4)	0.086	0.084	0.082	0.031	0.039	0.037	0.049	0.07	0.052	0.018	0.027	0.017
	[0.122]	[0.104]	[0.108]	[0.044]	[0.06]	[0.067]	[0.069]	[0.077]	[0.088]	[0.026]	[0.044]	[0.034]
(0.2,0.3,0.5)	−0.012	−0.011	0.012	0.002	0.002	0.02	0.014	−0.012	0.015	−0.011	−0.001	0.013
(3.4, 2.125)	0.099	0.092	0.113	0.039	0.054	0.09	0.074	0.062	0.084	0.034	0.021	0.055
	[0.14]	[0.111]	[0.147]	[0.055]	[0.071]	[0.128]	[0.105]	[0.069]	[0.116]	[0.048]	[0.025]	[0.076]
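
The three summary statistics reported in Tables 1–4 can be computed from the estimates obtained on the simulated graphs. The sketch below uses the usual definitions (mean error, standard deviation around the sample mean, root of the mean squared error). These definitions are an assumption: the normalization used for the tables (for example, relative rather than absolute errors) is not stated in this part of the text, and the function name is hypothetical.

```python
import math


def bias_std_rmse(estimates, true_value):
    """Return (bias, std, rmse) of a list of estimates of true_value.

    Assumed definitions: bias = mean(est) - true_value,
    std = population standard deviation of the estimates,
    rmse = sqrt(mean((est - true_value)^2)).
    """
    m = len(estimates)
    if m == 0:
        raise ValueError("at least one estimate is required")
    mean = sum(estimates) / m
    bias = mean - true_value
    std = math.sqrt(sum((e - mean) ** 2 for e in estimates) / m)
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / m)
    return bias, std, rmse
```

For one cell of a table, `estimates` would hold the 100 tail index estimates (one per simulated graph) and `true_value` the corresponding theoretical tail index.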

Table 3. The empirical bias (first row of each block), empirical standard deviation (second row), and the RMSE (third row, in brackets) of the Hill and QQ–plot estimates of the block sums TIs of the in- and out-degrees and PR over 100 graphs generated after $3 \cdot 10^{4}$ steps of evolution by the PA schemes without a node and edge deletion.

$(α, β, γ)$	Hill + Eyeball			Hill + KS			Hill + Bootstrap			QQ–Plot
$(α_{in}, α_{out})$	${\hat{α}}_{in}$	${\hat{α}}_{out}$	${\hat{α}}_{PR}$	${\hat{α}}_{in}$	${\hat{α}}_{out}$	${\hat{α}}_{PR}$	${\hat{α}}_{in}$	${\hat{α}}_{out}$	${\hat{α}}_{PR}$	${\hat{α}}_{in}$	${\hat{α}}_{out}$	${\hat{α}}_{PR}$
(0.4,0.2,0.4)	0.027	0.025	0.039	0.019	0.026	0.033	−0.008	0.042	0.054	0.021	0.075	0.071
(3, 3)	0.088	0.095	0.096	0.071	0.079	0.071	0.056	0.057	0.052	0.068	0.068	0.066
	[0.124]	[0.167]	[0.168]	[0.101]	[0.161]	[0.166]	[0.079]	[0.124]	[0.127]	[0.096]	[0.154]	[0.157]
(0.2,0.6,0.2)	−0.004	−0.014	−0.015	−0.009	−0.012	−0.014	−0.002	−0.02	−0.009	0.002	0.002	0.002
(1.75, 1.75)	0.03	0.038	0.032	0.06	0.062	0.059	0.06	0.051	0.057	0.018	0.018	0.018
	[0.043]	[0.082]	[0.081]	[0.085]	[0.091]	[0.091]	[0.085]	[0.085]	[0.085]	[0.026]	[0.028]	[0.028]
(0.5,0.3,0.2)	−0.046	−0.065	−0.07	0.004	−0.006	−0.004	0.003	0.004	0.003	−0.002	0.003	0.002
(2.125, 3.4)	0.087	0.115	0.091	0.021	0.023	0.021	0.031	0.036	0.036	0.016	0.018	0.015
	[0.123]	[0.181]	[0.144]	[0.029]	[0.034]	[0.027]	[0.044]	[0.058]	[0.052]	[0.022]	[0.027]	[0.021]
(0.2,0.3,0.5)	0.007	−0.069	−0.008	−0.004	−0.002	−0.003	0.004	0.007	0.005	0.002	0.003	0.003
(3.4, 2.125)	0.117	0.079	0.112	0.024	0.023	0.025	0.038	0.035	0.044	0.017	0.016	0.018
	[0.165]	[0.107]	[0.128]	[0.034]	[0.037]	[0.041]	[0.053]	[0.054]	[0.058]	[0.024]	[0.025]	[0.03]
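
The block maxima (Table 2), block sums (Table 3), and representative series (Figure 3) are all derived from the node scores grouped by community. A minimal Python sketch of this grouping step is given below. The dictionary-based input format and the function names are assumptions made for illustration; the community labels would come from a community detection step such as the directed Louvain algorithm of Appendix C, which is not shown.

```python
from collections import defaultdict


def group_by_community(scores, community_of):
    """Collect node scores (e.g., PageRanks) into lists, one per community."""
    groups = defaultdict(list)
    for node, value in scores.items():
        groups[community_of[node]].append(value)
    return dict(groups)


def block_maxima(groups):
    """Maximum score within each community."""
    return {label: max(values) for label, values in groups.items()}


def block_sums(groups):
    """Sum of the scores within each community."""
    return {label: sum(values) for label, values in groups.items()}


def representative_series(groups, i):
    """i-th largest score of each community (i = 1 gives the block maxima).

    Communities with fewer than i nodes are skipped.
    """
    return {
        label: sorted(values, reverse=True)[i - 1]
        for label, values in groups.items()
        if len(values) >= i
    }
```

The values of each returned dictionary form one sample to which a tail index estimator can then be applied.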

Table 4. The empirical bias and empirical standard deviation (first and second values in each cell), and the RMSE (in brackets) of the Hill and QQ–plot minimum estimates of the TI of the stationary representative series for PRs in networks evolved by the PA schemes without a node and edge deletion.

$(α, β, γ)$	$(α_{in}, α_{out})$	Hill + Eyeball	Hill + KS	Hill + Bootstrap	QQ–Plot
(0.4,0.2,0.4)	(3,3)	0.013, 0.008	0.012, 0.006	0.014, 0.005	0.013, 0.005
		[0.014]	[0.013]	[0.015]	[0.014]
(0.2,0.6,0.2)	(1.75, 1.75)	0.006, 0.007	0.013, 0.005	0.008, 0.005	0.008, 0.005
		[0.007]	[0.014]	[0.009]	[0.009]
(0.5,0.3,0.2)	(2.125, 3.4)	0.009, 0.001	0.008, 0.002	0.01, 0.001	0.007, 0.002
		[0.01]	[0.009]	[0.011]	[0.008]
(0.2,0.3,0.5)	(3.4, 2.125)	0.015, 0.005	0.017, 0.006	0.015, 0.004	0.014, 0.006
		[0.016]	[0.018]	[0.016]	[0.015]
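
The pairs $(α_{in}, α_{out})$ listed next to each $(α, β, γ)$ in Tables 1–4 can be reproduced with a short calculation. The sketch below assumes that (4) is the standard tail index formula of the linear preferential attachment model (as in Wan et al. and Wang and Resnick) and that the offset parameters are $δ_{in} = δ_{out} = 1$. Both are assumptions, chosen because they reproduce all four pairs in the tables; the function and argument names are hypothetical.

```python
def theoretical_tail_indices(alpha, beta, gamma, delta_in=1.0, delta_out=1.0):
    """In- and out-degree tail indices of the linear PA model.

    Assumed form of (4):
        a_in  = (1 + delta_in  * (alpha + gamma)) / (alpha + beta)
        a_out = (1 + delta_out * (alpha + gamma)) / (beta + gamma)
    """
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    a_in = (1.0 + delta_in * (alpha + gamma)) / (alpha + beta)
    a_out = (1.0 + delta_out * (alpha + gamma)) / (beta + gamma)
    return a_in, a_out


if __name__ == "__main__":
    for params in [(0.4, 0.2, 0.4), (0.2, 0.6, 0.2), (0.5, 0.3, 0.2), (0.2, 0.3, 0.5)]:
        a_in, a_out = theoretical_tail_indices(*params)
        print(params, round(a_in, 3), round(a_out, 3))
```

Under these assumptions the four parameter sets give $(3, 3)$, $(1.75, 1.75)$, $(2.125, 3.4)$, and $(3.4, 2.125)$, matching the table entries.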

Table 5. The empirical correlation values (6) corresponding to Figure 5 between the TIs of the block maxima (“mx”) (or the block sums (“sm”)) and representative series (“rep”).

	$ρ(X, Y)$
$(α, β, γ)$	$X = \text{“sm”}$, $Y = \text{“rep”}$	$X = \text{“mx”}$, $Y = \text{“rep”}$
$(0.4, 0.2, 0.4)$	0.774	0.751
$(0.2, 0.6, 0.2)$	0.975	0.987
$(0.5, 0.3, 0.2)$	0.987	0.966
$(0.2, 0.3, 0.5)$	0.406	0.473
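
The correlation values in Tables 5–7 relate, graph by graph, the tail index estimate of the block sums (or block maxima) to that of the representative series. Equation (6) is not reproduced in this part of the text; the sketch below assumes that it is the ordinary sample (Pearson) correlation coefficient. The function name is hypothetical.

```python
import math


def empirical_correlation(x, y):
    """Sample Pearson correlation of two equally long sequences.

    Assumed to correspond to (6); x and y would hold one tail index
    estimate per simulated graph.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

A value close to 1 indicates that, across the simulated graphs, the two tail index estimates move together.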

Table 6. The empirical correlation values (6) corresponding to Figure 8 between the TIs of the block maxima (“mx”) (or the block sums (“sm”)) and representative series (“rep”).

	$ρ(X, Y)$
$(α, β, γ)$	$X = \text{“sm”}$, $Y = \text{“rep”}$	$X = \text{“mx”}$, $Y = \text{“rep”}$
$(0.4, 0.2, 0.4)$	0.526	0.471
$(0.2, 0.6, 0.2)$	0.672	0.668
$(0.5, 0.3, 0.2)$	0.742	0.732
$(0.2, 0.3, 0.5)$	0.691	0.683

Table 7. The empirical correlation values (6) corresponding to Figure 13 between the TIs of the block maxima (“mx”) (or the block sums (“sm”)) and representative series (“rep”).

	$ρ(X, Y)$
$(α, β, γ)$	$X = \text{“sm”}$, $Y = \text{“rep”}$	$X = \text{“mx”}$, $Y = \text{“rep”}$
$(0.4, 0.2, 0.4)$	0.407	0.441
$(0.2, 0.6, 0.2)$	0.525	0.518
$(0.5, 0.3, 0.2)$	0.375	0.354
$(0.2, 0.3, 0.5)$	0.631	0.645

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
