**2. Methodology**

In practical applications, the computation of a global index that measures the importance of each node is a central task. If the system under study contains several types of relations between actors, it is expected that the measures somehow take into account the importance obtained from the different layers. A simple choice could be to combine the centralities of the nodes computed independently from the different layers according to some heuristic.

In this section, both the well-established methods and the proposed centrality for multiplex networks with data are described in detail.

### *2.1. The Adapted PageRank Algorithm (APA) Model*

Let us establish some notation that will be used in what follows. Let G = (N, E) be a graph, where N = {1, 2, . . . , *n*} and *n* ∈ N. The link (*i*, *j*) belongs to the set E if and only if there exists a link connecting node *i* to node *j*. The adjacency matrix of G is the *n* × *n* matrix *A* = (*aij*), where

$$a\_{ij} = \begin{cases} 1 & \text{if } (i,j) \text{ is a link}, \\ 0 & \text{otherwise}. \end{cases}$$

The adapted PageRank algorithm (APA) proposed by Agryzkov et al. [25] provides us with a model to establish a ranking of the nodes of an urban network, taking into account the data present in it. This centrality was originally proposed for urban networks, although it may be generalized to spatial networks or, more generally, to networks with data. Its main characteristic is that it is able to incorporate the importance of data obtained from any source into the whole process of computing the centrality of the individual nodes. Starting from the basic idea of the PageRank vector, the construction of the matrix used to obtain the classification of the nodes is modified.

In its original approach, PageRank was based on a model of a web surfer that probabilistically browses the web graph, starting at a node chosen at random according to a personalization vector whose components give us the probability of starting at node *v*. At each step, if the current node had outgoing links to other nodes, the surfer next browsed with probability *α* one of those nodes (chosen uniformly at random), and with probability 1 − *α* a node chosen at random according to the personalization vector. For the web graph, the most popular value of the damping factor was *α* = 0.85. If the current node was a sink with no outgoing links, the surfer automatically chose the next node at random according to the personalization vector.

In the APA model, the data matrix was constructed following a reasoning similar to the original idea of the PageRank vector: a random walker can jump between connected nodes following the local links given by the network, or can jump between nodes that are not directly connected with the same probability, regardless of the topological distance between them (number of nodes in the walk).

In the algorithm implemented to calculate the APA centrality (see [25], p. 2190), a new matrix *A*∗ = (*pij*) ∈ R*n*×*n* is constructed from the adjacency matrix *A*, as

$$p\_{ij} = \begin{cases} \frac{1}{c\_j} & \text{if } a\_{ij} \neq 0, \\ 0 & \text{otherwise}, \end{cases} \quad 1 \le i, j \le n,\tag{1}$$

where *cj* represents the sum of the *j*-th column of the adjacency matrix.

Algebraically, *A*∗ may be obtained as

$$A^\* = A\Delta^{-1},\tag{2}$$

where Δ = (*δij*) ∈ R*n*×*n* is the degree matrix of the graph, that is, *δij* = *cj*, for *i* = *j*, and *δij* = 0, for *i* ≠ *j*. We refer to *A*∗ as the transition matrix; it represents, by columns, the probability of navigating from one node to another. In the literature related to this topic the matrix *A*∗ is also denoted as *P* or *PA*, so *A*∗, *P* and *PA* are the same matrix. Following the notation of Pedroche et al. [34], we will preferably use *P*.
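As an illustration, the column normalization of Equations (1) and (2) can be sketched in a few lines of NumPy; the 4-node adjacency matrix below is a hypothetical example, not taken from [25]:

```python
import numpy as np

# Toy 4-node directed graph; with the column convention of Eq. (1),
# a_ij = 1 means there is a link from node j to node i (invented data).
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 0],
              [1, 0, 1, 0]], dtype=float)

c = A.sum(axis=0)   # column sums c_j of the adjacency matrix

# Transition matrix P = A * Delta^{-1}: every column with links is divided
# by its column sum, so those columns of P sum to 1 (column-stochastic).
P = A / np.where(c == 0, 1.0, c)
```

Dangling nodes (zero columns) are left untouched by the `np.where` guard; how to handle them properly is a modeling choice outside this sketch.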

The transition matrix, *P* = *A*∗ has the following characteristics (see [25]):


The key point of the model is the construction of the so-called data matrix *D* of size *n* × *k*, with its *n* rows representing the *n* nodes of the network, and each of its *k* columns representing the attributes of the data object of the study. Specifically, an element *dij* ∈ *D* is the value we attach to the data class *kj* at node *i*.

However, not all the characteristics of the data may have the same relevance or influence on the question under analysis. Therefore, a vector *v***0** ∈ R*k*×1 is constructed, where the element in row *i* is the multiplicative factor associated with the property or characteristic *ki*. The vector *v***0** thus introduces a weighting of the data, making it possible to work with the entire dataset or only a part of it.

Then, multiplying *D* and *v***0**, *v* may be obtained as

$$\boldsymbol{v} = \boldsymbol{D} \cdot \boldsymbol{v}\_0,$$

with *v* ∈ R*n*×1.

The construction of vector *v* allows us to associate to each node a value that represents the amount of data assigned to it. Thus, two different values are associated with every node; on the one hand, its degree, related to the topology and, on the other hand, the value of the data associated to it. For a more detailed description of how the data are associated to the nodes, see [25,35].
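The construction of *v* from the data matrix *D* and the weighting vector *v***0** can be sketched as follows; the values, the weights and the interpretation as facility counts are purely illustrative assumptions, not data from the paper:

```python
import numpy as np

# Hypothetical data matrix D (n = 5 nodes, k = 3 data classes), e.g. counts
# of three kinds of facilities attached to each node of an urban network.
D = np.array([[2, 0, 1],
              [0, 3, 0],
              [1, 1, 4],
              [0, 0, 0],
              [5, 2, 1]], dtype=float)

# Weighting vector v0: keep class 1 at full weight, halve class 2 and
# discard class 3 (an illustrative choice of weights).
v0 = np.array([1.0, 0.5, 0.0])

# v = D @ v0 attaches a single scalar data value to each node.
v = D @ v0

# Normalize so that v* sums to 1, as required before building V.
v_star = v / v.sum()
```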

After normalizing *v*, obtaining the vector denoted as *v*∗, it is possible to define the matrix *MAPA* as

$$M\_{APA} = (1 - \alpha)P + \alpha V,\tag{3}$$

where *V* ∈ R*n*×*n* is a matrix in which all the components of the *i*-th row are equal to *v*∗*i*. The parameter *α* is fixed and is related to the teleportation idea; the value traditionally used is *α* = 0.15.

In practice, the vector *v*∗ is repeated (*n* times) in every column of the matrix *V*.

The matrix *MAPA* is used to compute the ranking vector for the network.

With these considerations, the APA algorithm proposed in [25] may be summarized by the Algorithm 1:

**Algorithm 1:** (Adapted PageRank algorithm (APA)). *Let G* = (*<sup>V</sup>*, *E*) *be a primary graph representing a network with n nodes*.


The main feature of this algorithm is the construction of the data matrix *D* and the weighted vector *v***0**. The matrix *D* allows us to represent numerically the dataset. Vector *v***0** determines the importance of each of the factors or characteristics that have been measured by means of *D*.

The Perron–Frobenius theorem is of great importance in this problem, since it constitutes the theoretical basis ensuring that there exists an eigenvector *x* associated with the dominant eigenvalue *λ* = 1 such that all its components are positive, which allows establishing an order or classification of the nodes. In our case, due to the way in which *P* and *V* have been constructed, it can be seen that *MAPA* is a column-stochastic matrix, which guarantees the spectral properties necessary for the Perron–Frobenius theorem to hold. Therefore, the existence and uniqueness of a dominant eigenvector with all its components positive is guaranteed. See [36,37] for further study of the spectral and algebraic properties of models based on PageRank.

Vector *x* constitutes the adapted PageRank vector and provides a classification or ranking of the nodes according to the connectivity criterion between them and the presence of data.
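Putting the pieces together, a minimal sketch of the APA computation (transition matrix, data vector, the matrix *MAPA* of (3), and power iteration) could look as follows; the function name and the simplified treatment of dangling nodes are our own choices, not part of [25]:

```python
import numpy as np

def apa_centrality(A, D, v0, alpha=0.15, tol=1e-12, max_iter=1000):
    """Sketch of the APA centrality: the dominant eigenvector of
    M_APA = (1 - alpha) * P + alpha * V, obtained by power iteration.
    Dangling nodes (zero columns of A) are left as-is in this sketch."""
    n = A.shape[0]
    c = A.sum(axis=0)
    P = A / np.where(c == 0, 1.0, c)        # transition matrix, Eqs. (1)-(2)
    v = D @ v0                               # data vector
    v_star = v / v.sum()                     # normalized data vector v*
    V = np.tile(v_star[:, None], (1, n))     # v* repeated in every column
    M = (1 - alpha) * P + alpha * V          # Eq. (3)
    x = np.full(n, 1.0 / n)                  # uniform starting vector
    for _ in range(max_iter):                # power iteration
        x_new = M @ x
        x_new /= x_new.sum()
        if np.linalg.norm(x_new - x, 1) < tol:
            return x_new
        x = x_new
    return x
```

Since *MAPA* is column stochastic, the iteration converges to the positive eigenvector associated with *λ* = 1, normalized to sum 1.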

### *2.2. The Biplex Approach for Classic PageRank*

Pedroche et al. [34] propose a two-layer approach for the classic PageRank classification vector based on an idea that we now briefly describe. The two-layer approach considers the PageRank classification of the nodes as a process divided into two clearly differentiated parts. The first part is related to the topology of the network, where the connections of the nodes are basically taken into account by means of their adjacency matrix. The second part regards the teleportation from one node to another, following a criterion of equiprobability.

They affirm that the PageRank classification for a graph *G* with personalized vector *v* can be understood as the stationary distribution of a Markov chain that occurs in a network with two layers, which are:

- *l*1, **physical layer**: the network *G* itself.
- *l*2, **teleportation layer**: an all-to-all network, with weights given by the personalization vector.

Under this perspective, it is easy to construct a block matrix *MA* based on these two layers where each of the diagonal blocks is associated to a given layer. Therefore, we can construct

$$M\_A = \left(\begin{array}{c|c} \alpha P\_A & (1-\alpha)I \\ \hline 2\alpha I & (1-\alpha)ev^T \end{array}\right) \quad \in \mathbb{R}^{2n \times 2n},\tag{4}$$

where *MA* defines a Markov chain in a network with two layers.

Due to the good spectral characteristics of *MA* (it is irreducible and primitive), they arrive at the conclusion that, given a network with *n* nodes whose adjacency matrix is *A*, the two-layer approach PageRank of *A* is the vector

$$
\pi\_A = \pi\_u + \pi\_d \in \mathbb{R}^n,
$$

where $(\pi\_u^T, \pi\_d^T)^T \in \mathbb{R}^{2n}$ is the unique normalized and positive eigenvector of the matrix *MA* given by expression (4).

In [38], the authors propose a new centrality measure for complex networks with geo-located data based on the application of the two-layer PageRank approach to the APA centrality measure for spatial networks with data. They design an algorithm to evaluate this centrality and show the coherence of this measure with respect to the original APA by calculating the correlation and the quantitative difference of both centralities using different network models. This coherence in the results obtained for the APA and the proposed two-layer centrality is essential for our objective of extending the two-layer approach to multiplex networks with data.

Therefore, the two-layer approach may be extended to the case of multiplex networks, where we have several networks with the same nodes but with different topologies and connections between nodes. Following the notation used by Pedroche et al. [34], let us consider a multiplex network M = (N, E, S) with layers S = (*l*1, *l*2, ... , *lk*). Given a multiplex network M with several layers, a multiplex PageRank centrality can be defined by associating with each layer *li* a two-layer random walker, with one physical layer and one teleportation layer. In addition, transitions between these layers must be allowed. The idea behind this process is the application of the two-layer approach to each layer of the multiplex network.

For now, let us restrict our problem to biplex networks M = (N, E, S), with layers S = (*l*1, *l*2) whose adjacency matrices are *A*1, *A*2 ∈ R*n*×*n*, respectively. For convenience of notation, we will write *PA*, *P*1 and *P*2 instead of *A*∗, *A*∗1 and *A*∗2, respectively.

The authors (see [34]) construct a general matrix *M*2 as a new block matrix by associating with each layer *li*, for *i* = 1, 2, a two-layer block of the form:

$$M\_{i,i} = \left(\begin{array}{c|c} \alpha P\_i & (1-\alpha)I \\ \hline 2\alpha I & (1-\alpha)ev\_i^T \end{array}\right) \quad \in \mathbb{R}^{2n \times 2n}.$$

Reordering the blocks in such a way that the physical layers appear in the first block, the final matrix is

$$M\_2 = \frac{1}{2} \left( \begin{array}{cc|cc} \alpha P\_1 & I & 2(1-\alpha)I & 0 \\ I & \alpha P\_2 & 0 & 2(1-\alpha)I \\ \hline I & 0 & (1-\alpha)ev\_1^T & (1-\alpha)ev\_2^T \\ 0 & I & (1-\alpha)ev\_1^T & (1-\alpha)ev\_2^T \end{array} \right),\tag{5}$$

with *Pi*, for *i* = 1, 2 row stochastic matrices and *vi*, for *i* = 1, 2 the personalized vectors.

It is straightforward to check that the spectral properties of *M*2 are essentially the same as those of the Google matrix in the PageRank model. Then, there exists an eigenvector

$$\hat{\pi}\_2 = (\pi\_{u\_1}, \pi\_{u\_2}, \pi\_{d\_1}, \pi\_{d\_2}) \in \mathbb{R}^{4n}$$

associated with the dominant eigenvalue *λ* = 1. This vector is the key to obtaining the classification vector representing the nodes' centrality.

Consequently, summarizing the main characteristic of the biplex PageRank approach: given a biplex network M with *n* nodes, two layers S = (*l*1, *l*2), and adjacency matrices *A*1, *A*2 ∈ R*n*×*n*, it can be affirmed that the PageRank vector that classifies the nodes of this biplex network is the unique vector *π***2** such that

$$
\pi\_2 = \frac{1}{2} \left( \pi\_{u\_1} + \pi\_{u\_2} + \pi\_{d\_1} + \pi\_{d\_2} \right) \in \mathbb{R}^{n},
$$

with *π***2** normalized.
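The final folding step, from the 4*n*-dimensional eigenvector to the *n*-dimensional classification vector, can be sketched as follows; the eigenvector values below are invented purely for illustration:

```python
import numpy as np

n = 3
# Hypothetical normalized eigenvector of a 4n x 4n biplex matrix, split into
# its four length-n blocks (values are illustrative, not from a real network).
pi_hat = np.array([0.10, 0.05, 0.10,   # pi_u1
                   0.08, 0.07, 0.10,   # pi_u2
                   0.12, 0.08, 0.05,   # pi_d1
                   0.10, 0.05, 0.10])  # pi_d2
pi_u1, pi_u2, pi_d1, pi_d2 = pi_hat.reshape(4, n)

# Biplex PageRank: fold the four blocks back into one score per node.
pi_2 = 0.5 * (pi_u1 + pi_u2 + pi_d1 + pi_d2)
pi_2 /= pi_2.sum()   # normalize as in the text
```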

### *2.3. Constructing the APABI Centrality by Applying the Two-Layer Approach*

The idea of treating the PageRank concept by means of two layers makes great sense within the APA centrality, since the influence of the data on the network is already measured separately in the original algorithm. Paying attention to the construction of *MAPA* given by (3), note that *V* is the matrix summarizing all the data information. Moreover, beyond applying this concept to our centrality, it is also interesting to analyze the differences that arise between the two techniques for calculating the importance of the nodes.

In this section, we describe how to adapt the APA centrality taking the two-layer technique as a reference, where a block matrix is used to distinguish the topology from the personalization vector.

The original APA centrality model, described in Section 2.1, presents some differences from the model described and implemented by Pedroche et al. [34], where the final matrix involved in the eigenvector computation is row stochastic. In our approach, the basis of the original APA model is the construction of a column-stochastic matrix, where the topology of the network is reflected by the probability matrix *P* and the influence of the data by the matrix *V*.

In order to build a 2 × 2 block matrix, the same approach used in [34] may be reproduced. The upper diagonal block contains the information referring to the network topology, while the lower diagonal block is associated with the data collected in the network and assigned to each of its nodes.

Taking as a reference the APA algorithm, the matrix used to compute the eigenvector associated to the dominant eigenvalue *λ* = 1 is given by

$$M\_{APA} = (1 - \alpha)P + \alpha V.$$

A new 2 × 2 block matrix is constructed as

$$M\_{APA2} = \left(\begin{array}{c|c} \alpha P\_A & (1-\alpha)I \\ \hline \alpha I & \alpha V \end{array}\right) \quad \in \mathbb{R}^{2n \times 2n}.\tag{6}$$

The idea underlying the construction of the block matrix given by (6) is to maintain the spectral properties of the original matrix *MAPA*, with the aim that the numerical algorithms for determining the dominant eigenvalue and eigenvector are stable and fast. Note that we have doubled the size of the original matrix.

Following the same reasoning used in Section 2.2 to construct a model for biplex networks based on the classic PageRank vector, it is necessary to extend the two-layer APA approach given by the block matrix (6). Using the same notation, let M = (N, E, S), with layers S = (*l*1, *l*2), be a biplex network.

Reordering the blocks in such a way that the physical layers appear in the first block, the final matrix is given by

$$M\_{BI} = \frac{1}{2} \left( \begin{array}{cc|cc} (1-\alpha)P\_1 & I & 2(1-\alpha)I & 0 \\ I & (1-\alpha)P\_2 & 0 & 2(1-\alpha)I \\ \hline I & 0 & \alpha V\_1 & \alpha V\_2 \\ 0 & I & \alpha V\_1 & \alpha V\_2 \end{array} \right),\tag{7}$$

with *Pi*, for *i* = 1, 2 column stochastic matrices and *Vi*, for *i* = 1, 2, the matrices containing the data information.

Note the differences between the matrices *M*2 (5) and *MBI* (7). The matrix *M*2 is row stochastic; however, in the APA centrality the basic matrix *MAPA* is column stochastic, so the definition of the matrix *MBI* is determined by the need to maintain the spectral properties suitable for obtaining the eigenvector that gives the centrality. These desirable spectral properties are ensured by the way in which *MBI* has been built, being column stochastic as well as irreducible.

Then, there exists an eigenvector

$$
\hat{\pi}\_{BI} = (\pi\_{u\_1}, \pi\_{u\_2}, \pi\_{d\_1}, \pi\_{d\_2}) \in \mathbb{R}^{4n} \tag{8}
$$

associated with the dominant eigenvalue *λ* = 1. This vector is the key to obtaining the classification vector representing the nodes' centrality. Therefore, a unique vector can be obtained

$$\mathbf{x} = \frac{1}{2} \left( \pi\_{u\_1} + \pi\_{u\_2} + \pi\_{d\_1} + \pi\_{d\_2} \right) \in \mathbb{R}^n,\tag{9}$$

with *x* a normalized vector.
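The assembly of the block matrix *MBI* of (7) can be sketched directly with NumPy blocks; the helper name `build_m_bi` and the toy inputs are our own illustrative assumptions:

```python
import numpy as np

def build_m_bi(P1, P2, v1, v2, alpha=0.15):
    """Assemble the 4n x 4n biplex matrix M_BI of Eq. (7).
    P1, P2: transition matrices of the two layers.
    v1, v2: normalized data vectors; V_i repeats v_i in every column."""
    n = P1.shape[0]
    I = np.eye(n)
    Z = np.zeros((n, n))
    V1 = np.tile(v1[:, None], (1, n))
    V2 = np.tile(v2[:, None], (1, n))
    return 0.5 * np.block([
        [(1 - alpha) * P1, I, 2 * (1 - alpha) * I, Z],
        [I, (1 - alpha) * P2, Z, 2 * (1 - alpha) * I],
        [I, Z, alpha * V1, alpha * V2],
        [Z, I, alpha * V1, alpha * V2],
    ])
```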

Figure 1 shows a schematic representation of the extended APA model for biplex networks. All the graphs share the same *n* nodes, although the relationships between them in the two layers *l*1 and *l*2 are different, which produces two different adjacency matrices *A*1 and *A*2. The data are also different in each layer; consequently, two data matrices *D*1 and *D*2 are constructed.

**Figure 1.** Scheme of the adapted PageRank algorithm (APA) extended model for biplex networks.

The weighting vectors *v*01 and *v*02 allow us to select those data that are the object of our interest, discarding those that are not relevant for the study being carried out.

The model presented in this section may be summarized by the Algorithm 2.

**Algorithm 2:** (Adapted PageRank algorithm biplex (APABI)). *Let* M = (N , E, S)*, with layers* S = (*l*1, *l*2) *and adjacency matrices A*1, *A*2 *be a biplex network with n nodes.*


Algorithm 2 summarizes the steps necessary to calculate a centrality measure that will be denoted as the adapted PageRank algorithm biplex (APABI). This measure provides us with a vector for classifying the nodes of the network according to their importance within a biplex network. This classification is obtained from the importance of the nodes in two layers where the nodes are the same and what changes are the associations (links) between the nodes and the data associated with them.

Note that the *MBI* matrix is built for biplex networks. However, the same reasoning can easily be extended to a multiplex network with *k* layers {*l*1, *l*2, ... , *lk*}, defining the adjacency matrices {*A*1, *A*2, ... , *Ak*} and a set of *k* data matrices {*D*1, *D*2, ... , *Dk*}. The matrix *MBI* is extended to a multiplex network with *k* layers as

$$M\_{multi} = \frac{1}{k} \left( \begin{array}{c|c} M\_{1,1} & M\_{1,2} \\ \hline M\_{2,1} & M\_{2,2} \end{array} \right),$$

with

$$M\_{1,1} = \begin{pmatrix} (1-\alpha)P\_1 & I & \cdots & I \\ I & (1-\alpha)P\_2 & \cdots & I \\ \vdots & \vdots & \ddots & \vdots \\ I & I & \cdots & (1-\alpha)P\_k \end{pmatrix},\tag{10}$$

$$M\_{2,2} = \begin{pmatrix} \alpha V\_1 & \alpha V\_2 & \cdots & \alpha V\_k \\ \alpha V\_1 & \alpha V\_2 & \cdots & \alpha V\_k \\ \vdots & \vdots & & \vdots \\ \alpha V\_1 & \alpha V\_2 & \cdots & \alpha V\_k \end{pmatrix},\tag{11}$$

where *M*1,2 and *M*2,1 are block-diagonal matrices. More precisely, *M*1,2 is formed by *k* diagonal blocks 2(1 − *α*)*I* and *M*2,1 is formed by *k* identity blocks *I*.

In the approach made so far in this section, we have considered the same value of the parameter *α* in all the layers that make up the network. However, the value of *α* could be different in the different layers, as a consequence of the need to differentiate the importance that must be assigned to the data in each layer. That leads us to consider a value *αi* for each layer *i*. Note that this variant does not in any way modify the spectral properties of the matrices involved in the calculation of the centrality. Consequently, the matrix *MBI* should now be written as

$$M\_{BI} = \frac{1}{2} \left( \begin{array}{cc|cc} (1-\alpha\_1)P\_1 & I & 2(1-\alpha\_1)I & 0 \\ I & (1-\alpha\_2)P\_2 & 0 & 2(1-\alpha\_2)I \\ \hline I & 0 & \alpha\_1 V\_1 & \alpha\_2 V\_2 \\ 0 & I & \alpha\_1 V\_1 & \alpha\_2 V\_2 \end{array} \right) \tag{12}$$

The generalization of the matrix *MBI* to *k* layers with *k* parameters *αi* consists simply of replacing each *α* in the expressions (10) and (11) with its corresponding *αi* in the *i*-th row.
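A sketch of the *k*-layer assembly with one *αi* per layer, following the block pattern of (10)-(12), could look as below; the helper name `build_m_multi` and the choice of pairing *αj* with the *j*-th data block *Vj* (as in the columns of (12)) are our own assumptions:

```python
import numpy as np

def build_m_multi(Ps, Vs, alphas):
    """Assemble the 2kn x 2kn multiplex matrix: Ps, Vs are lists of k
    n x n matrices, alphas a list of k per-layer parameters."""
    k = len(Ps)
    n = Ps[0].shape[0]
    I, Z = np.eye(n), np.zeros((n, n))
    # M_{1,1}: (1 - alpha_i) P_i on the diagonal, identity blocks elsewhere.
    M11 = np.block([[(1 - alphas[i]) * Ps[i] if i == j else I
                     for j in range(k)] for i in range(k)])
    # M_{1,2}: block-diagonal with blocks 2(1 - alpha_i) I.
    M12 = np.block([[2 * (1 - alphas[i]) * I if i == j else Z
                     for j in range(k)] for i in range(k)])
    # M_{2,1}: block-diagonal identity blocks.
    M21 = np.block([[I if i == j else Z for j in range(k)] for i in range(k)])
    # M_{2,2}: every block-row repeats (alpha_1 V_1, ..., alpha_k V_k).
    M22 = np.block([[alphas[j] * Vs[j] for j in range(k)] for _ in range(k)])
    return np.block([[M11, M12], [M21, M22]]) / k
```

For *k* = 2 and *α*1 = *α*2 this reduces to the biplex matrix of (7) up to the same 1/*k* prefactor.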

### *2.4. A Note About the Computational Cost*

We now discuss some general aspects of the computational cost of the proposed model. Looking closely at Algorithm 2, the most expensive algebraic operations carried out are the product of a scalar by a matrix, the matrix-vector product, and the calculation of the dominant eigenpair (*λ*, **x**) of the matrix *MBI*, given by expression (8).

As is well known and can be seen in any linear algebra textbook, the product of a scalar by a square matrix of size *n* requires *n* × *n* multiplications, while the product of a square matrix of size *n* by a vector has a computational cost of *O*(*n*<sup>2</sup>). In our case, we need to compute the products *D* · *v*0*i*, for *i* = 1, 2, where *D* is the data matrix of size *n* × *k* and *v*0*i* is a column vector of size *k*. Therefore, the computational cost of each of these products is *O*(*nk*).

However, the most expensive part from the computational point of view is found in step 8 of the algorithm, in which, once the *MBI* matrix is constructed, it is necessary to obtain its dominant eigenpair (*λ*, **x**). The numerical problem of calculating the eigenvalues and eigenvectors of an arbitrary matrix is, in general, very expensive if the matrix in question does not have a structure that simplifies the calculation in some way. For matrices of low dimension (say, *N* < 150), there are efficient methods for finding all the eigenvalues and eigenvectors. For example, the Householder–QL–Wilkinson modification of the Givens method is built into the EISPACK routines and is routinely used. The computation time for any of these methods grows as *N*<sup>3</sup> and the memory requirement grows as *N*<sup>2</sup>. For large matrices, a very commonly used algorithm is the Lanczos method, which is an adaptation of the power method to find the *m* most useful eigenvalues and eigenvectors of an *n* × *n* Hermitian matrix. For a more detailed description of numerical matrix eigenvalue problems, see [39].

Due to the way we have built the *MBI* matrix, following the original idea of the Google matrix used in the original PageRank combined with the two-layer PageRank approach, we have ensured that this matrix inherits the spectral properties of the Google matrix in the original PageRank model. It is a column-stochastic matrix of which we can affirm, using a variant of the Perron–Frobenius theorem, that its dominant eigenvector associated with the eigenvalue *λ* = 1 corresponds to the stationary distribution of the Markov chain defined by the column-normalized matrix *MBI*. This stationary vector *π*ˆ*BI* verifies that

$$M\_{BI}\hat{\pi}\_{BI} = \hat{\pi}\_{BI}$$

and may be obtained by using the well-known power iteration method, applying it until the convergence of the iterative process

$$\begin{array}{rcl} x\_k &=& M\_{BI}\, \pi\_{k-1}, \\ \pi\_k &=& x\_k / \max(x\_k), \end{array}$$

for *k* = 1, 2, . . ..
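A minimal sketch of this power iteration with max-normalization, run on a small column-stochastic stand-in for *MBI* (the matrix values are invented for illustration):

```python
import numpy as np

# Small hypothetical column-stochastic matrix standing in for M_BI.
M = np.array([[0.1, 0.5, 0.3],
              [0.4, 0.2, 0.3],
              [0.5, 0.3, 0.4]])

pi = np.ones(3)                      # starting vector pi_0
for _ in range(200):
    x = M @ pi                       # x_k = M_BI * pi_{k-1}
    pi_new = x / x.max()             # pi_k = x_k / max(x_k)
    delta = np.linalg.norm(pi_new - pi, np.inf)
    pi = pi_new
    if delta < 1e-12:                # stop at convergence
        break

# pi is now proportional to the dominant eigenvector (lambda = 1).
```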

In addition, it should be noted that the use of the power method for the calculation of the dominant eigenvector is especially useful when applied to sparse matrices.
