2.2.2. Normalized Modularity

The network of equity indices is densely connected: we cannot claim that events in Africa have no effect on European markets. We therefore need methods that can cluster dense graphs.

Let *G*(*V*<sub>*N*×1</sub>, *W*<sub>*N*×*N*</sub>) be a weighted graph, where *V* denotes the set of vertices and *W* represents the weights of the edges. A *k*-partition of the graph *G*(*V*, *W*) is a partition of the vertices such that ∪<sub>*a*=1</sub><sup>*k*</sup> *V*<sub>*a*</sub> = *V* and *V*<sub>*i*</sub> ∩ *V*<sub>*j*</sub> = *δ*<sub>*i*,*j*</sub>*V*<sub>*i*</sub>, ∀*i*, *j* ∈ {1, . . . , *k*}, that is, distinct parts are disjoint.

The value *W*<sub>*i*,*j*</sub> represents the strength of the connection between nodes *i* and *j*. If we assume that nodes are connected independently, then the estimate of the weight *W*<sub>*i*,*j*</sub> is the product of the average connection strengths of *i* and *j*. The average connection strengths *d*<sub>*i*</sub> and *d*<sub>*j*</sub> are computed from *W*:

$$d\_i = \frac{1}{N} \sum\_{u=1}^N \mathcal{W}\_{i,u}.$$

Thus, *W*<sub>*i*,*j*</sub> − *d*<sub>*i*</sub>*d*<sub>*j*</sub> captures the information of the network structure (Bolla 2011). If we want to maximize the sum of information in each cluster, we get:

$$\max\_{P\_k \in \mathcal{P}\_k} \sum\_{a=1}^k \sum\_{i,j \in V\_a} (\mathcal{W}\_{i,j} - d\_i d\_j), \tag{6}$$

where *P*<sub>*k*</sub> stands for a specific *k*-partition in 𝒫<sub>*k*</sub>, the set of all possible *k*-partitions.

Let *M* := *W* − *dd*<sup>*T*</sup> denote the modularity matrix of *G*(*V*, *W*). If we would like to get clusters with similar volumes, we have to add a penalty to Equation (6), which gives the normalized Newman–Girvan cut:

$$\max\_{P\_k \in \mathcal{P}\_k} \sum\_{a=1}^k \frac{1}{\text{Vol}(V\_a)} \sum\_{i,j \in V\_a} \left( \mathcal{W}\_{i,j} - d\_i d\_j \right),\tag{7}$$

where Vol(*V*<sub>*a*</sub>) = ∑<sub>*u*∈*V*<sub>*a*</sub></sub> *d*<sub>*u*</sub>.
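The objectives in Equations (6) and (7) can be evaluated directly for any given partition. The following is a minimal NumPy sketch under stated assumptions: the function names, the toy weight matrix, and the rescaling of *W* to unit total weight with *d* taken as the row sums (which differs from the 1/*N* average above only by a constant factor) are choices of this sketch and do not come from the text.

```python
import numpy as np

def modularity_matrix(W):
    """Return (d, M) with M = W - d d^T.

    Assumption of this sketch: W is rescaled so that all weights sum to 1
    and d_i is the i-th row sum, so d_i * d_j is on the same scale as W[i, j].
    """
    W = np.asarray(W, dtype=float)
    W = W / W.sum()
    d = W.sum(axis=1)
    return d, W - np.outer(d, d)

def partition_scores(W, labels):
    """Evaluate the objectives of Eq. (6) and Eq. (7) for one partition."""
    d, M = modularity_matrix(W)
    labels = np.asarray(labels)
    plain, normalized = 0.0, 0.0
    for a in np.unique(labels):
        members = np.flatnonzero(labels == a)
        block = M[np.ix_(members, members)].sum()   # sum over i, j in V_a
        plain += block                              # Eq. (6)
        normalized += block / d[members].sum()      # Eq. (7), divided by Vol(V_a)
    return plain, normalized

# Hypothetical dense toy graph: two blocks of three nodes,
# weight 0.8 inside a block, 0.2 across blocks, 1.0 on the diagonal.
blocks = np.kron(np.eye(2), np.ones((3, 3)))
W = 0.2 + 0.6 * blocks
np.fill_diagonal(W, 1.0)

good = partition_scores(W, [0, 0, 0, 1, 1, 1])   # partition along the blocks
bad = partition_scores(W, [0, 1, 0, 1, 0, 1])    # partition across the blocks
print(good, bad)
```

On this toy graph the partition that follows the block structure scores higher on both objectives than the one that cuts across the blocks.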

Let us define the so-called normalized modularity matrix:

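$$M\_D := D^{-1/2} M D^{-1/2},\tag{8}$$

where *D* = diag(*d*<sub>1</sub>, . . . , *d*<sub>*N*</sub>) is the diagonal matrix of the connection strengths.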

If we would like to cluster a weighted graph *G*(*V*, *W*), then the eigenvectors of its modularity matrix (*M*) and normalized modularity matrix (*M*<sub>*D*</sub>) can be used. Both matrices are symmetric, and 0 is always in the spectrum of *M*<sub>*D*</sub>:

$$M\_D = \sum\_{i=1}^{N} \lambda\_i u\_i u\_i^T = \sum\_{i=1}^{N-1} \lambda\_i u\_i u\_i^T, \tag{9}$$

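where 1 > *λ*<sub>1</sub> ≥ *λ*<sub>2</sub> ≥ ... ≥ *λ*<sub>*N*</sub> ≥ −1 denote the eigenvalues of *M*<sub>*D*</sub> and *u*<sub>*i*</sub> the corresponding unit-norm eigenvectors.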
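The spectral properties stated above can be checked numerically. The sketch below is illustrative only: the helper name `normalized_modularity_matrix` and the toy matrix are hypothetical, and it assumes that *W* is symmetric, has no isolated nodes, and is rescaled to unit total weight with *d* taken as the row sums, a convention chosen here so that the zero eigenvalue and the eigenvalue bounds hold exactly.

```python
import numpy as np

def normalized_modularity_matrix(W):
    """Return (d, M_D) with M_D = D^{-1/2} (W - d d^T) D^{-1/2}.

    Assumption of this sketch: W is rescaled to unit total weight and
    d holds its row sums, which must all be strictly positive.
    """
    W = np.asarray(W, dtype=float)
    W = W / W.sum()
    d = W.sum(axis=1)
    inv_sqrt_d = 1.0 / np.sqrt(d)
    M = W - np.outer(d, d)
    M_D = inv_sqrt_d[:, None] * M * inv_sqrt_d[None, :]
    return d, M_D

# Hypothetical dense toy graph: two blocks of three nodes.
blocks = np.kron(np.eye(2), np.ones((3, 3)))
W = 0.2 + 0.6 * blocks
np.fill_diagonal(W, 1.0)

d, M_D = normalized_modularity_matrix(W)
eigvals, eigvecs = np.linalg.eigh(M_D)        # symmetric input, ascending eigenvalues

# Spectral decomposition: sum_i lambda_i u_i u_i^T rebuilds M_D.
rebuilt = (eigvecs * eigvals) @ eigvecs.T

print("symmetric:", np.allclose(M_D, M_D.T))
print("sqrt(d) has eigenvalue 0:", np.allclose(M_D @ np.sqrt(d), 0.0))
print("decomposition matches:", np.allclose(rebuilt, M_D))
print("eigenvalues:", np.round(eigvals, 4))
```

Under this convention the vector with entries √*d*<sub>*i*</sub> is an eigenvector for the eigenvalue 0, which is why one term can be dropped from the sum in Equation (9).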

If we would like to maximize Equation (7), then we can use the *k*-means clustering algorithm on the optimal *k*-dimensional representation of vertices,

$$\left(D^{-\frac{1}{2}}u\_1, \ldots, D^{-\frac{1}{2}}u\_k\right)^T,$$

where *u*<sub>1</sub>, ... , *u*<sub>*k*</sub> denote the eigenvectors corresponding to the *k* eigenvalues of largest absolute value, |*λ*<sub>1</sub>(*M*<sub>*D*</sub>)| ≥ ... ≥ |*λ*<sub>*k*</sub>(*M*<sub>*D*</sub>)|. Moreover, if the normalized modularity matrix has large positive eigenvalues, then the graph has well-separated clusters; otherwise, the clusters are strongly connected.
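The clustering step described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the small deterministic *k*-means routine (farthest-first initialization followed by Lloyd iterations), and the toy matrix are all assumptions of this sketch. The number of eigenvectors kept is exposed as a separate parameter `n_vectors`; the demo keeps only the single eigenvector whose eigenvalue is clearly separated from the rest, which is a choice made here for the toy data.

```python
import numpy as np

def spectral_representation(W, n_vectors):
    """Rows are the vertex representations (D^{-1/2} u_1, ..., D^{-1/2} u_m)."""
    W = np.asarray(W, dtype=float)
    W = W / W.sum()                        # assumption: unit total weight
    d = W.sum(axis=1)                      # assumption: d = row sums, all > 0
    inv_sqrt_d = 1.0 / np.sqrt(d)
    M_D = inv_sqrt_d[:, None] * (W - np.outer(d, d)) * inv_sqrt_d[None, :]
    eigvals, eigvecs = np.linalg.eigh(M_D)
    order = np.argsort(-np.abs(eigvals))[:n_vectors]   # largest |lambda| first
    return inv_sqrt_d[:, None] * eigvecs[:, order]

def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-first start."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        updated = np.array([
            X[labels == a].mean(axis=0) if np.any(labels == a) else centers[a]
            for a in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

# Hypothetical dense toy graph: two blocks of three nodes.
blocks = np.kron(np.eye(2), np.ones((3, 3)))
W = 0.2 + 0.6 * blocks
np.fill_diagonal(W, 1.0)

X = spectral_representation(W, n_vectors=1)
labels = kmeans(X, k=2)
print(labels)
```

In practice a library *k*-means with several random restarts would usually replace the small routine above; it is written out here only to keep the example self-contained.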

Another natural approach is to minimize the normalized cut (Von Luxburg 2007):

$$\min\_{P\_k \in \mathcal{P}\_k} \sum\_{a=1}^{k-1} \sum\_{b=a+1}^{k} \left( \frac{1}{\text{Vol}(V\_a)} + \frac{1}{\text{Vol}(V\_b)} \right) \sum\_{i \in V\_a, j \in V\_b} \mathcal{W}\_{i,j}.\tag{10}$$
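To make the normalized cut concrete, the sketch below evaluates it for a given partition and compares it with the normalized modularity objective of Equation (7). The function names and the toy matrix are hypothetical, and *W* is assumed to be symmetric and rescaled to unit total weight with *d* taken as the row sums. Under that assumption the two values add up to *k* − 1 for any partition, so minimizing one is equivalent to maximizing the other.

```python
import numpy as np

def _rescale(W):
    """Assumption of this sketch: unit total weight, d = row sums."""
    W = np.asarray(W, dtype=float)
    W = W / W.sum()
    return W, W.sum(axis=1)

def normalized_cut(W, labels):
    """Sum over cluster pairs a < b of (1/Vol(V_a) + 1/Vol(V_b)) * W(V_a, V_b)."""
    W, d = _rescale(W)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    total = 0.0
    for pos, a in enumerate(clusters):
        for b in clusters[pos + 1:]:
            in_a, in_b = labels == a, labels == b
            w_ab = W[np.ix_(in_a, in_b)].sum()      # total weight between V_a and V_b
            total += (1.0 / d[in_a].sum() + 1.0 / d[in_b].sum()) * w_ab
    return total

def normalized_modularity(W, labels):
    """Objective of Eq. (7) for one partition."""
    W, d = _rescale(W)
    labels = np.asarray(labels)
    M = W - np.outer(d, d)
    return sum(
        M[np.ix_(labels == a, labels == a)].sum() / d[labels == a].sum()
        for a in np.unique(labels)
    )

# Hypothetical dense toy graph: two blocks of three nodes.
blocks = np.kron(np.eye(2), np.ones((3, 3)))
W = 0.2 + 0.6 * blocks
np.fill_diagonal(W, 1.0)

for labels in ([0, 0, 0, 1, 1, 1], [0, 1, 0, 1, 0, 1]):
    cut = normalized_cut(W, labels)
    mod = normalized_modularity(W, labels)
    print(labels, round(cut, 4), round(mod, 4), round(cut + mod, 4))
```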

The optimization problem is similar to Equation (7). However, instead of the normalized modularity matrix, the normalized Laplace matrix provides the solution (Shi and Malik 2000):

$$L\_D := D^{-\frac{1}{2}}(D - W)D^{-\frac{1}{2}}.\tag{11}$$

This technique works well when the clusters are well separated; otherwise, normalized modularity gives better results.
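The link between the normalized Laplace matrix and the normalized modularity matrix can be illustrated with a short sketch. The names and the toy matrix are hypothetical, and the code assumes a symmetric *W* rescaled to unit total weight with *d* taken as the row sums. Splitting on the sign of the eigenvector of the second-smallest eigenvalue is a common two-cluster shortcut, used here in place of *k*-means to keep the example short.

```python
import numpy as np

def normalized_laplacian(W):
    """Return (d, W_scaled, L_D) with L_D = D^{-1/2} (D - W) D^{-1/2}."""
    W = np.asarray(W, dtype=float)
    W = W / W.sum()                        # assumption: unit total weight
    d = W.sum(axis=1)                      # assumption: d = row sums, all > 0
    inv_sqrt_d = 1.0 / np.sqrt(d)
    L_D = inv_sqrt_d[:, None] * (np.diag(d) - W) * inv_sqrt_d[None, :]
    return d, W, L_D

# Hypothetical dense toy graph: two blocks of three nodes.
blocks = np.kron(np.eye(2), np.ones((3, 3)))
W = 0.2 + 0.6 * blocks
np.fill_diagonal(W, 1.0)

d, W_scaled, L_D = normalized_laplacian(W)
inv_sqrt_d = 1.0 / np.sqrt(d)
M_D = inv_sqrt_d[:, None] * (W_scaled - np.outer(d, d)) * inv_sqrt_d[None, :]

# Under the convention above, M_D = I - L_D - sqrt(d) sqrt(d)^T.
sqrt_d = np.sqrt(d)
identity_holds = np.allclose(M_D, np.eye(len(d)) - L_D - np.outer(sqrt_d, sqrt_d))

eigvals, eigvecs = np.linalg.eigh(L_D)     # ascending; the smallest eigenvalue is 0
second = eigvecs[:, 1]                     # eigenvector of the second-smallest eigenvalue
labels = (np.sign(second) != np.sign(second[0])).astype(int)

print(identity_holds, np.round(eigvals, 4), labels)
```

The identity in the comment shows that, apart from the direction √*d*, the two matrices share eigenvectors, with each eigenvalue *λ* of *L*<sub>*D*</sub> corresponding to 1 − *λ* for *M*<sub>*D*</sub>.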
