1. Introduction
Network science has flourished over the last decade, with profound implications for fields as different as finance, social networks, and biological networks [1]. Given the enormous scale of the data, most studies focus on a small group of influential nodes rather than on the whole network. In social networks, for instance, influential nodes are those that have the greatest spreading ability or play a predominant role in network evolution. Notably, a celebrity on online social media may markedly accelerate the spreading of rumors, and a few super spreaders [2] can greatly expand the epidemic prevalence of a disease (e.g., COVID-19) [3]. Research on influencer identification helps in understanding and controlling spreading dynamics in social networks, with diverse applications such as epidemiology, collective dynamics, and viral marketing [4,5].
Nowadays, individuals interact with each other in more complicated patterns than ever, and the variety of these interactions makes identifying influencers in social networks a challenging task. The graph model is widely used to represent social networks; however, it cannot handle multiple types of social links. For example, people use Facebook or WeChat to keep in touch with family members and friends, Twitter to post news, LinkedIn to search for jobs, and TikTok to create and share short videos [6]. It is easy to represent each social scenario with a separate graph, even though all of them involve the same group of individuals. Neglecting the multiple relationships between social actors may lead to an incorrect identification of the most versatile users [7]. Multilayer networks [8,9] make it possible to encode these various interactions, which is essential for identifying influencers across multiple social networks.
In this paper, we design a novel node centrality measure for monolayer networks and then apply it to multilayer networks to identify influencers in multiple social networks. The method relies solely on local knowledge of a network’s topology, which keeps it fast and scalable on very large networks and thus suitable for both real-time applications and offline mining.
The rest of this paper is organized as follows. Section 2 reviews related work on influencer identification in monolayer and multilayer networks. Section 3 presents the mathematical model and the method for detecting influencers. Section 4 reports the experiments and analysis, including comparison experiments on twenty-one real-world datasets, which verify the feasibility and validity of the proposed method. Section 5 summarizes the paper and provides concluding remarks.
2. Related Works
Research on influencer identification dates back to the study of node centrality, which measures how “central” a focal node is [10]. A plethora of methods for influencer identification have been proposed over the past 40 years; they can be broadly classified into centrality measures, link topological ranking measures, entropy measures, and node embedding measures [11,12]. Some of these measures take only local information into account, while others employ machine learning methods. Influencer identification has become one of the most popular research topics and has yielded a variety of applications [7], such as identifying essential proteins and potential drug targets for the survival of the cell [13], controlling the outbreak of epidemics [14], preventing catastrophic outages in power grids [15], driving a network toward a desired state [16], improving transport capacity [17], and promoting cooperation in evolutionary games [18]. This paper investigates the problem of identifying influencers in social networks. We first introduce a family of centrality-like measures and give a brief comparison in Table 1.
Degree Centrality (DC) [19] is the simplest centrality measure; it merely counts how many social connections (i.e., neighbors) a focal node has, defined as
$$DC(i) = \sum_{j=1}^{N} a_{ij},$$
where $N$ is the total number of nodes and $a_{ij}$ is the weight of edge $(i, j)$ if $i$ is connected to $j$, and 0 otherwise. Degree centrality is simple and considers only the local structure around a focal node [20]. However, because it neglects global information, it can be misleading: a node may occupy a central position from which it reaches others quickly even though it does not have a large number of neighbors [21]. Thus, Betweenness Centrality (BC) [22] was proposed to assess the degree to which a node lies on the shortest paths between other pairs of nodes, defined as
$$BC(i) = \sum_{s \neq i \neq t} \frac{g_{st}^{i}}{g_{st}},$$
where $g_{st}$ is the total number of shortest paths between $s$ and $t$, and $g_{st}^{i}$ is the number of those shortest paths that pass through node $i$. Betweenness centrality considers global information and can be applied to networks with disconnected components. However, a large proportion of nodes do not lie on any shortest path between two other nodes, so they all receive the same score of 0. Besides, its high computational complexity limits its application to large-scale networks. Analogously, Closeness Centrality (CC) [23] was proposed as the inverse of the sum of the shortest distances from a focal node to all other nodes, defined as
$$CC(i) = \frac{1}{\sum_{j=1,\, j \neq i}^{N} d_{ij}},$$
where $N$ is the total number of nodes and $d_{ij}$ is the shortest path length from node $i$ to node $j$. Closeness centrality measures how close a focal node is to the core of the network by using global shortest path lengths, but it is not applicable to networks with disconnected components: if two nodes belong to different components, the distance between them is not finite and the measure is undefined. It is also criticized for its high computational complexity.
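The three measures described above can be sketched in a few lines of Python. This is a minimal illustration rather than an optimized implementation. It assumes an unweighted, undirected network stored as an adjacency dictionary, counts each unordered pair of endpoints once for betweenness, and returns `None` for closeness when some node is unreachable. The function names and the five-node example graph are hypothetical.

```python
from collections import deque


def bfs(adj, source):
    """Return hop distances and shortest-path counts from `source`."""
    dist = {source: 0}
    sigma = {source: 1}  # sigma[v] = number of shortest paths source -> v
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # the first visit fixes the distance
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def degree_centrality(adj):
    """Number of neighbors of every node."""
    return {i: len(adj[i]) for i in adj}


def closeness_centrality(adj):
    """Inverse of the summed distances; None if some node is unreachable."""
    cc = {}
    for i in adj:
        dist, _ = bfs(adj, i)
        connected = len(dist) == len(adj) > 1
        cc[i] = 1.0 / sum(dist.values()) if connected else None
    return cc


def betweenness_centrality(adj):
    """Fraction of shortest s-t paths through i, summed over unordered pairs."""
    info = {s: bfs(adj, s) for s in adj}
    nodes = list(adj)
    bc = {i: 0.0 for i in nodes}
    for a, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[a + 1:]:
            if t not in dist_s:  # s and t lie in different components
                continue
            for i in nodes:
                if i == s or i == t or i not in dist_s:
                    continue
                dist_i, sigma_i = info[i]
                # i is on a shortest s-t path iff the distances add up.
                if dist_s[i] + dist_i[t] == dist_s[t]:
                    bc[i] += sigma_s[i] * sigma_i[t] / sigma_s[t]
    return bc


# Hypothetical example: the path 0-1-2-3 with an extra leaf 4 attached to 1.
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
dc = degree_centrality(adj)
cc = closeness_centrality(adj)
bc = betweenness_centrality(adj)
for node in adj:
    print(node, dc[node], cc[node], bc[node])
```

In this example the three leaf nodes all receive a betweenness of 0, which illustrates the tie problem mentioned above, while node 1 ranks first under all three measures.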
The Eigenvector Centrality (EC) [24] of a node is a positive multiple of the sum of the centralities of its neighbors. Relative scores are assigned to all nodes in a network on the assumption that connections to high-scoring nodes contribute more to a node’s score than connections to low-scoring nodes, defined as
$$x_i = \frac{1}{\lambda} \sum_{j=1}^{N} a_{ij} x_j,$$
where $a_{ij}$ are the entries of the adjacency matrix $A$, $\lambda$ is the largest eigenvalue of $A$, and $x$ is the corresponding eigenvector, i.e., the stable state of the interactions. This measure considers the number of neighbors and the centrality of those neighbors simultaneously; however, it cannot handle acyclic graphs. In 1998, Brin and Page developed the PageRank algorithm [25], which is the fundamental mechanism of the Google search engine. PageRank (PR) likewise derives a node’s score from the scores of the nodes that link to it, defined as
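$$PR_i(t) = \sum_{j=1}^{N} a_{ji} \frac{PR_j(t-1)}{k_j^{out}},$$
where $N$ is the total number of nodes, $k_j^{out} = \sum_{l=1}^{N} a_{jl}$ is the out-degree of node $j$, and $a_{ji}$ is the number of edges pointing from node $j$ to node $i$. Likewise, this method is efficient but is criticized for non-convergence in cyclical structures. The clustering coefficient [26,27] measures the degree to which nodes in a graph tend to cluster together, defined as
$$C_i = \frac{2 e_i}{k_i (k_i - 1)},$$
where $k_i$ is the degree of node $i$ and $e_i$ is the number of edges among the neighbors of $i$.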
It is widely considered that a node with a higher clustering coefficient helps to form communities and enhances local information spreading. However, Chen et al. expressed the contrary view that local clustering has a negative impact on information spreading. They proposed the ClusterRank algorithm for ranking nodes in large-scale directed networks and verified its superiority to PageRank and LeaderRank [28]. The effect of the clustering coefficient on information spreading is therefore uncertain: it may benefit local spreading but inhibit global spreading, especially in directed networks. In 2016, Ma et al. proposed a gravity centrality (GR) [29] that considers the interactions with neighbors within three steps, defined as
$$GR(i) = \sum_{j \in \psi_i} \frac{ks(i)\, ks(j)}{d_{ij}^{2}},$$
where $ks(i)$ and $ks(j)$ are the $k$-shell indices of $i$ and $j$, respectively, $\psi_i$ is the neighborhood set whose distance to node $i$ is less than or equal to 3, and $d_{ij}$ is the shortest path length between $i$ and $j$. This method considers semi-local knowledge of a focal node, i.e., the neighboring nodes within three steps, and has been successful on many real-world datasets, such as the Jazz [30], NS [31], and USAir [32] networks. However, it also has high computational complexity because the $k$-shell index must be calculated globally. In 2019, Li et al. improved the gravity centrality and proposed a Local-Gravity centrality (LGR) [33] that replaces the $k$-shell index with the node degree and considers only the neighbors within $R$ steps, defined as
$$LGR_R(i) = \sum_{d_{ij} \le R,\, j \neq i} \frac{k_i k_j}{d_{ij}^{2}},$$
where $k_i$ and $k_j$ are the degrees of $i$ and $j$, respectively, and $d_{ij}$ is the shortest path length between $i$ and $j$. This method has been extremely successful on a variety of real-world datasets; however, choosing the parameter $R$ requires calculating the network diameter, which is also a time-consuming process.
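The difference between the two gravity-style measures can be illustrated with a short Python sketch. It assumes an unweighted, undirected network stored as an adjacency dictionary, uses a truncated breadth-first search for the neighborhood, and computes the k-shell index by iterative pruning. The function names and the example graph are hypothetical, and the code is a sketch of the definitions summarized above, not the cited authors' implementations.

```python
from collections import deque


def neighbors_within(adj, source, radius):
    """Truncated BFS: {node: distance} for nodes within `radius` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:  # do not expand beyond the radius
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[source]  # the focal node is not its own neighbor
    return dist


def k_shell(adj):
    """k-shell index of every node, obtained by iterative pruning."""
    degree = {u: len(adj[u]) for u in adj}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        peel = [u for u in remaining if degree[u] <= k]
        if not peel:  # nothing left in this shell, move to the next one
            k += 1
            continue
        for u in peel:
            shell[u] = k
            remaining.discard(u)
            for v in adj[u]:
                if v in remaining:
                    degree[v] -= 1
    return shell


def gravity_centrality(adj, radius=3):
    """Sum of ks(i) * ks(j) / d_ij^2 over nodes within `radius` steps."""
    ks = k_shell(adj)  # global computation over the whole network
    return {
        i: sum(ks[i] * ks[j] / d ** 2
               for j, d in neighbors_within(adj, i, radius).items())
        for i in adj
    }


def local_gravity_centrality(adj, radius):
    """Sum of k_i * k_j / d_ij^2 over nodes within `radius` steps."""
    return {
        i: sum(len(adj[i]) * len(adj[j]) / d ** 2
               for j, d in neighbors_within(adj, i, radius).items())
        for i in adj
    }


# Hypothetical example: a triangle 0-1-2 with a tail 2-3-4.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
ks = k_shell(adj)
gr = gravity_centrality(adj)
lgr = local_gravity_centrality(adj, radius=2)
print(ks)
print(gr)
print(lgr)
```

The local variant needs only node degrees and a bounded search around each node, whereas the gravity variant first runs the pruning procedure over the entire network, which is the cost noted above.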
The above-mentioned centrality measures have been used to rank nodes’ spreading abilities in monolayer networks. Ranking nodes in multilayer networks is a more challenging task and remains an open issue. The information propagation process over multiple social networks is more complicated, and conventional models cannot be applied without modification. Zhuang and Yağan [36] proposed a clustered multilayer network model, in which all constituent layers are random networks with high clustering, to simulate the information propagation process in multiple social networks. Likewise, Basaras et al. [37] proposed an improved susceptible–infected–recovered (SIR) model with separate information propagation probability parameters for intralayer and interlayer connections. Most recent efforts have concentrated on multiplex networks (e.g., the clustering coefficient in multiplex networks [38]), in which all layers share an identical set of nodes but may carry different types of interactions. Rahmede et al. proposed the MultiRank algorithm [39] for the weighted ranking of nodes and layers in large multiplex networks. The basic idea is to assign more centrality to nodes that are linked to central nodes in highly influential layers, while layers are more influential if highly central nodes are active in them. Wang et al. proposed a tensor decomposition method (i.e., EDCPTD centrality) [7], which utilizes a fourth-order tensor to represent multilayer networks and identifies essential nodes based on CANDECOMP/PARAFAC (CP) tensor decomposition. They also demonstrated its superiority to traditional solutions by comparing the performance of the proposed method with that obtained on the aggregated monolayer networks. In short, identifying influencers in multiplex networks is of great significance. Our purpose in this work is to devise a measure that can accurately detect influential nodes in a general multilayer network.
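To make the multiplex setting concrete, the following Python sketch stores a two-layer multiplex network as one edge list per layer over a shared set of nodes and compares per-layer degree with the weighted degree of the aggregated monolayer network. The layer names, users, and edges are invented for illustration, and aggregation by counting how many layers contain an edge is only one possible assumption. It is not the method of any work cited above.

```python
from collections import Counter

# Hypothetical two-layer multiplex network over the same five users.
layers = {
    "messaging": [("ann", "bob"), ("ann", "cat"), ("ann", "dan")],
    "jobs": [("ann", "bob"), ("bob", "cat"), ("bob", "dan"), ("bob", "eve")],
}


def layer_degree(edges):
    """Degree of every node within a single layer."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def aggregate(layers):
    """Collapse the layers into one weighted network.

    The weight of an edge is the number of layers that contain it.
    """
    weights = Counter()
    for edges in layers.values():
        for u, v in edges:
            weights[frozenset((u, v))] += 1
    return weights


def strength(weights):
    """Weighted degree of every node in the aggregated network."""
    total = Counter()
    for pair, w in weights.items():
        for node in pair:
            total[node] += w
    return total


per_layer = {name: layer_degree(edges) for name, edges in layers.items()}
aggregated = strength(aggregate(layers))

for name, degree in per_layer.items():
    print(name, max(degree, key=degree.get))
print("aggregated", max(aggregated, key=aggregated.get))
```

Here the top-ranked user differs between the two layers, so a ranking computed on any single layer can miss the most versatile user, and the aggregated network discards the information about which layer each link came from.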