Influence Maximization Based on Adaptive Graph Convolution Neural Network in Social Networks

Liu, Wei; Wang, Saiwei; Ding, Jiayi

doi:10.3390/electronics13163110



by Wei Liu *, Saiwei Wang and Jiayi Ding

College of Information Engineering, Yangzhou University, Yangzhou 225000, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(16), 3110; https://doi.org/10.3390/electronics13163110 (registering DOI)

Submission received: 11 July 2024 / Revised: 4 August 2024 / Accepted: 5 August 2024 / Published: 6 August 2024

(This article belongs to the Special Issue Data-Driven AI Approaches with Applications in Social Network, Media Analytics and Smart Cities)


Abstract:

The influence maximization problem is a hot issue in the research on social networks due to its wide application. The problem aims to find a small subset of influential nodes to maximize the influence spread. To tackle the challenge of striking a balance between efficiency and effectiveness in traditional influence maximization algorithms, deep learning-based influence maximization algorithms have been introduced and have achieved advancement. However, these algorithms still encounter two key problems: (1) Traditional deep learning models are not well-equipped to capture the latent topological information of networks with varying sizes and structures. (2) Many deep learning-based methods use the influence spread of individual nodes as labels to train a model, which can result in an overlap of influence among the seed nodes selected by the model. In this paper, we reframe the influence maximization problem as a regression task and introduce an innovative approach to influence maximization. The method adopts an adaptive graph convolution neural network which can explore the latent topology information of the network and can greatly improve the performance of the algorithm. In our approach, firstly, we integrate several network-level attributes and some centrality metrics into a vector as the presentation vector of nodes in the social network. Next, we propose a new label generation method to measure the influence of nodes by neighborhood discount strategy, which takes full account of the influence overlapping problem. Subsequently, labels and presentation vectors are fed into an adaptive graph convolution neural network model. Finally, we use the well-trained model to predict the importance of nodes and select top-K nodes as a seed set. Abundant experiments conducted on various real-world datasets have confirmed that the performance of our proposed algorithm surpasses that of several current state-of-the-art algorithms.

Keywords: social networks; influence maximization; presentation vector; neighborhood discount strategy; AGCN

1. Introduction

With the rapid development of social media, people can communicate and share information more easily on platforms such as WeChat, Facebook, and Twitter. Most companies exploit word-of-mouth effects to advertise their products or brands on these social networks, a marketing strategy commonly known as viral marketing. Domingos et al. modeled viral marketing as an Influence Maximization (IM) problem [1].

The influence maximization problem is a significant area of research within the field of social networks. The goal of the IM problem is to find a small group of influential users in the social network through whom more users can be led to accept a target product, idea, policy, etc. A social network can be represented as an undirected graph, denoted as $G = (V, E)$, where $V$ is the set of nodes, each representing a user within the social network, and $E$ is the set of edges that represent the relationships between pairs of users. We are given a positive integer $k$ ($k < |V|$) and a propagation model $M$ that models the information diffusion process in social networks. The independent cascade (IC) model [2] and the Susceptible-Infected-Recovered (SIR) model [3] are two well-known propagation models, and we conduct experiments with both in this paper. We can leverage $M$ to calculate $\sigma_{M}(S)$, which is the expected number of nodes influenced by the seed set $S$, also known as the influence spread of $S$. Formally, the IM problem can be presented as follows:

$$S^{*} = \underset{|S| = k,\; S \subseteq V}{\arg\max}\; \sigma_{M}(S) \tag{1}$$

Kempe et al. [2] proved that the IM problem is NP-hard and proposed a greedy framework that obtains an approximately optimal solution with an approximation ratio of $1 - \frac{1}{e}$. Since greedy algorithms are too time-consuming to be applicable to large networks, centrality-based heuristics, such as degree centrality, betweenness centrality, and eigenvector centrality, are also widely used for the IM problem. Although heuristic algorithms are highly efficient, the quality of their results is often poor. Recently, community detection, meta-heuristics, and deep learning have also been successfully applied to the IM problem, bringing new perspectives to the study of IM. Community-based methods divide the graph into communities and then select locally optimal seed nodes within each community, which ignores the global importance of the nodes. The fitness function of meta-heuristic algorithms cannot use simulation to obtain an accurate influence spread, so the accuracy of these algorithms is lower. Moreover, meta-heuristic algorithms often require a large number of randomly selected candidate solutions, which lowers their efficiency. Owing to their powerful representation capabilities, deep learning methods can learn the latent topology of the network and measure the importance of nodes from a global perspective. Moreover, with deep learning techniques, the seed node selection process in the IM problem can be transformed into a model prediction process, which greatly improves the efficiency of node selection.

Y. Li et al. [4] presented a survey on deep learning-based methods for influence maximization. They focus on summarizing the relevant background knowledge, basic principles, fundamental methods, and applications of deep learning-based methods for influence maximization. S. Kumar et al. [5] proposed a transfer learning-based approach for the influence maximization problem under the susceptible–infected–recovered (SIR) information diffusion model. The method calculated three node centralities as feature vectors of the nodes, and then the feature vectors were fed into a graph-based long-short-term memory (LSTM) network. The trained LSTM was subsequently employed to predict the potential influence of individual nodes and seeds in the target network were selected based on their estimated influences. Although influence maximization methods have been reported in recent years, they still have shortcomings in obtaining higher-quality results efficiently. Firstly, most existing deep learning-based IM algorithms are limited to training and prediction on a single graph, making knowledge transfer unattainable. Secondly, many existing methods do not consider the interaction and overlapping of the influences spread by different nodes. Since influence started from different sources may have mutual interference, it may reduce the precision of the influence maximization results if we consider influences from different nodes as independent ones. Finally, the input for the IM problem is graph data, and transforming graph data into Euclidean data acceptable by neural networks is a challenge that deep learning-based IM algorithms must solve.

To overcome the above shortcomings and challenges, we propose a deep learning-based method for the IM problem that ensures efficiency while improving effectiveness. To facilitate knowledge transfer, we develop a novel model, NDD_AGCN, based on the adaptive graph convolution neural network (AGCN) [6]. This model is designed to extract node features from graphs with diverse topological structures and to predict the spreading power of nodes. We extract four network-level attributes and five centrality metrics of nodes to construct a representation vector for each node and thereby embed the nodes into Euclidean space, which accelerates the convergence of the AGCN model. Furthermore, we devise a new label generation algorithm (NDD), which adopts a neighborhood discount strategy to address the overlapping influence problem. NDD is also efficient because it is a heuristic-based approach. After obtaining the labels and representation vectors, we use them to train the NDD_AGCN model. Finally, we employ the well-trained NDD_AGCN model to predict the spreading power of nodes and select a seed set. The contributions and innovations of this work are as follows:

To overcome the limitation of traditional algorithms that struggle to enhance both efficiency and effectiveness simultaneously, we transformed the IM problem into a regression problem and employed a deep learning-based approach to address it.
We adopt an adaptive graph convolution neural network (AGCN) to construct a new deep learning model called NDD_AGCN for the IM problem. The NDD_AGCN is designed to accommodate graphs of different sizes and topological structures as input, allowing it to be trained on known graphs and make predictions on unknown graphs.
We propose a novel label generation algorithm (NDD), which uses a neighborhood discount strategy to solve the influence overlapping problem. In addition, this algorithm is highly efficient as it is based on heuristics.
We perform abundant experiments over several real-world networks which prove that our algorithm outperforms other state-of-the-art algorithms for the IM problem.

The rest of this paper is organized as follows: Section 2 reviews the existing work on influence maximization; Section 3 introduces the information diffusion models and the AGCN network; Section 4 describes the details of our proposed method; the experimental results are given and analyzed in Section 5 and Section 6; and Section 7 concludes this work and discusses future research directions.

2. Related Work

Influence maximization has received a lot of attention because of its wide application, and a large body of research on the IM problem has accumulated. Domingos and Richardson [1] first proposed the influence maximization problem, and Kempe et al. [2] formulated it as a discrete optimization problem. In [2], Kempe et al. also defined the Independent Cascade (IC) model and the Linear Threshold (LT) model and then developed a greedy algorithm to approximate the optimal solution of the IM problem. However, the greedy algorithm was time-consuming and inefficient. Therefore, many scholars focused on improving the efficiency of greedy methods. Leskovec et al. [7] reduced the number of simulations of the IC model by further exploiting the submodularity of the objective function of influence maximization. They proposed a novel greedy algorithm called CELF that was 700 times faster than the original greedy algorithm. Li et al. [8] put forward CELF++ to further avoid unnecessary Monte Carlo simulations. Although CELF++ is faster than CELF, it still needs to calculate the influence spread of all nodes in the first iteration, which is very time-consuming. The UBLF algorithm [9] utilized matrix analysis to estimate an upper bound on the influence spread of each node, which overcame this weakness of CELF++ and increased efficiency. Furthermore, Chen et al. [10] presented a new approach that removes edges with probability p to generate an activation subgraph. They used the activation subgraph to calculate the influence spread of nodes, which reduces the number of Monte Carlo simulations. In this way, they proposed the NewGreedyIC algorithm, which was more than six orders of magnitude faster than the traditional greedy algorithms.

Although much work has been conducted to improve the greedy algorithm, it still cannot be applied to large-scale networks. Motivated by this situation, many scholars attempted to integrate heuristic strategies to improve the efficiency of algorithms. Degree centrality [11] only considers the local importance of nodes, which may cause influence overlap. To overcome the influence overlap issue associated with degree-based algorithms, Chen et al. [10] introduced the DegreeDiscount algorithm, which discounts the degrees of the nodes in the neighborhood of a selected seed node. In [12], Brin presented the PageRank algorithm, a link analysis algorithm designed to evaluate the importance of web pages through an analysis of their hyperlink structure. It can also assess the importance of nodes in a network from a global perspective. To enhance performance, Kumar et al. [13] proposed the NCVoteRank algorithm by extending the VoteRank algorithm. NCVoteRank takes into consideration the k-shell values of the neighbors of a node to decide the voting power of this node. Heuristic algorithms are efficient but have low accuracy. Some researchers have therefore proposed a two-stage seed selection strategy, which initially selects candidate nodes by heuristic methods and subsequently employs optimization algorithms to find seed nodes. Biswa et al. [14] adopted the MCDM technique of Simple Additive Weighting (SAW) [15] to integrate four centrality measures (degree, betweenness, closeness, and eigenvector) into a new measure. They used this measure to find candidate nodes and employed a simulated annealing algorithm to select the final seed set.

Recently, many popular techniques have been applied to the IM problem. Community structure is an important feature of social networks, which describes the dense connections within communities and sparse connections between communities. In [16], Kumar et al. proposed an extended h-index measure to find a candidate seed set and then detected the community using a label propagation algorithm to assist in the selection of a seed set.

Additionally, meta-heuristics [17,18,19] are also often used to solve influence maximization. They can alleviate the problem that greedy algorithms tend to fall into local optimal solutions. Furthermore, deep learning methods have been successfully applied in many fields, including IM problems. Zhang et al. [20] established a leader fake labeling mechanism to automatically generate node labels and trained a dynamic GCN model to predict the influence spread of nodes. Kumar et al. [21] interpreted the influence maximization problem as a pseudo-regression task. They adopted struc2vec node embedding to generate a representation vector of each node in the network. The label of each node was the influence spread obtained by the SIR model. The representation vectors and labels were fed into a neural graph network to obtain a well-trained model, and then the model was employed to predict the influence spread of nodes on an unknown network. In [22], Ling et al. proposed a joint framework DeepIM, which achieved a more generalized and scalable influence maximization in social networks by learning latent representations of seed sets and end-to-end propagation models.

In this paper, we propose an effective deep learning method (NDD_AGCNIM) to solve the IM problem. In this method, firstly, we present a heuristic method (NDD) to generate labels and extract the topological features to construct the representation vector for nodes in a network. Secondly, we use labels and representation vectors to train an adaptive graph convolution neural network model to predict the importance of nodes in networks. We select the most important k nodes as a seed set. Experiments on real-world datasets have shown that the proposed algorithm outperforms several state-of-the-art algorithms in terms of performance.

3. Preliminaries

To better understand our work, this section introduces the independent cascade (IC) model, the susceptible-infected-recovered (SIR) model, and the adaptive graph convolution neural network (AGCN).

3.1. Information Diffusion Model

In this paper, we solve the IM problem under the IC and SIR models. In the IC model, the weight of each edge represents an activation probability $p \in [0, 1]$. A node in a network has two states, active and inactive. In the beginning, the nodes in the seed set are set as active, and they then try to activate their inactive neighbors with probability $p$. Each newly activated node has only one chance to activate each of its direct inactive neighbors. This process is repeated until no more nodes are activated. The activation attempts on different neighbors of a node are mutually independent.
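The IC process described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes a single uniform activation probability `p` on every edge, and the function names, the toy graph, and the number of Monte Carlo runs are all hypothetical. Processing nodes from a queue rather than in explicit rounds gives the same distribution of outcomes, since each activated node still gets exactly one attempt on each inactive neighbor.

```python
import random
from collections import deque


def simulate_ic(adj, seeds, p, rng):
    """Run one independent cascade and return the set of activated nodes."""
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            # u gets exactly one attempt on each neighbor that is still inactive.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


def estimate_spread(adj, seeds, p, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(adj, seeds, p, rng)) for _ in range(runs)) / runs


# Hypothetical toy graph as an undirected adjacency list.
toy_adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
spread = estimate_spread(toy_adj, seeds=[0], p=0.1)
```

Averaging over many runs is what makes simulation-based influence estimates expensive on large graphs.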

In the SIR model, each edge $(u, v)$ in a network is associated with a weight, denoted as $p_{u,v}$, which represents the probability that node u infects node v. Each node v is assigned a weight $q_{v}$, indicating the likelihood of recovery for node v after it becomes infected. The SIR model describes the spread of infectious diseases within a population by classifying individuals into three distinct categories: Susceptible (S), Infected (I), and Recovered (R). In the initial stage of the spread, a subset of individuals, known as seed nodes, transitions from the S state to the I state. In each round, each node u in the I state infects each of its susceptible neighbors v with probability $p_{u,v}$, and each node v in the I state recovers and transitions to the R state with probability $q_{v}$. Nodes in the R state remain unchanged. This process continues until no more changes occur in the states of any nodes.
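A round-based SIR simulation along these lines might look as follows. This is a sketch under simplifying assumptions: it uses one uniform infection probability `p` and one uniform recovery probability `q` instead of per-edge and per-node weights, it requires `q > 0` so that the process terminates, and it lets nodes infected in a round become eligible for recovery only from the next round. The function name and the toy path graph are hypothetical.

```python
import random


def simulate_sir(adj, seeds, p, q, rng):
    """Run one SIR cascade; return every node that was ever infected."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1] so that the process terminates")
    state = {v: "S" for v in adj}
    for s in seeds:
        state[s] = "I"
    infected = set(seeds)
    while infected:
        # Infected nodes try to infect their susceptible neighbors.
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                if state[v] == "S" and rng.random() < p:
                    newly_infected.add(v)
        # Currently infected nodes recover with probability q.
        recovered = {u for u in infected if rng.random() < q}
        for v in newly_infected:
            state[v] = "I"
        for u in recovered:
            state[u] = "R"
        infected = (infected | newly_infected) - recovered
    return {v for v, s in state.items() if s != "S"}


# Hypothetical toy graph: a path 0 - 1 - 2 - 3.
path_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
reached = simulate_sir(path_adj, seeds=[0], p=0.5, q=0.5, rng=random.Random(0))
```

The size of the returned set, averaged over repeated runs, plays the role of the influence spread under this model.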

3.2. Adaptive Graph Convolution Neural Network (AGCN)

Classical CNNs have been extended to Graph Convolution Neural Networks (GCN) to exploit network topology information. Filters in most existing GCN models are designed for fixed and shared graph structures. As a result, it is difficult for them to handle graphs of different sizes and connectivity. To solve this problem, Li et al. [6] proposed the adaptive GCN (AGCN) model, which can accept different graph structures as input. The graph Laplacian matrix and the representation vectors of nodes are fed to the AGCN model. The model then extracts the neighborhood topology information of nodes in social networks to optimize the representation vectors. The new representation vectors can improve the accuracy of downstream tasks.

4. The Proposed Method

This section illustrates our proposed algorithm in detail. To select influential nodes as a seed set, we have to measure the influence of nodes in social networks. Therefore, we use a deep learning framework to predict the influence diffusion of nodes by solving a regression problem. In general, we need to map nodes in graph data into vectors in Euclidean space without destroying the structural features of the nodes and the topological features of the network. Due to the complexity of social networks, it is no longer sufficient to rely on feature engineering alone to extract node features. Therefore, we employ a graph convolutional neural network (GCN) to automatically obtain feature vectors of the nodes that incorporate their topological features in the network. The classical GCN can only take graphs with the same size and structure as input, which would greatly limit the generalization of our algorithm. Therefore, the adaptive graph convolution neural network is used to solve this problem, so that our algorithm can be applied to a variety of networks. Most of the existing IM algorithms based on machine learning adopt a specific propagation model to calculate the influence spread of nodes and use it as the node label [20,21,23]. However, some influence diffusion models require many Monte Carlo simulations, which is very time-consuming, especially when generating labels for many graphs. In addition, the distance between selected users should be considered so that the overlap between influences is minimal and the diffusion of influence is maximized. Inspired by the above analysis, we put forward a heuristic algorithm for generating labels based on a neighborhood discount strategy in social networks.

As shown in Figure 1, our algorithm is divided into two phases: the training phase and the seed selection phase. In the training phase, we generate multiple BA networks and extract the representation vectors of the nodes in these networks. Then, we use a label generation algorithm to assign labels for the nodes. In Figure 1, the red node indicates a node that has been selected as a seed, while a yellow node represents one that is likely to be influenced by the seed nodes, and the influence of these nodes should be discounted when selecting the next seed. Afterward, we feed the representation vectors and labels into the AGCN model and use them to train the AGCN model. In the seed selection phase, we first extract the representation vectors of the nodes in the real network and then use these vectors as input to the well-trained model to predict the importance of nodes (pre_ndd_value). Finally, we select the top-k important nodes as the seed nodes.

4.1. Representation Vector

The AGCN model needs to take a unique representation of nodes as input. To gain the representation vector of nodes and accelerate the speed of the model, we meticulously selected several network characteristics, including the number of nodes and edges in the network, the average degree, and the average clustering coefficient of the network. These characteristics collectively provide information about the scale and density of the network. Moreover, to build a more comprehensive representation vector, we also selected several centrality metrics, specifically degree [11], betweenness [24], closeness [24], eigenvector [25], pagerank [12]. These centrality metrics allow for a quantitative assessment of node characteristics from both the global and local perspectives of the network. We aggregate these network characteristics and centrality metrics to form the node representation vector.
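One possible way to assemble such a nine-dimensional vector is sketched below with NetworkX. The source does not specify normalization, ordering, or library, so all of these are assumptions here: the four network-level attributes are repeated for every node, the centralities use NetworkX defaults, and the function name `representation_vectors` and the Barabasi-Albert parameters are hypothetical.

```python
import networkx as nx


def representation_vectors(G):
    """Return {node: [4 network-level attributes + 5 centrality values]}."""
    n, m = G.number_of_nodes(), G.number_of_edges()
    # Network-level attributes, shared by every node of the graph.
    graph_level = [n, m, 2 * m / n, nx.average_clustering(G)]
    # Node-level centrality metrics, each a dict keyed by node.
    centralities = [
        nx.degree_centrality(G),
        nx.betweenness_centrality(G),
        nx.closeness_centrality(G),
        nx.eigenvector_centrality(G, max_iter=1000),
        nx.pagerank(G),
    ]
    return {v: graph_level + [c[v] for c in centralities] for v in G}


# Hypothetical training-style graph: a small Barabasi-Albert network.
G = nx.barabasi_albert_graph(50, 3, seed=1)
vectors = representation_vectors(G)
```

Betweenness and closeness dominate the cost on large graphs, so an approximate variant may be preferable in practice.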

4.2. Label Generation

To complete the regression task, each node needs a well-defined label that its feature values can be regressed onto. In this paper, a new heuristic algorithm called Neighbourhood discount (NDD) is proposed to calculate a new metric value ($ndd\_value$) as the node's label. This method can improve the efficiency of the IM algorithm and handle the overlapping influence problem well. The degree of a node reflects its local connectivity and is an important metric for measuring the importance of a node. However, selecting seed nodes based only on degree can lead to overlapping influence. For example, suppose the degrees of nodes u, v, and w are 500, 480, and 300, respectively. The nodes u and v have 480 common one-hop neighbors, while w and u have no common one-hop neighbors. A purely degree-based selection would pick u and then v. However, once u is selected as a seed node, most of the neighbors of v can already be activated by u, whereas selecting w gives a greater chance of activating new nodes. When a node s is selected as a seed node, our algorithm discounts its one-hop neighbors $N_{s}$. If $v \in N_{s}$, we use Equation (2) to discount the remaining degree $dd_{v}$ that v has after the previous round of discounting:

$$dd_{v} = dd_{v} - \left| (N_{s} \cap N_{v}) \setminus history_{v} \right| \tag{2}$$

where $history_{v}$ represents the set of neighbors of v that have already been discounted, and it is updated by Equation (3):

$$history_{v} = history_{v} \cup (N_{s} \cap N_{v}) \tag{3}$$

In our algorithm, we consider discounting the degree of nodes that have the same neighbors as the seed nodes to mitigate the issue of influence overlap. This is because if there is a high degree of neighbor overlap between seed nodes, the nodes activated by different seed nodes might be the same, which would reduce the overall influence spread of the seed set. By doing this, we can more effectively diffuse influence, avoid the waste of budget, and enhance the breadth and efficiency of influence propagation of the seed set on the network. We employed the NDD algorithm to guide the NDD_AGCN model in the prediction of node importance. The inherent properties of the NDD algorithm allow our model to consider the issue of influence overlap, which, in turn, leads to an improvement in the performance of the algorithm. Using the Fibonacci heap, the time complexity of our algorithm is $O(|V| \log |V| + |E|)$. However, the time complexity of the label generation using the diffusion model is $O(|V| R |E|)$, where R represents the number of Monte Carlo simulations. By comparison, our algorithm demonstrates higher efficiency.

The pseudocode of the algorithm for label generation is given in Algorithm 1.

Algorithm 1: Neighbour domain discount (NDD)
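The discount step of Equations (2) and (3) can be illustrated in isolation. The sketch below is not a reproduction of Algorithm 1: it only applies the two update rules for one selected seed, and the function name `apply_discount`, the dictionary-based data structures, and the toy graph are assumptions made for illustration.

```python
def apply_discount(s, adj, dd, history):
    """Apply Equations (2) and (3) to every one-hop neighbor v of seed s."""
    n_s = set(adj[s])
    for v in n_s:
        common = n_s & set(adj[v])           # N_s intersected with N_v
        dd[v] -= len(common - history[v])    # Equation (2)
        history[v] |= common                 # Equation (3)


# Hypothetical toy graph: u and v share neighbors a and b; w is far from u.
adj = {
    "u": ["v", "a", "b", "c"],
    "v": ["u", "a", "b"],
    "a": ["u", "v"],
    "b": ["u", "v"],
    "c": ["u", "w"],
    "w": ["c", "d"],
    "d": ["w"],
}
dd = {x: len(nbrs) for x, nbrs in adj.items()}   # remaining degrees start at the degree
history = {x: set() for x in adj}
apply_discount("u", adj, dd, history)
```

After selecting `u`, the remaining degree of `v` drops from 3 to 1 because two of its neighbors overlap with those of `u`, while `w` keeps its full degree of 2. The `history` sets ensure that a shared neighbor is never discounted twice.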

4.3. Model

The representation vectors are generated by combining multiple network attributes and centrality metrics. In order to integrate more topological properties of nodes into the representation vector in Euclidean space, we use AGCN, which has superior performance in deep learning technology, to extract the structural features of nodes in the network. AGCN is an extended version of GCN that can accept networks of different sizes and structures as input. Based on this model, our proposed algorithm can be applied to the IM problem in different social networks.

The SGC-LL layer, which is the core of AGCN, can be represented as follows:

$$Y = (U g_{\theta}(\Lambda) U^{T} X) W + b \tag{4}$$

where U can be obtained by eigendecomposition of the updated graph Laplacian matrix $\hat{L}$, $W_{i} \in R^{d_{i-1} \times d_{i}}$ is a trainable transform matrix, and $b_{i} \in R^{d_{i} \times 1}$ is the bias. Spectral-domain convolution is carried out to convert the graph signal into a spectral-domain signal. In order to represent the dependence of node features on different graphs, a transformation matrix and bias are applied in the SGC-LL layer to the output features of the spectral-domain transformation. Specifically, the calculation of U is formulated as follows:

$$\hat{L} = U \Lambda U^{T} \tag{5}$$

$\hat{L}$ is a small shift from the original graph Laplacian $L$:

$$\hat{L} = L + \alpha L_{res} \tag{6}$$

Here, $L_{res}$ is a residual graph Laplacian matrix. To make feature extraction more suitable for downstream tasks, the Laplacian matrix is optimized through $L_{res}$. For graph-structured data, the Euclidean distance is no longer a good metric to measure vertex similarity. Therefore, a parameterized Mahalanobis distance is used, and the distance metric most suitable for the task is obtained through training. It is computed as:

$$D(x_{i}, x_{j}) = \sqrt{(x_{i} - x_{j})^{T} M (x_{i} - x_{j})} \tag{7}$$

$D(x_{i}, x_{j})$ denotes the Mahalanobis distance between $x_{i}$ and $x_{j}$ and is used to measure vertex similarity. In the AGCN model, $M = W_{d} W_{d}^{T}$ is a symmetric positive semi-definite matrix, where $W_{d} \in R^{d \times d}$ is a trainable matrix. Then, the Gaussian kernel can be obtained as follows:

$$G(x_{i}, x_{j}) = \exp\left(-D(x_{i}, x_{j}) / (2\sigma^{2})\right) \tag{8}$$

After normalizing $G$, we obtain the dense adjacency matrix $\hat{A}$ and calculate the residual Laplacian matrix $L_{res}$ by:

$$L_{res} = I - D^{-1/2} \hat{A} D^{-1/2} \tag{9}$$

where D is the degree matrix extracted from $\hat{A}$. As can be seen from the calculation of $\hat{L}$, $\hat{L}$ is a symmetric positive definite matrix, so we can find a matrix U that satisfies Equation (5).

In Equation (4), $g_{\theta}(\Lambda)$ is a new spectral filter, which can be represented as:

$$g_{\theta}(\Lambda) = \sum_{k=0}^{K-1} \theta_{k} \Lambda^{k} \tag{10}$$

The SGC-LL layer leverages parameterized Mahalanobis distance to make the graph Laplacian matrix trainable. This innovation allows the layer to dynamically create a distinct graph for each input sample, regardless of its size or structure. The capability of the SGC-LL layer to generate these unique graphs ensures that our model can effectively learn the latent features of nodes in networks of different sizes and structures. As a result, the model is equipped to precisely predict the importance of nodes in networks.
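To make the chain of Equations (4) to (10) concrete, a single forward pass can be sketched with NumPy. This is an illustrative sketch, not the AGCN reference implementation, and it rests on several assumptions: the residual Laplacian uses the usual symmetric normalization $I - D^{-1/2} \hat{A} D^{-1/2}$, the kernel matrix $G$ is used directly as $\hat{A}$, the spectral filter is a polynomial with coefficients `theta`, and every node has positive degree. The function names and the toy inputs are hypothetical.

```python
import numpy as np


def normalized_laplacian(A):
    """I - D^{-1/2} A D^{-1/2}; assumes every node has positive degree."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def sgc_ll(X, A, W_d, theta, W, b, alpha=0.5, sigma=1.0):
    """One SGC-LL forward pass following Equations (4)-(10)."""
    M = W_d @ W_d.T                                      # metric matrix of Eq. (7)
    diff = X[:, None, :] - X[None, :, :]                 # pairwise x_i - x_j
    sq = np.einsum("ijk,kl,ijl->ij", diff, M, diff)
    dist = np.sqrt(np.maximum(sq, 0.0))                  # Eq. (7)
    G = np.exp(-dist / (2.0 * sigma ** 2))               # Eq. (8)
    L_res = normalized_laplacian(G)                      # Eq. (9), G taken as A-hat
    L_hat = normalized_laplacian(A) + alpha * L_res      # Eq. (6)
    lam, U = np.linalg.eigh(L_hat)                       # Eq. (5)
    g = sum(t * lam ** k for k, t in enumerate(theta))   # Eq. (10), assumed theta_k
    return (U * g) @ U.T @ X @ W + b                     # Eq. (4)


# Hypothetical toy input: a 4-node path graph with 3 features per node.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))
W_d = np.eye(3)
W = rng.normal(size=(3, 2))
b = np.zeros(2)
Y = sgc_ll(X, A, W_d, theta=[0.5, 0.3, 0.2], W=W, b=b)
```

In training, `W_d`, `theta`, `W`, and `b` would be learned. Because `W_d` only depends on the feature dimension and not on the number of nodes, the same parameters can be applied to graphs of different sizes.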

In addition to the SGC-LL layer, there is a second layer type called the graph max pooling layer. For the feature vector $x_{v}$ of node v, the pooling replaces the j-th feature $x_{v}(j)$ with the maximum among the j-th features of the nodes in $N_{v} \cup \{v\}$. Graph max pooling alleviates the sensitivity to the position of features, thereby enhancing the robustness of the model. In our model, we also add a batch normalization layer to standardize the input data distribution, thereby preventing vanishing gradients and accelerating the convergence of the model. The structure of AGCN is shown in Figure 2. We use SGC-LL layers, graph max pooling layers, and batch normalization layers to construct our model, called NDD_AGCN, to solve the regression task. We integrate an SGC-LL layer, a batch normalization layer, and a graph max pooling layer as a combo. Through this integration, our model is able to predict the importance of nodes with both accuracy and speed, while also maintaining a high level of robustness. Five combos are used in the feature embedding phase, followed by a dense layer in the regression phase. Furthermore, we adopt the Mean-Squared Error (MSE) loss function to supervise model training, which is formulated as:

$$MSE = \frac{\sum_{i=1}^{n} (y_{i} - label_{i})^{2}}{n} \tag{11}$$

where n represents the number of samples, and $y_{i}$ and $label_{i}$ are the predicted and real labels of the i-th sample, respectively.
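The graph max pooling rule and the MSE loss of Equation (11) are simple enough to check by hand. The following sketch uses plain Python lists and dictionaries; the function names and the toy values are hypothetical and only meant to illustrate the two definitions.

```python
def graph_max_pool(features, adj):
    """Replace each feature of v by the maximum over v and its neighbors."""
    pooled = {}
    for v, x_v in features.items():
        group = [x_v] + [features[u] for u in adj[v]]
        # zip(*group) iterates feature by feature across the group.
        pooled[v] = [max(column) for column in zip(*group)]
    return pooled


def mse(predictions, labels):
    """Mean-squared error between predicted and real labels, Equation (11)."""
    return sum((y - t) ** 2 for y, t in zip(predictions, labels)) / len(labels)


# Hypothetical toy input: a path 0 - 1 - 2 with two features per node.
features = {0: [1.0, 5.0], 1: [3.0, 2.0], 2: [0.0, 9.0]}
adj = {0: [1], 1: [0, 2], 2: [1]}
pooled = graph_max_pool(features, adj)
loss = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Here node 0 takes the first feature of its neighbor (3.0) but keeps its own second feature (5.0), and the loss is 4/3 because only the third prediction is off, by 2.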

Based on this model, we propose a novel learning-based algorithm for the IM problem. Our method consists of two phases: the training phase and the seed selection phase.

In the training phase, firstly, we generate three hundred BA networks as a training dataset. Secondly, we extract the representation vector of the nodes on each graph and calculate the label of the nodes by using the NDD algorithm. Our algorithm extracts explicit topological features of nodes in the network by constructing representation vectors. Finally, we feed representation vectors and labels to the NDD_AGCN model, which effectively captures the interactions between the nodes and the underlying network structure through its SGC-LL layers, graph max-pooling layers, and batch normalization layers. We utilize MSE loss to supervise the model training. The integration of explicit and implicit features, coupled with the guidance provided by the labels produced by the NDD algorithm, enables our algorithm, NDD_AGCNIM, to fully utilize the network’s topological characteristics for predicting the importance of nodes in networks. This training process is represented by Algorithm 2.

In the seed selection phase, we extract the representation vectors of the nodes in the target network. The well-trained model $NDD\_AGCN^{T}$ is then employed to predict the $pre\_ndd\_value$ of each node in the target network. Having been trained with labels as guidance, our model has developed the capability to map representation vectors to a measure of node importance. Consequently, we can conveniently use these vectors to obtain the $pre\_ndd\_value$, which provides a straightforward measure of node importance. Once the $pre\_ndd\_value$ is obtained, we select the top-k nodes as seed nodes based on it. This process of predicting $pre\_ndd\_value$ is represented by Algorithm 3.

The framework of the NDD_AGCNIM algorithm is represented by Algorithm 4.

Algorithm 2: Model Training

Algorithm 3: Model Predicting

Input: $G(V, E)$, $NDD\_AGCN^{T}$

Output: $pre\_ndd\_value$

$S = \emptyset$

embedding ← extract representation vector of nodes in G

$pre\_ndd\_value = NDD\_AGCN^{T}(G, embedding)$

return $pre\_ndd\_value$
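The final step of the seed selection phase, picking the top-k nodes by predicted value, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_top_k_seeds` and the toy scores are hypothetical, and the dictionary stands in for the output of the trained model.

```python
import heapq
from typing import Dict, Hashable, List


def select_top_k_seeds(pre_ndd_value: Dict[Hashable, float], k: int) -> List[Hashable]:
    """Return the k nodes with the largest predicted score, highest first."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # heapq.nlargest avoids sorting the full node list when k is small,
    # and simply returns every node (sorted) when k exceeds the node count.
    return heapq.nlargest(k, pre_ndd_value, key=pre_ndd_value.get)


# Hypothetical scores standing in for the model's pre_ndd_value output.
scores = {"a": 0.12, "b": 0.87, "c": 0.45, "d": 0.91, "e": 0.30}
seeds = select_top_k_seeds(scores, k=2)
print(seeds)  # ['d', 'b']
```

Because the model scores each node once, seed sets of different sizes can be read off the same ranking without rerunning the prediction.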

Algorithm 4: NDD_AGCNIM Algorithm

5. Experiment Setup

5.1. Datasets

In this section, we perform experiments on eight real datasets to test the effectiveness of our methods. The basic topological properties of these datasets are shown in Table 1. Here, $< k >$ means the average degree of a network. The number of nodes and edges in the selected datasets ranges from small-scale to large-scale networks, ensuring a diverse coverage of network sizes. Additionally, the average degree of these datasets varies, which reflects differences in their density. Furthermore, the networks are sourced from social platforms with distinct functions, indicating that they possess different topological structures. This variety in our dataset selection allows us to thoroughly evaluate the performance of our algorithm across a wide spectrum of social network characteristics. The description of these datasets is as follows.

(1): Filmtrust [26]: It is a small dataset crawled from the entire FilmTrust website in June 2011. FilmTrust is a platform that merges social networking with movie ratings and reviews. Its nodes correspond to the users of the FilmTrust website, and an edge represents a trust relationship between two users.
(2): Soc-wiki-vote [27]: The dataset contains all the Wikipedia voting data from the inception of Wikipedia till January 2008. Nodes in the network represent Wikipedia users and a directed edge $E (u, v)$ represents that user u voted for user v.
(3): Aren-email [28]: This is the email communication network at the University Rovira i Virgili in Tarragona in the south of Catalonia in Spain. Nodes are users and each edge indicates that at least one email has been sent from one user to the other.
(4): Stelzl [29]: This network represents pairs of human proteins that interact with each other.
(5): Hamster-friend [30]: This network contains friendships between users of the website hamsterster.com.
(6): Lastfm-asia [31]: A social network of LastFM users that was collected from the public API in March 2020. Nodes are LastFM users from Asian countries and edges are mutual follower relationships between them.
(7): CA-HepTh [32]: The Arxiv HEP-TH (High Energy Physics - Theory) collaboration network is from the e-print arXiv and covers scientific collaborations between authors of papers submitted to the High Energy Physics - Theory category. If an author u co-authored a paper with author v, the graph contains an undirected edge from u to v.
(8): CA-HepPh [32]: This network is from the e-print arXiv and covers scientific collaborations between authors of papers submitted to the High Energy Physics - Phenomenology category. If an author u co-authored a paper with author v, the graph contains an undirected edge from u to v.

In addition to the above datasets, we also generated 300 BA networks of different sizes and structures to train our model because most of the real-world social networks can be modeled using the BA network model [33].
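A training set of this kind could be produced along the following lines. This is only a sketch under stated assumptions: the size range (100 to 500 nodes), the attachment parameter range (2 to 5), and the helper name `barabasi_albert_edges` are illustrative choices, not values reported in the paper. The generator uses the standard preferential attachment rule, where a new node links to existing nodes with probability proportional to their degree.

```python
import random
from typing import List, Set, Tuple


def barabasi_albert_edges(n: int, m: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Grow an undirected BA graph on nodes 0..n-1; each new node attaches to
    m distinct existing nodes chosen proportionally to their current degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    edges: List[Tuple[int, int]] = []
    repeated: List[int] = []  # node i appears deg(i) times in this list
    # Start from a star on nodes 0..m so every initial node has degree >= 1.
    for v in range(1, m + 1):
        edges.append((0, v))
        repeated.extend((0, v))
    for new in range(m + 1, n):
        targets: Set[int] = set()
        while len(targets) < m:
            # Uniform choice from `repeated` is degree-proportional sampling.
            targets.add(rng.choice(repeated))
        for t in targets:
            edges.append((new, t))
            repeated.extend((new, t))
    return edges


rng = random.Random(0)
training_graphs = []
for _ in range(300):
    n = rng.randint(100, 500)  # assumed size range
    m = rng.randint(2, 5)      # assumed attachment parameter range
    training_graphs.append((n, m, barabasi_albert_edges(n, m, rng)))
print(len(training_graphs))
```

Varying both `n` and `m` is one simple way to obtain networks of different sizes and densities, which is the property the training set is meant to have.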

5.2. Parameter Setting

For our proposed NDD_AGCN model, the dimension of the input layer is 9, while the output layer generates a one-dimensional output, representing the influence of the node corresponding to the representation vector. To obtain a good model, we use the Adam optimizer to update the weights of our model, and the learning rate is set to 0.001. The number of training epochs is set to 50. We evaluate our algorithm under the IC model and the SIR model. Furthermore, to demonstrate the stability of our algorithm, we divide the Independent Cascade (IC) model into two variants according to how the activation probability p is assigned to each edge:

Random IC (RIC) model: Each edge from u to v is stochastically assigned a probability $p \in [0, 0.5)$ as the probability for u to activate v.
Uniform IC (UIC) model: Each edge from u to v is assigned a fixed probability p for u to activate v.
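The two assignment schemes can be sketched as below. The function names are hypothetical, and drawing the RIC probabilities uniformly from $[0, 0.5)$ is an assumption: the text only says that the probability is assigned stochastically within that interval.

```python
import random
from typing import Dict, Iterable, Tuple

Edge = Tuple[int, int]


def uic_probabilities(edges: Iterable[Edge], p: float) -> Dict[Edge, float]:
    """UIC: every edge gets the same fixed activation probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return {e: p for e in edges}


def ric_probabilities(edges: Iterable[Edge], rng: random.Random) -> Dict[Edge, float]:
    """RIC: every edge gets its own probability in [0, 0.5).
    rng.random() lies in [0, 1), so halving it keeps the upper bound open."""
    return {e: 0.5 * rng.random() for e in edges}


toy_edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
uic = uic_probabilities(toy_edges, p=0.1)
ric = ric_probabilities(toy_edges, random.Random(7))
print(uic[(0, 1)], all(0.0 <= q < 0.5 for q in ric.values()))
```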

In our experiments, the seed set size k varies from 5 to 50 and the number of Monte Carlo simulations for the selected seed sets is set to 10,000 in all cases. In the IC model, we use influence spread to assess the performance of all algorithms, while in the SIR model, we use influence scale to measure the performance of all algorithms. The meanings of these two metrics are as follows:

Influence spread. The expected number of nodes activated by the seed nodes.
Influence scale. The proportion of nodes infected by the seed nodes.
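Influence spread under the IC model is usually estimated by averaging many simulated cascades. The sketch below shows one way to do this; the graph encoding (`ProbGraph`) and function names are illustrative assumptions, and the toy graph is not one of the paper's datasets.

```python
import random
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Node = Hashable
ProbGraph = Dict[Node, List[Tuple[Node, float]]]  # u -> [(v, p_uv), ...]


def simulate_ic(graph: ProbGraph, seeds: Iterable[Node], rng: random.Random) -> int:
    """Run one IC cascade and return the number of activated nodes."""
    active: Set[Node] = set(seeds)
    frontier: List[Node] = list(active)
    while frontier:
        next_frontier: List[Node] = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # A newly active u gets exactly one chance to activate each
                # still-inactive out-neighbour v, succeeding with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def influence_spread(graph: ProbGraph, seeds: Iterable[Node], runs: int,
                     rng: random.Random) -> float:
    """Monte Carlo estimate of the expected number of activated nodes."""
    seed_list = list(seeds)
    return sum(simulate_ic(graph, seed_list, rng) for _ in range(runs)) / runs


# Toy directed graph with a fixed probability of 0.1 on every edge.
toy = {0: [(1, 0.1), (2, 0.1)], 1: [(3, 0.1)], 2: [(3, 0.1)], 3: []}
print(influence_spread(toy, seeds=[0], runs=10_000, rng=random.Random(0)))
```

Dividing the returned count by the number of nodes gives a proportion, which is the form the influence scale metric takes.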

5.3. Baseline Methods

In this paper, we compare our method with six baseline methods to demonstrate the effectiveness of our algorithm. To enhance the diversity and reliability of the comparison, we have chosen four types of algorithms for our study: those based on deep learning, greedy strategies, metaheuristics, and heuristics. These categories represent a broad spectrum of the algorithms available for influence maximization. Their specific descriptions are as follows.

(1): SAW_ASA [14]: The algorithm uses the comprehensive scoring methodology MCDM to select important nodes as a candidate set on the dataset and a simulated annealing algorithm to obtain optimal seed nodes from the candidate set.
(2): PageRank [12]: This is an algorithm that obtains the node importance through iteration, which takes into account both one-hop and two-hop neighbors.
(3): degreeDiscount [10]: The method leverages discounted degrees to sort the nodes in datasets and select seed sets.
(4): NCVoteRank [13]: The algorithm extends the vote rank method, which considers the k-shell value of neighbors of a node to decide the voting power of this node.
(5): DeepIM [22]: The algorithm can learn latent representations of seed sets and end-to-end propagation models to solve the IM problem.
(6): CELF [2]: This is a greedy algorithm with performance close to the optimal solution, but it comes with a high time cost.

6. Experiment Results and Analysis

6.1. Performance Verification of Different Label Generation Algorithms

We employed PageRank, degree, and the NDD algorithm as label generation methods to produce large-scale labeled data for training the AGCN model. We denote the models trained with these three different label generation algorithms as PR_AGCN, DE_AGCN, and NDD_AGCN, respectively. Figure 3 shows the experimental results on the influence spread for these three models under the UIC model. It can be seen from Figure 3 that our proposed label generation algorithm performs better than DE_AGCN and PR_AGCN on most datasets. Especially, in CA-HepTh the initial seed set that is selected by our model (NDD_AGCN) has the lowest influence spread, but the performance of our model grows faster than the other two models as the number of seed nodes increases. Therefore, when the number of seed nodes exceeds 15, the influence of our model spreads more widely than the other algorithms. From the above discussion, it can be concluded that our label-generation algorithm is effective and suitable for our model.

6.2. Performance Comparison between Adaptive Graph Neural Networks and Regular Graph Neural Networks

To substantiate the superiority of AGCN over the standard GCN within the framework of our algorithm, we carried out a comparative experiment. The experiment involved labeling with NDD and subsequently training both the AGCN and GCN models. On each of the four datasets, the models were tasked with selecting 10, 20, 30, 40, and 50 seed nodes. The objective was to compare the influence spread of these seeds as measured by the UIC model. The result is shown in Table 2. AGCN demonstrates superior performance across all datasets compared to GCN. This is primarily because AGCN is capable of learning the latent features of nodes from training data with diverse sizes and structures, enabling it to effectively transfer knowledge. In contrast, GCN does not possess this adaptability, which is crucial for the observed performance differences.

6.3. The Comparison Results of the Influence Overlapping Index on Different Algorithms under UIC Model

To demonstrate that our algorithm can alleviate the issue of influence overlap, we have introduced a new metric called the Influence Overlapping Index, which quantifies the degree of influence overlap within a seed set. The formula of the Influence Overlapping Index is as follows:

$$IOI(S) = \frac{\sum_{s \in S} \sigma_{M}(s) - \sigma_{M}(S)}{\sum_{s \in S} \sigma_{M}(s)} \tag{12}$$

where $\sum_{s \in S} \sigma_{M}(s)$ denotes the cumulative influence spread of all individual seeds within the seed set S, while $\sigma_{M}(S)$ represents the collective influence spread of the seed set S as a whole. The difference between these two values yields the amount of overlapping influence. It is crucial to acknowledge that the total influence spread can differ substantially across various seed sets. Consequently, a direct subtraction to determine influence overlap might not provide a fair comparison. For example, consider a scenario where seed set S has a total individual influence spread of 100,000, with an overlap of $\sum_{s \in S} \sigma_{M}(s) - \sigma_{M}(S) = 10$; in contrast, seed set $S_{1}$ has a total individual influence spread of 10, with an overlap of $\sum_{s \in S_{1}} \sigma_{M}(s) - \sigma_{M}(S_{1}) = 9$. Although $S_{1}$ experiences less absolute overlap, the relative impact of the overlap on its influence spread is more significant. To address this issue, we advocate for the use of the Influence Overlapping Index, which normalizes the overlap by the total influence spread of the seed set. This approach ensures a more equitable and meaningful comparison of influence overlap across different seed sets.
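The two seed sets in the example above can be worked through numerically. The sketch below evaluates Equation (12) directly; how the totals of 100,000 and 10 split across individual seeds is a hypothetical choice made only so the function has something to sum.

```python
from typing import Sequence


def influence_overlapping_index(individual_spreads: Sequence[float],
                                joint_spread: float) -> float:
    """IOI(S) = (sum of individual spreads - joint spread) / sum of individual spreads."""
    total = sum(individual_spreads)
    if total <= 0:
        raise ValueError("the summed individual spread must be positive")
    return (total - joint_spread) / total


# Seed set S: individual spreads sum to 100,000 and the overlap is 10.
ioi_s = influence_overlapping_index([50_000, 50_000], joint_spread=99_990)
# Seed set S_1: individual spreads sum to 10 and the overlap is 9.
ioi_s1 = influence_overlapping_index([5, 5], joint_spread=1)
print(ioi_s, ioi_s1)  # 0.0001 0.9
```

The absolute overlaps (10 versus 9) are almost equal, yet the normalized values differ by nearly four orders of magnitude, which is the distinction the index is meant to expose.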

Table 3 shows a clear comparison of the performance of our algorithm, NDD, against two other methods on the UIC model across four datasets. The methods compared are the degree centrality algorithm without a discount mechanism and diffusion model centrality (DMC). DMC is a method that assesses node importance by calculating the influence spread of individual nodes using the IC model, and it is frequently utilized for generating labels. The primary focus of the comparison is on the IOI of the seed nodes selected in quantities of 10, 20, and 30. From Table 3, it is evident that the IOI of our method is the smallest across all four datasets, indicating that our algorithm effectively mitigates the issue of influence overlapping. In contrast, the degree centrality without the discounting method shows a significantly higher IOI. Furthermore, the DMC algorithm, which is commonly used for labeling, suffers from severe influence overlap issues. This underscores the necessity for the development of our new labeling method.

6.4. The Comparison Results of the Influence Spread on Different Algorithms under IC Model

To demonstrate the effectiveness of our proposed algorithm, we compared the influence spread of different algorithms under the UIC and RIC models. Additionally, to validate the performance of our NDD label generation algorithm, we included the top-k seeds chosen according to the ndd_value in the comparison under the UIC model. Figure 4 shows the influence spread of different algorithms on different real-world datasets under the UIC model.

As we can see from Figure 4a, our algorithm NDD_AGCNIM outperforms all other algorithms. Our label generation algorithm NDD is the second-best, surpassed only by the NDD_AGCNIM algorithm. Due to the small size of the FilmTrust network, the search space for the IM problem is limited, so the performance of SAW_ASA catches up with our algorithm. The CELF algorithm has been surpassed by several algorithms, which shows that improving algorithm accuracy on small datasets is easier to achieve. The performance of DeepIM is the poorest, primarily due to inadequate fitting of the propagation model within a limited number of iterations.

Figure 4b shows the results obtained on the Soc-wiki-Vote network, where our algorithm NDD_AGCNIM achieved results comparable to CELF. Our algorithm NDD also maintained a high level of effectiveness, placing third in performance, surpassed only by the CELF algorithm and our algorithm NDD_AGCNIM. It is followed by DeepIM, SAW_ASA, and DegreeDiscount. Although DeepIM initially exhibits poor performance, it shows the most significant improvement as the number of seed nodes increases. DegreeDiscount performs well because of the low average degree and small size of the Soc-wiki-Vote network.

For the Aren-email network, Figure 4c shows that although the CELF and DeepIM algorithms are superior to our algorithm, they require more time. Additionally, the performance of the DeepIM algorithm is influenced by the randomness in the training set, leading to unstable performance. Apart from these two, our algorithms NDD_AGCNIM and NDD both perform best and have a clear advantage over the remaining algorithms, which suggests that our algorithm can overcome influence overlapping.

Figure 4d shows that our algorithm NDD_AGCNIM is superior to the other algorithms on the Stelzl network. It is followed by CELF, SAW_ASA, and PageRank. Nonetheless, our algorithm NDD is not advantageous on this specific dataset, likely because the influence overlapping problem is less significant within this dataset. As can be seen in Figure 4e, for the Hamster-friend network, our algorithm achieves the best influence spread as the seed set size increases and closely follows DeepIM. Conversely, regardless of how the seed set size changes, the influence spread of the other algorithms has remained stable at around 822. Additionally, our algorithm NDD has surpassed the performance of all other heuristic algorithms.

Figure 4f shows that for the Lastfm-asia network, both of our algorithms are stable and outperform the other algorithms. NCVoteRank, PageRank, and SAW_ASA rank second, third, and fourth, respectively. None of these three methods is very stable. The worst performer is the DeepIM algorithm. Figure 4g shows that for the CA-HepTh network, the influence spread obtained by our algorithm NDD_AGCNIM is not high at first, but as the number of seed nodes increases, the performance of our algorithm improves rapidly. When the size of the seed set $k > 40$, our algorithm obtains the highest influence spread. Our algorithm NDD continues to maintain a competitive level of performance. Finally, Figure 4h clearly shows that the performance of our algorithms NDD_AGCNIM and NDD continues to improve as the number of seed nodes increases. Compared to other methods, our algorithm achieves markedly better results. This also suggests that our algorithm can overcome influence overlapping.

Figure 5 shows the influence spread of different algorithms on eight real-world datasets under the RIC model. From Figure 5a, it is clear that our algorithm achieves performance comparable to that of the CELF algorithm and is superior to the other algorithms. SAW_ASA catches up with degreeDiscount, and PageRank ranks fourth. It is worth noting that the DeepIM algorithm exhibits the poorest performance. In addition, as can be seen from Figure 5b, the performance of our algorithm remains close to that of the CELF algorithm on the Soc-wiki-Vote dataset. However, DeepIM is ranked third, just behind our algorithm, due to its significant performance improvement. While CELF is highly competitive in the activation saturation scenario shown in Figure 5c, its efficiency is too low for it to be applied to large-scale datasets. As shown in Figure 5d, our algorithm ranks second only to CELF. Although the DeepIM algorithm initially performed better than ours, our algorithm outperformed DeepIM as the number of seed nodes increased. Figure 5e–h shows that all algorithms can achieve similar performance on the Aren-email, Hamster-friend, CA-HepPh, and Lastfm-asia networks because of high activation probability and high average degree. Figure 5g shows that our algorithm achieves not only the maximum influence spread but also the fastest growth rate on Stelzl and CA-HepTh. In particular, the degreeDiscount algorithm performs poorly on most datasets since it assumes that all edges are assigned the same activation probability.

From Figure 4 and Figure 5, it can be inferred that our algorithm can achieve good performance under both models. Especially, under the UIC model our algorithm is superior to other algorithms because our algorithm makes full use of the topology of the graph. Our algorithm can achieve knowledge transfer, allowing training on known networks and predicting on unknown networks. Consequently, our algorithm exhibits high efficiency. Furthermore, DeepIM seeks to model the process of influence diffusion by using training data that consists of randomly sampled seed nodes and the resulting influence spread from the model as labels. Due to the enormous space of potential seed sets, the quality of this random sampling is unpredictable, which can result in performance instability. Unlike DeepIM, our algorithm is designed to estimate the importance of individual nodes without relying on random sampling for training data. Additionally, since DeepIM cannot achieve knowledge transfer, training a neural network model is required for each chosen fixed number of seed nodes on every graph, leading to a decrease in seed selection efficiency. CELF still achieves unquestionable effectiveness, but its efficiency is unacceptable.

Figure 4. The influence spread of different algorithms obtained by IC models on different real-world datasets, p = 0.1.

Figure 5. The influence spread of different algorithms obtained by IC models on different real-world datasets, where p is randomly assigned.

6.5. The Comparison Results of the Influence Spread on Different Algorithms under SIR Model

To illustrate the generalizability of our algorithm, we conducted comparative experiments under the SIR model as well. Figure 6 shows the influence scale of different algorithms on different real-world datasets under the SIR model. As shown in Figure 6a, our algorithm achieves the best performance, followed by degreeDiscount. SAW_ASA has also shown competitive results. Figure 6b shows that our algorithm exhibits the best performance on the Soc-wiki-Vote dataset. At the same time, DeepIM once again demonstrates its characteristic of improving rapidly as the seed set size increases. In Figure 6c for the Aren-email dataset, our algorithm continues to maintain the largest advantage in terms of performance. It is noteworthy that the DeepIM algorithm exhibits the largest improvement as the seed set grows. At a seed set size of 50, DeepIM's performance is on par with our algorithm. As can be seen from Figure 6d, our algorithm achieves the best performance, with PageRank and degreeDiscount ranking second and third, respectively. SAW_ASA performs poorly because its fitness function is no longer suitable for the SIR model. Figure 6e–h shows that when the activation probability on the edges is high, most algorithms face the issue of influence overlapping, resulting in deteriorated results. However, our algorithm takes into account the issue of influence overlapping, and with the increase in the number of seeds, the number of activated nodes continues to grow. DeepIM is also unaffected by the issue of influence overlapping, and its performance continues to improve. However, it starts from a poor initial performance and is ultimately outperformed by our algorithm. The comparison results are depicted in Figure 6.
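For readers unfamiliar with the SIR setting, the influence scale can be estimated with a simple discrete-time simulation such as the one below. This is a sketch under assumptions: p is taken to be the per-contact infection probability and q the per-step recovery probability (the Figure 6 caption gives p = 0.1 and q = 1 but this excerpt does not define them), and the adjacency encoding and function names are hypothetical.

```python
import random
from typing import Dict, Hashable, Iterable, List, Set

Node = Hashable
Adjacency = Dict[Node, List[Node]]  # every node must appear as a key


def simulate_sir(adj: Adjacency, seeds: Iterable[Node], p: float, q: float,
                 rng: random.Random) -> float:
    """Run one SIR outbreak and return the fraction of nodes ever infected."""
    if not adj:
        raise ValueError("graph must contain at least one node")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1] so that the outbreak terminates")
    infected: Set[Node] = set(seeds)
    recovered: Set[Node] = set()
    while infected:
        newly_infected: Set[Node] = set()
        for u in infected:
            for v in adj[u]:
                susceptible = v not in infected and v not in recovered
                if susceptible and v not in newly_infected and rng.random() < p:
                    newly_infected.add(v)
        # Each infected node recovers with probability q; the rest stay infectious.
        still_infected = {u for u in infected if rng.random() >= q}
        recovered |= infected - still_infected
        infected = still_infected | newly_infected
    return len(recovered) / len(adj)


def influence_scale(adj: Adjacency, seeds: Iterable[Node], p: float, q: float,
                    runs: int, rng: random.Random) -> float:
    """Average the infected fraction over several simulated outbreaks."""
    seed_list = list(seeds)
    return sum(simulate_sir(adj, seed_list, p, q, rng) for _ in range(runs)) / runs


toy = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(influence_scale(toy, seeds=[0], p=0.1, q=1.0, runs=10_000, rng=random.Random(0)))
```

With q = 1 every infected node is infectious for exactly one step, so in this sketch the process behaves like an IC cascade with uniform probability p.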

6.6. The Comparison Results of the Influence Spread on Different Algorithms under Different Activation Probabilities

We analyzed the influence of the activation probability on the performance of the algorithm by calculating the influence spread of all algorithms using the IC model on four datasets. We varied the activation probability from 0.05 to 0.23 in increments of 0.03 and compared the results. The number of seed nodes is set to 40. The result is shown in Figure 7. The compactness of the lines in Figure 7b,c is due to the large range of the y-axis scale. Figure 7 shows that as the activation probability increases, the influence spread for all algorithms increases, but at a decreasing rate. This aligns with the observation that as the activation probability approaches 1, the influence spread for all algorithms tends towards the total number of nodes in the network. While our algorithm may not initially have a competitive edge at very low activation probabilities, its performance rapidly improves with the increase in activation probability and thereafter consistently maintains the best performance. Specifically, the degreeDiscount algorithm exhibits strong performance at lower activation probabilities, in line with the expectations set forth by its authors.

6.7. Running Time of All Algorithms

To verify the efficiency of our algorithm, we compared the running time for selecting 50 seed nodes on eight real datasets using the NDD_AGCNIM algorithm, the NDD algorithm, and the other comparison algorithms. As Figure 8 shows, our algorithm NDD_AGCNIM is less efficient only than the degreeDiscount and PageRank heuristics across all datasets, because these two algorithms sacrifice effectiveness for efficiency. CELF achieves good performance, but from Figure 8, it can be observed that its running speed is several orders of magnitude slower than that of the other algorithms. We excluded the time for DeepIM to generate the dataset and only compared the time it took for seed selection. Its efficiency is still lower than that of most algorithms because it requires training a model for each seed selection. Additionally, the efficiency of our NDD algorithm is on par with the two heuristic algorithms. While it is somewhat less efficient than PageRank on smaller datasets, it demonstrates notably higher efficiency on larger datasets. Despite being less efficient than the degreeDiscount algorithm, our NDD algorithm produces superior outcomes.

Figure 6. The influence scale of different algorithms obtained by SIR models on different real-world datasets, p = 0.1, q = 1.

Figure 7. The influence spread of different algorithms obtained by IC models with p from 0.05 to 0.23 incremented by 0.02.

Figure 8. Running time of all algorithms on eight real-world datasets.

6.8. Discussion

We provide a comprehensive discussion of our research. Our proposed influence maximization algorithm can achieve effective training on known networks and predictions on unknown networks. This characteristic makes our approach more suitable for addressing influence maximization problems on a large-scale network. We can train a well-performing model on a known small-scale social network and then apply this model to a large-scale network. Our algorithm is more suitable for scenarios with fixed activation probabilities because the model of our algorithm primarily focuses on extracting the topological features of the network, rather than emphasizing the activation probabilities between nodes. In future work, we will more thoroughly integrate activation probabilities into the extraction of latent features, allowing our algorithm to adapt to a wider range of application scenarios. Additionally, our algorithm employs a deep learning approach, where the primary time consumption is attributed to the training process. Theoretically, the larger the training dataset, the more time is required for training, which consequently leads to a more accurate model. Moreover, a common characteristic of our algorithm, as well as the other algorithms mentioned, is that the time consumed tends to increase with the size of the network.

7. Conclusions

Influence maximization is an important research problem in the field of information diffusion analysis in social networks. Finding the top-k influential nodes is the crux of the IM problem. In this work, we transform the IM problem into a regression problem and propose an efficient algorithm (NDD_AGCNIM) based on an adaptive graph convolutional neural network to solve this problem. We first extract multiple attribute features of the network and some heuristic measures and then embed nodes in the network to get their representation vectors in the Euclidean space. Meanwhile, we propose a new heuristic algorithm (NDD) that uses discounted degrees as labels. Our proposed model is trained by combining embeddings and labels into a training dataset, and the trained model is then applied to predict the influence of nodes in the target network. We empirically evaluate our method on eight real-world datasets and compare it with six baseline methods. Experimental results show that NDD is suitable for our model in generating labels, and NDD_AGCNIM is more efficient and effective than state-of-the-art algorithms. Since our method is based on a greedy framework, if its objective function is monotonic and submodular, it will guarantee an approximation ratio of $1 - 1/e$. In future work, we will study the monotonicity and the submodularity of the objective function to ensure the approximation ratio of the method. Moreover, as a part of future work, a deep reinforcement learning algorithm based on the proposed model is considered to solve the topic-aware IM problem under competitive social networks.
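For reference, the classical guarantee behind the $1 - 1/e$ ratio can be written out as below. The notation ($S_k^{g}$ for the set built by k marginal-gain greedy steps, $S_k^{*}$ for an optimal set of size at most k) is introduced here for illustration. The bound is the standard result for greedy maximization of a monotone submodular function with exact marginal gains; whether it carries over to the proposed method is the open question stated above.

```latex
% Assumptions: \sigma is monotone and submodular with \sigma(\emptyset) = 0,
% and each greedy step adds the node with the largest marginal gain.
\sigma\bigl(S_k^{g}\bigr)
  \;\ge\; \Bigl(1 - \bigl(1 - \tfrac{1}{k}\bigr)^{k}\Bigr)\,\sigma\bigl(S_k^{*}\bigr)
  \;\ge\; \Bigl(1 - \tfrac{1}{e}\Bigr)\,\sigma\bigl(S_k^{*}\bigr)
  \;\approx\; 0.632\,\sigma\bigl(S_k^{*}\bigr)
```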

Author Contributions

Conceptualization, W.L.; Methodology, W.L.; Software, S.W.; Data curation, S.W.; Writing—original draft, S.W.; Writing—review and editing, J.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Chinese National Natural Science Foundation, grant numbers 61971233 and 61702441.

Data Availability Statement

The authors confirm that the data supporting the findings of this study are available within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Domingos, P.; Richardson, M. Mining the network value of customers. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 26–29 August 2001; pp. 57–66.
2. Kempe, D.; Kleinberg, J.; Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003; pp. 137–146.
3. Newman, M.E. Spread of epidemic disease on networks. Phys. Rev. E 2002, 66, 016128.
4. Li, Y.; Gao, H.; Gao, Y.; Guo, J.; Wu, W. A survey on influence maximization: From an ml-based combinatorial optimization. ACM Trans. Knowl. Discov. Data 2023, 17, 1–50.
5. Kumar, S.; Mallik, A.; Panda, B. Influence maximization in social networks using transfer learning via graph-based LSTM. Expert Syst. Appl. 2023, 212, 118770.
6. Li, R.; Wang, S.; Zhu, F.; Huang, J. Adaptive graph convolutional neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
7. Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.; Glance, N. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; pp. 420–429.
8. Goyal, A.; Lu, W.; Lakshmanan, L.V. Celf++ optimizing the greedy algorithm for influence maximization in social networks. In Proceedings of the 20th International Conference Companion on World Wide Web, Hyderabad, India, 28 March–1 April 2011; pp. 47–48.
9. Zhou, C.; Zhang, P.; Zang, W.; Guo, L. On the upper bounds of spread for greedy algorithms in social network influence maximization. IEEE Trans. Knowl. Data Eng. 2015, 27, 2770–2783.
10. Chen, W.; Wang, Y.; Yang, S. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 199–208.
11. Das, K.; Samanta, S.; Pal, M. Study on centrality measures in social networks: A survey. Soc. Netw. Anal. Min. 2018, 8, 1–11.
12. Brin, S. The PageRank citation ranking: Bringing order to the web. Proc. ASIS 1998 1998, 98, 161–172.
13. Kumar, S.; Panda, B. Identifying influential nodes in Social Networks: Neighborhood Coreness based voting approach. Phys. A Stat. Mech. Its Appl. 2020, 553, 124215.
14. Biswas, T.K.; Abbasi, A.; Chakrabortty, R.K. An MCDM integrated adaptive simulated annealing approach for influence maximization in social networks. Inf. Sci. 2021, 556, 27–48.
15. Wang, P.; Zhu, Z.; Wang, Y. A novel hybrid MCDM model combining the SAW, TOPSIS and GRA methods based on experimental design. Inf. Sci. 2016, 345, 27–45.
16. Kumar, S.; Singhla, L.; Jindal, K.; Grover, K.; Panda, B. IM-ELPR: Influence maximization in social networks using label propagation based community structure. Appl. Intell. 2021, 51, 7647–7665.
17. Qiu, L.; Tian, X.; Zhang, J.; Gu, C.; Sai, S. LIDDE: A differential evolution algorithm based on local-influence-descending search strategy for influence maximization in social networks. J. Netw. Comput. Appl. 2021, 178, 102973.
18. Singh, S.S.; Singh, K.; Kumar, A.; Biswas, B. ACO-IM: Maximizing influence in social networks using ant colony optimization. Soft Comput. 2020, 24, 10181–10203.
19. Wang, B.; Ma, L.; He, Q. IDPSO for Influence Maximization under Independent Cascade Model. In Proceedings of the 2022 4th International Conference on Data-Driven Optimization of Complex Systems (DOCS), Chengdu, China, 28–30 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
20. Zhang, C.; Li, W.; Wei, D.; Liu, Y.; Li, Z. Network dynamic GCN influence maximization algorithm with leader fake labeling mechanism. IEEE Trans. Comput. Soc. Syst. 2022, 10, 3361–3369.
21. Kumar, S.; Mallik, A.; Khetarpal, A.; Panda, B. Influence maximization in social networks using graph embedding and graph neural network. Inf. Sci. 2022, 607, 1617–1636.
22. Ling, C.; Jiang, J.; Wang, J.; Thai, M.T.; Xue, R.; Song, J.; Qiu, M.; Zhao, L. Deep graph representation learning and optimization for influence maximization. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 21350–21361.
23. Hussain, O.A.; Zaidi, F. Influence maximization in complex networks through supervised machine learning. In Proceedings of the International Conference on Complex Networks and Their Applications, Madrid, Spain, 30 November–2 December 2021; Springer: Cham, Switzerland, 2021; pp. 217–228.
24. Freeman, L.C. Centrality in social networks: Conceptual clarification. Soc. Netw. Crit. Concepts Sociol. Lond. Routledge 2002, 1, 238–263.
25. Zaki, M.J.; Meira, W. Data Mining and Analysis: Fundamental Concepts and Algorithms; Cambridge University Press: Cambridge, UK, 2014.
26. Guo, G.; Zhang, J.; Yorke-Smith, N. A Novel Bayesian Similarity Measure for Recommender Systems. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 3–9 August 2013; pp. 2619–2625.
27. Leskovec, J.; Huttenlocher, D.; Kleinberg, J. Signed networks in social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Atlanta, GA, USA, 10–15 April 2010; ACM: New York, NY, USA, 2010; pp. 1361–1370.
28. Guimerà, R.; Danon, L.; Díaz-Guilera, A.; Giralt, F.; Arenas, A. Self-similar Community Structure in a Network of Human Interactions. Phys. Rev. E 2003, 68, 065103.
29. Stelzl, U.; Worm, U.; Lalowski, M.; Haenig, C.; Brembeck, F.H.; Goehler, H.; Stroedicke, M.; Zenkner, M.; Schoenherr, A.; Koeppen, S.; et al. A Human Protein–Protein Interaction Network: A Resource for Annotating the Proteome. Cell 2005, 122, 957–968.
30. Kunegis, J. KONECT – The Koblenz Network Collection. In Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 1343–1350.
31. Rozemberczki, B.; Sarkar, R. Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM ’20), Virtual Event, Ireland, 19–23 October 2020; ACM: New York, NY, USA, 2020; pp. 1325–1334.
32. Leskovec, J.; Kleinberg, J.; Faloutsos, C. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data (TKDD) 2007, 1, 2-es.
33. Barabási, A.L.; Albert, R. Emergence of scaling in random networks. Science 1999, 286, 509–512.

Figure 1. The framework of the influence maximization algorithm based on the adaptive graph convolution neural network.

Figure 2. Overall model architecture.

Figure 3. Influence spread of models trained on labels produced by different label generation algorithms, under the IC model with p = 0.1.
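
The influence spread reported for the IC model with p = 0.1 is commonly estimated by Monte Carlo simulation. The following is a minimal sketch of such an estimate, not the authors' implementation. It assumes the graph is given as an adjacency-list dictionary and uses hypothetical function names (`simulate_ic`, `estimate_spread`) and a hypothetical default of 1000 trials.

```python
import random


def simulate_ic(graph, seeds, p=0.1, rng=None):
    """Run one Independent Cascade trial; return the number of activated nodes.

    graph: dict mapping each node to an iterable of its neighbors (assumed format).
    seeds: iterable of initially active nodes.
    p: uniform activation probability on every edge.
    """
    if rng is None:
        rng = random.Random()
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # A newly activated node gets exactly one chance to
                # activate each of its still-inactive neighbors.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph, seeds, p=0.1, trials=1000, seed=0):
    """Average the cascade size over independent trials (Monte Carlo estimate)."""
    rng = random.Random(seed)
    total = sum(simulate_ic(graph, seeds, p, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Toy undirected path graph 0 - 1 - 2 - 3, stored with both edge directions.
    toy = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(estimate_spread(toy, seeds=[0], p=0.1))
```

The estimate becomes more stable as `trials` grows, at a proportional cost in running time.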

Table 1. The detailed information of datasets, including dataset name, number of nodes, number of edges, and average degree.

Dataset	Nodes	Edges	Average degree $\langle k \rangle$
Filmtrust	874	1853	4.2
Soc-wiki-Vote	889	2900	6
Aren-email	1122	5451	10
Stelzl	1706	6207	7.3
Hamster-friend	2952	12,534	8.5
Lastfm-asia	7624	27,806	5.1
CA-HepTh	9877	25,998	11
CA-HepPh	12,008	118,521	4.1

Table 2. The performance of AGCN and GCN.

k	Filmtrust AGCN	Filmtrust GCN	Soc-wiki-Vote AGCN	Soc-wiki-Vote GCN	Aren-email AGCN	Aren-email GCN	Stelzl AGCN	Stelzl GCN
10	55.676	42.678	178.616	176.268	394.854	385.893	156.364	143.751
20	73.750	63.438	193.195	188.126	403.264	393.11	181.092	156.33
30	89.207	75.961	210.601	202.61	412.421	398.699	202.835	171.522
40	104.246	88.057	225.516	212.998	421.459	403.518	224.003	180.525
50	118.597	101.684	234.898	227.48	429.364	409.148	238.425	189.587
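
To make the gap between the two models easier to read, the relative gain of AGCN over GCN can be computed directly from the reported scores. The sketch below does this for the k = 10 row of Table 2. The helper name `relative_gain` and the dictionary layout are illustrative assumptions and do not appear in the article.

```python
# Scores copied from the k = 10 row of Table 2 as (AGCN, GCN) pairs.
K10_SCORES = {
    "Filmtrust": (55.676, 42.678),
    "Soc-wiki-Vote": (178.616, 176.268),
    "Aren-email": (394.854, 385.893),
    "Stelzl": (156.364, 143.751),
}


def relative_gain(agcn, gcn):
    """Return the percentage improvement of the AGCN score over the GCN score."""
    return (agcn - gcn) / gcn * 100.0


if __name__ == "__main__":
    for dataset, (agcn, gcn) in K10_SCORES.items():
        print(f"{dataset}: {relative_gain(agcn, gcn):.2f}% gain at k = 10")
```

The same helper applies unchanged to the other rows of the table.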

Table 3. The influence overlapping index of different label generation algorithms in the UIC model.

Dataset	Method	10	20	30
Filmtrust	NDD	0.523	0.710	0.703
Filmtrust	DE	0.731	0.812	0.81
Filmtrust	DMC	0.788	0.857	0.865
Soc-wiki-Vote	NDD	0.838	0.919	0.940
Soc-wiki-Vote	DE	0.870	0.944	0.958
Soc-wiki-Vote	DMC	0.883	0.949	0.963
Aren-email	NDD	0.890	0.958	0.973
Aren-email	DE	0.894	0.962	0.975
Aren-email	DMC	0.894	0.963	0.976
Stelzl	NDD	0.854	0.923	0.932
Stelzl	DE	0.856	0.924	0.935
Stelzl	DMC	0.864	0.942	0.955

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liu, W.; Wang, S.; Ding, J. Influence Maximization Based on Adaptive Graph Convolution Neural Network in Social Networks. Electronics 2024, 13, 3110. https://doi.org/10.3390/electronics13163110
