Substitute Seed Nodes Mining Algorithms for Influence Maximization in Multi-Social Networks

Rao, Xuli; Zhao, Jiaxu; Chen, Zhide; Lin, Feng

doi:10.3390/fi11050112


by Xuli Rao ^1,2, Jiaxu Zhao ^1,*, Zhide Chen ^2 and Feng Lin ^1

^1 Department of Computer Science, Fuzhou Polytechnic, Fuzhou 350108, Fujian, China

^2 College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350007, Fujian, China

^* Author to whom correspondence should be addressed.

Future Internet 2019, 11(5), 112; https://doi.org/10.3390/fi11050112

Submission received: 15 March 2019 / Revised: 10 April 2019 / Accepted: 5 May 2019 / Published: 10 May 2019

(This article belongs to the Section Techno-Social Smart Systems)


Abstract: Due to the growing interconnection of social networks, the problem of influence maximization has been extended from a single social network to multiple social networks. However, a critical challenge of influence maximization in multi-social networks is that some initial seed nodes may fail to become active, which lowers the performance of influence spreading. Therefore, finding substitute nodes to mitigate the influence loss caused by uncooperative nodes is extremely helpful in influence maximization. In this paper, we propose three substitute mining algorithms for influence maximization in multi-social networks, namely the Greedy-based substitute mining algorithm, the pre-selected-based substitute mining algorithm, and the similar-users-based substitute mining algorithm. The simulation results demonstrate that the existence of uncooperative seed nodes reduces the range of information influence. Furthermore, the viability and performance of the proposed algorithms are presented, which show that the three substitute node mining algorithms can find suitable substitute nodes for multi-social networks influence maximization, thus achieving better influence.

Keywords: multi-social networks; influence maximization; substitute mining algorithm

1. Introduction

In recent years, various social networks [1] offering different services have appeared on the Internet, e.g., Facebook, Twitter, YouTube and Foursquare. To enjoy these different services, some users register accounts on several social networks simultaneously. Unlike in the traditional setting, such users have not only social links to other users within one social network, but also cross links connecting their accounts on different social networks. Multi-social networks are connected by these shared users [2]. Because of their obvious advantages in spreading information, such as high speed, low cost and wide influence, social networks attract increasing attention from researchers, which has produced many hot topics, including social information spreading [3], public opinion [4], and Internet marketing [5]. Influence maximization in a social network, the problem of finding a small subset of nodes that maximizes the spread of influence [7], has been extensively studied in the academic community [6,7,8,9,10]. However, the existing influence maximization research for a single social network cannot be directly applied to multi-social networks because of their heterogeneous architecture and the complex connections among users. Users with multiple social network accounts can spread information from social network A to social network B. An example is given in Figure 1: a user who is registered on both Twitter and Facebook can forward information from Twitter to Facebook, which shows that the scope of information spreading is no longer confined to a single social network. This factor changes the characteristics of information spreading and needs to be considered in influence maximization research.

To maximize the influence spreading in multi-social networks, the researchers in References [11,12,13,14,15,16] focus on finding k ($k \in N^{*}$) users to constitute the initial active seed node set S, such that the expected number of active nodes R(S) at the end of a diffusion process in multi-social networks is maximized. Due to budgetary constraints, the size of S is usually fixed in existing research, i.e., k is fixed. However, even after S has been obtained through complex theoretical calculations, some initial seed nodes in S may be difficult to activate. Such a broken S cannot achieve the maximum influence, which results in poor influence spreading performance. For example, in social network advertising, a user may have high influence in social networks but be unwilling to post advertisements, which may cause the marketing campaign to fail. Therefore, when some users in S cannot be activated, it is necessary to find substitutes for these nodes to reduce the loss of influence.

Some researchers have already worked on this subject. Li et al. [17] proposed the idea of finding “successors”: when a seed node cannot be activated, a neighbor node is selected as a substitute. Ma et al. [18] named this problem Substitutes Discovery in Influence Maximization and proposed three algorithms for it based on the Greedy algorithm. However, both works address the substitute mining problem in a single social network and cannot be applied directly to information spread in multi-social networks. In this paper, we discuss the information spread across social networks and focus on solving the problem of multi-social networks influence maximization. Specifically, the contributions of this paper are summarized as follows:

(1): We analyze the characteristics of information spreading in multi-social networks and define the special users with multiple social network accounts as Bridge Users, who play an important role in forwarding information from one social network to another.
(2): We discuss the problem of substitutes mining for multi-social networks influence maximization (SMMNIM), which aims to find substitutes and to reduce the loss of information spread caused by the uncooperative seed nodes.
(3): Three algorithms are proposed to solve SMMNIM in this paper for different application scenarios.

The rest of the paper is organized as follows. Section 2 introduces the special type of nodes named as Bridge Users, and the multi-social networks influence maximization problem. Section 3 presents the substitutes mining for multi-social networks influence maximization. In Section 4, substitutes seed nodes mining algorithms are proposed. The viability and performance of the proposed algorithms are evaluated in Section 5. Finally, we conclude this paper in Section 6.

2. Influence Maximization in Multi-Social Networks

2.1. Definition and Characteristics of Bridge User

Suppose there are two social networks represented as $G_{1}$ and $G_{2}$. As shown in Figure 2, in the earlier setting, the scope of influence is limited to a single social network. However, as discussed earlier, in today’s social networks there exist special users who register accounts on multiple social networks and can break this limit; we define this type of user as a Bridge User. The red node in Figure 2 and Figure 3 represents a Bridge User, who can forward information across platforms. The detailed process is shown in Figure 3: after receiving the information in $G_{1}$, the Bridge User forwards it to $G_{2}$ with a certain probability $p_{\to G_{2}}$, which makes the information flow from $G_{1}$ to $G_{2}$.

Assume we have multiple social networks $G_{1}(V_{1}, E_{1})$, $G_{2}(V_{2}, E_{2})$, $G_{3}(V_{3}, E_{3})$, …, $G_{n}(V_{n}, E_{n})$, which are made up of m social network users $U = \{u^{1}, u^{2}, u^{3}, \dots, u^{m}\}$ and their relationships. We use $u_{i}^{j}$ to indicate that user $u^{j}$ participates in $G_{i}$ (owns an account on $G_{i}$), so we have $V_{i} = \{u_{i}^{1}, u_{i}^{2}, u_{i}^{3}, \dots, u_{i}^{|V_{i}|}\}$. BU denotes the set of all Bridge Users.

Definition 1. Self-spread of Bridge User: In multiple social networks, if $u^{j} \in BU$ participates in $G_{p}$ and $G_{q}$ at the same time, then when $u_{p}^{j}$ is activated by the information, it has a certain self-spread probability $p_{\to G_{q}}^{u^{j}}$ of spreading the information to $G_{q}$ and activating $u_{q}^{j}$.

Definition 2. Self-spread probability matrix of Bridge User: $P_{m \times n}$ indicates the self-spread probability matrix of Bridge User $u^{i}$, where $P_{ij}$ is the self-spread probability $p_{\to G_{j}}^{u^{i}}$. The value of $P_{ij}$ depends on the historical self-spread actions and is calculated as follows:

$$P_{ij} = \frac{N_{ij}}{\sum_{k = 0}^{n} N_{ik}} \tag{1}$$

where $N_{ij}$ denotes the number of self-spread actions in which $u^{i}$ spread the information to $G_{j}$, and $\sum_{k = 0}^{n} N_{ik}$ represents the total number of self-spread actions.
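As an illustration of Equation (1), the following minimal sketch row-normalizes a table of historical self-spread counts into a probability matrix. The function name `self_spread_matrix`, the list-of-lists layout, the sample counts, and the all-zero treatment of users without recorded actions are assumptions made for this example and are not part of the paper.

```python
from typing import List


def self_spread_matrix(counts: List[List[int]]) -> List[List[float]]:
    """Row-normalize counts N[i][j] into probabilities P[i][j], as in Eq. (1).

    counts[i][j] is the number of times user i forwarded information into
    network j. A user with no recorded actions gets an all-zero row
    (an assumption; the source does not cover this case).
    """
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            matrix.append([0.0] * len(row))
        else:
            matrix.append([n / total for n in row])
    return matrix


# Hypothetical counts for two Bridge Users over three networks.
N = [[3, 1, 0],
     [0, 0, 0]]
P = self_spread_matrix(N)
print(P[0])  # [0.75, 0.25, 0.0]
```

Each non-empty row sums to 1, so a row can be read as the distribution of one Bridge User's forwarding behavior over the networks.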

2.2. Multi-Social Networks Influence Maximization

With various social networks coexisting on the Internet, information can spread between multiple social networks $G_{1}(V_{1}, E_{1})$, $G_{2}(V_{2}, E_{2})$, $G_{3}(V_{3}, E_{3})$, …, $G_{n}(V_{n}, E_{n})$. Each node in $G_{i}$ has two states: activated or inactivated. The activated nodes are those that have been affected by the information and will spread it to their neighbors in the next time period. The inactivated nodes are those that have not heard of the information or have refused to adopt it [19]. An inactivated node can be activated by activated nodes, but an activated node cannot return to the inactivated state; that is, the activation process is irreversible.

The problem of Multi-Social Networks Influence Maximization is to find k (k is an integer and satisfies $k \geq 1$) seed users from all the users $U = \{u^{1}, u^{2}, u^{3}, \dots, u^{m}\}$ of $G_{1}(V_{1}, E_{1})$, $G_{2}(V_{2}, E_{2})$, $G_{3}(V_{3}, E_{3})$, …, $G_{n}(V_{n}, E_{n})$ to constitute the initial seed node set S, such that activating the users in S maximizes the number of users that are ultimately activated. $R_{i}(S)$ indicates the number of users that are ultimately activated on $G_{i}$. Hence, the problem of multi-social networks influence maximization can be defined as

$$Seed = \underset{S \subseteq U}{\arg\max} \sum_{i = 1}^{n} R_{i}(S) \tag{2}$$

where $Seed$ is the expected set of initial nodes, with $|Seed| = k$ and $Seed \subseteq U$.

2.3. Multi-Social Networks Aggregation Algorithm based on Bridge Users

To solve the problem of multi-social networks influence maximization, this section proposes a multi-social networks aggregation algorithm based on Bridge Users, which aggregates multiple social network graphs into one graph. We can then apply widely explored algorithms (such as Greedy [6], CELF [20], and CELF++ [21]) to obtain the initial seed node set $S$. The algorithm is described as Algorithm 1.

Algorithm 1. Multi-social networks aggregation algorithm based on Bridge Users.

Input: $G_{1}(V_{1}, E_{1})$, $G_{2}(V_{2}, E_{2})$, $G_{3}(V_{3}, E_{3})$, …, $G_{n}(V_{n}, E_{n})$, $U = \{u^{1}, u^{2}, u^{3}, \dots, u^{m}\}$, BU, $P_{m \times n}$;
Output: Aggregation graph $G^{*}(V^{*}, E^{*})$;

for i = 1 to n  // Traversing n social network graphs
    for j = 1 to $|V_{i}|$  // Traversing all nodes of $G_{i}$
        if $u_{i}^{j} \in BU$ then
            if $u_{0}^{j} \notin V^{*}$ then
                Create node $u_{0}^{j}$ and add it to $V^{*}$;
            end if
            Create edges $(u_{i}^{j}, u_{0}^{j})$ and $(u_{0}^{j}, u_{i}^{j})$;
            Set the activation probabilities of the edges: $p_{u_{i}^{j}, u_{0}^{j}} = 1$, $p_{u_{0}^{j}, u_{i}^{j}} = P_{ji}$;
            Add edges $(u_{i}^{j}, u_{0}^{j})$ and $(u_{0}^{j}, u_{i}^{j})$ to $E^{*}$;
        end if
    end for
end for
Output $G^{*}(V^{*}, E^{*})$;
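The aggregation step can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation. It assumes that $G^{*}$ starts as the union of all account nodes and intra-network edges (the listing does not state the initial contents of $V^{*}$ and $E^{*}$), that networks are indexed from 1 so that index 0 can hold the virtual nodes $u_{0}^{j}$, and that `P[(j, i)]` stores the self-spread probability of user `j` into network `i`. The names `aggregate`, `members`, and `edges` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]  # (network index, user id); network 0 holds virtual nodes


def aggregate(
    members: Dict[int, Set[int]],
    edges: Dict[int, List[Tuple[int, int, float]]],
    bridge_users: Set[int],
    P: Dict[Tuple[int, int], float],
) -> Tuple[Set[Node], Dict[Tuple[Node, Node], float]]:
    """Merge several networks into one graph through virtual Bridge User nodes."""
    nodes: Set[Node] = set()
    probs: Dict[Tuple[Node, Node], float] = {}

    # Assumption: G* initially contains every account and every original edge.
    for i, users in members.items():
        for j in users:
            nodes.add((i, j))
    for i, edge_list in edges.items():
        for a, b, p in edge_list:
            probs[((i, a), (i, b))] = p

    # Connect each account of a Bridge User to that user's virtual node.
    for i, users in members.items():
        for j in users:
            if j not in bridge_users:
                continue
            virtual = (0, j)
            nodes.add(virtual)                    # created once, reused afterwards
            probs[((i, j), virtual)] = 1.0        # account -> virtual node
            probs[(virtual, (i, j))] = P[(j, i)]  # virtual node -> account
    return nodes, probs


# Hypothetical toy input: user 1 has accounts on networks 1 and 2.
members = {1: {1, 2}, 2: {1, 3}}
edges = {1: [(1, 2, 0.5)], 2: [(1, 3, 0.5)]}
nodes, probs = aggregate(members, edges, {1}, {(1, 1): 0.4, (1, 2): 0.6})
print(len(nodes), len(probs))  # 5 6
```

Because the result is an ordinary directed graph with edge probabilities, any single-network seed selection routine can be run on it unchanged.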

3. Substitutes Mining for Multi-Social Networks Influence Maximization

Given multiple social networks $G_{1}(V_{1}, E_{1})$, $G_{2}(V_{2}, E_{2})$, $G_{3}(V_{3}, E_{3})$, …, $G_{n}(V_{n}, E_{n})$, which are made up of m social network users $U = \{u^{1}, u^{2}, u^{3}, \dots, u^{m}\}$ and the relationships between these users. Using multi-social networks influence maximization algorithms, we can find the initial user set, represented as $S = S_{1} \cup S_{2}$. Assume that the users in the subset $S_{2}$ ($|S_{2}| = t$) cannot be activated at the initial moment. We then need to find t substitute nodes in the set $U \setminus S$ to form a substitute user set $S_{2}^{\prime}$, constituting a new initial seed user set $S^{\prime} = S_{1} \cup S_{2}^{\prime}$, so that

$$S^{\prime *} = \underset{|S^{\prime}| = k}{\arg\min} \left(R(S) - R(S^{\prime})\right) \tag{3}$$

i.e., the difference in influence between the new initial set $S^{\prime}$ and the original set S is the smallest, which keeps the reduction of influence close to the minimum.

We name this the problem of substitutes mining for multi-social networks influence maximization (SMMNIM). In the following part, we propose algorithms to solve this problem.

4. Substitutes Mining Algorithms for SMMNIM

In order to mine the substitutes for influence maximization in multi-social networks, three substitutes mining algorithms for SMMNIM are proposed in this section.

4.1. Greedy-Based Substitutes Mining Algorithm for SMMNIM (G_S)

The literature [13] has proved that substitute node mining for influence maximization in a single social network is an NP-hard problem. SMMNIM can be regarded as finding t seed users in the set $U \setminus S$ that maximize information influence, so it is also an NP-hard problem. A common way to approximate the optimal solution of such a problem is a greedy algorithm. Therefore, we first design a Greedy-based substitutes mining algorithm for SMMNIM (G_S), which is shown in Algorithm 2.

Algorithm 2. Greedy-based Substitutes Mining Algorithm for SMMNIM (G_S).

Input: $G^{*}(V^{*}, E^{*})$, seed node set $S$, the subset of seed nodes that can be activated $S_{1}$, number of seeds that cannot be activated t;
Output: New initial seed set $S^{\prime}$;

$S^{\prime} = S_{1}$;
$R(S)$: The number of active nodes obtained by $S$;
for i = 1 to t
    select $v = \underset{v \in V^{*} \setminus (S \cup S^{\prime})}{\arg\max} \left(R(S^{\prime} \cup \{v\}) - R(S^{\prime})\right)$;
    $S^{\prime} = S^{\prime} \cup \{v\}$;
end for
Output $S^{\prime}$;
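A minimal Python sketch of the greedy selection loop is given below. It is not the authors' code: the influence function is passed in as a callable `spread`, and the example uses a deterministic reachability count (`reachable_count`) as a stand-in for the Monte Carlo estimate of $R(\cdot)$ so that the result is reproducible. All function names and the toy graph are assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, List, Set


def reachable_count(adj: Dict[str, List[str]], seeds: Iterable[str]) -> int:
    """Deterministic stand-in for R(.): number of nodes reachable from the seeds."""
    seen: Set[str] = set(seeds)
    stack = list(seen)
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)


def greedy_substitutes(
    candidates: Iterable[str],
    s1: Iterable[str],
    t: int,
    spread: Callable[[Set[str]], float],
) -> Set[str]:
    """Add t substitutes to S1, each time taking the largest marginal gain."""
    new_seeds = set(s1)
    for _ in range(t):
        pool = sorted(set(candidates) - new_seeds)  # sorted for stable tie-breaking
        if not pool:
            break
        base = spread(new_seeds)
        best = max(pool, key=lambda v: spread(new_seeds | {v}) - base)
        new_seeds.add(best)
    return new_seeds


# Hypothetical aggregated graph; S = {a, e}, and e turns out to be uncooperative.
adj = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [],
       "e": ["f", "g", "h"], "f": [], "g": [], "h": [],
       "x": ["y"], "y": []}
S = {"a", "e"}
candidates = set(adj) - S  # V* \ S
result = greedy_substitutes(candidates, {"a"}, 1, lambda s: reachable_count(adj, s))
print(sorted(result))  # ['a', 'x']
```

Every iteration evaluates `spread` once per remaining candidate, which is the cost that the pre-selected variant tries to avoid.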

4.2. Pre-Selected-Based Substitutes Mining Algorithm for SMMNIM(P_S)

Using G_S, the most suitable set of substitutes can be found. However, G_S needs to simulate and calculate the influence of all candidate users, which requires a large amount of computation and makes the algorithm inefficient. In this section, we propose an algorithm based on the idea of “prepare in advance”. When selecting the initial seed nodes, additional nodes are selected as well; we call these nodes “pre-selected nodes”. When some nodes cannot be activated, pre-selected nodes are used as substitute nodes. Assuming that k initial seed nodes need to be selected, we select $k + t^{\prime}$ initial seed nodes to form a pre-selected set $S_{pre}$. When the number of non-cooperative nodes is t ($t < t^{\prime}$), the nodes from $k + 1$ to $k + t$ are selected as substitutes. If there are still uncooperative nodes among the substitutes, we continue to select nodes from the pre-selected set $S_{pre}$. In this way, the amount of calculation is greatly reduced. We name this algorithm the pre-selected-based substitutes mining algorithm for SMMNIM (P_S), which is shown in Algorithm 3.

Algorithm 3. Pre-selected-based Substitutes Mining Algorithm for SMMNIM (P_S).

Input: $G^{*}(V^{*}, E^{*})$, pre-selected seed node set $S_{pre}$ (initially empty), the number of seed nodes k, the number of additional selected nodes $t^{\prime}$;
Output: New initial seed set $S^{\prime}$;

$S_{pre} = \emptyset$, $S^{\prime} = \emptyset$;
$R(S)$: The number of active nodes obtained by $S$;
for i = 1 to $k + t^{\prime}$
    select $v = \underset{v \in V^{*} \setminus S_{pre}}{\arg\max} \left(R(S_{pre} \cup \{v\}) - R(S_{pre})\right)$;
    $S_{pre} = S_{pre} \cup \{v\}$;
end for
for v in $S_{pre}$
    if v is an uncooperative node then
        $S_{pre} = S_{pre} - \{v\}$;
    end if
end for
for i = 1 to k
    add the i-th remaining node of $S_{pre}$ (in selection order) to $S^{\prime}$;
end for
Output $S^{\prime}$;
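The two phases of P_S (greedy pre-selection of $k + t^{\prime}$ nodes, then filtering out uncooperative nodes and keeping the first k) can be sketched as follows. This is a sketch under stated assumptions: the greedy step picks the largest marginal gain, the pre-selected set is kept as an ordered list so that "the nodes from k + 1 to k + t" is well defined, and a deterministic reachability count stands in for $R(\cdot)$. The names `preselect` and `replace_uncooperative` and the toy graph are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set


def reachable_count(adj: Dict[str, List[str]], seeds: Iterable[str]) -> int:
    """Deterministic stand-in for R(.): number of nodes reachable from the seeds."""
    seen: Set[str] = set(seeds)
    stack = list(seen)
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)


def preselect(
    nodes: Iterable[str], size: int, spread: Callable[[List[str]], float]
) -> List[str]:
    """Greedily pick `size` nodes (k + t') and keep them in selection order."""
    chosen: List[str] = []
    for _ in range(size):
        pool = sorted(set(nodes) - set(chosen))  # sorted for stable tie-breaking
        if not pool:
            break
        base = spread(chosen)
        chosen.append(max(pool, key=lambda v: spread(chosen + [v]) - base))
    return chosen


def replace_uncooperative(
    preselected: List[str], uncooperative: Set[str], k: int
) -> List[str]:
    """Drop uncooperative nodes and keep the first k remaining ones."""
    cooperative = [v for v in preselected if v not in uncooperative]
    return cooperative[:k]


# Hypothetical aggregated graph with k = 2 and t' = 1.
adj = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [],
       "e": ["f", "g", "h"], "f": [], "g": [], "h": [],
       "x": ["y"], "y": []}
pre = preselect(adj, 3, lambda s: reachable_count(adj, s))
new_seeds = replace_uncooperative(pre, {"e"}, 2)
print(pre, new_seeds)  # ['a', 'e', 'x'] ['a', 'x']
```

Once `pre` has been computed, replacing an uncooperative node is a list operation and needs no further influence simulation.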

4.3. Similar-Users-Based Substitutes Mining Algorithm for SMMNIM (S_S)

In this section, we propose another solution to find substitutes based on the idea of “finding the most similar user of uncooperative user”. We focus on seeking the most similar users of the uncooperative nodes as the substitutes and propose a similar-users-based substitutes mining algorithm for SMMNIM (S_S), which is shown in Algorithm 4. We integrate structure similarity and neighbor attribute similarity to evaluate the similarity between two users.

Given an aggregation graph $G^{*}(V^{*}, E^{*})$ of multi-social networks, $(u, v) \in E^{*}$ indicates the edge from node u to node v. The structure of node u is determined by its in-edges and out-edges. $O\_E(u)$ denotes the out-edges of u, represented by their end nodes, and is defined as

$$O\_E(u) = \{v \in V^{*} \mid (u, v) \in E^{*}\}. \tag{4}$$

$I\_E(u)$ denotes the in-edges of u, represented by their start nodes, and is defined as

$$I\_E(u) = \{w \in V^{*} \mid (w, u) \in E^{*}\}. \tag{5}$$

The structure of node u is defined as

$$\Psi(u) = (O\_E(u), I\_E(u)). \tag{6}$$

The structural similarity of node u and node v is defined as

$$SS(u, v) = \frac{1}{2} \left(\frac{|O\_E(u) \cap O\_E(v)|}{\sqrt{|O\_E(u)| \cdot |O\_E(v)|}} + \frac{|I\_E(u) \cap I\_E(v)|}{\sqrt{|I\_E(u)| \cdot |I\_E(v)|}}\right). \tag{7}$$

The self-spread of a Bridge User makes it more likely to have a wider scope of influence spread than ordinary users. When choosing the most similar user of the uncooperative user, the other factor we need to consider is the similarity of neighbor attributes (Bridge Users or ordinary users).

In the neighbor node set of node u, the number of Bridge Users is denoted as $B\_U(u)$, and the number of ordinary users is denoted as $N\_U(u)$. The neighbor node attributes of node u are defined as the vector

$$A(u) = (B\_U(u), N\_U(u)). \tag{8}$$

The Euclidean distance between the neighbor attributes of u and v is

$$ED\_A(u, v) = \sqrt{(A(u) - A(v)) (A(u) - A(v))^{T}}. \tag{9}$$

Then we define the neighbor attributes similarity between nodes u and v as

$$AS(u, v) = \frac{ED\_A(u, v) - ED\_A_{\min}}{ED\_A_{\max} - ED\_A_{\min}} \tag{10}$$

Finally, the similarity between nodes u and v combines the structure similarity and the neighbor attributes similarity:

$$Sim(u, v) = SS(u, v) + AS(u, v). \tag{11}$$
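The neighbor attribute terms in Equations (8) to (10) can be sketched as follows. Equation (10) is implemented exactly as written, i.e., as a min-max normalization of the Euclidean distance. The source does not say over which pairs the minimum and maximum are taken; this sketch assumes they are taken over the candidate nodes compared with a fixed u, and returns 0 when all distances are equal. All function names and the sample vectors are hypothetical.

```python
import math
from typing import Dict, Iterable, Set, Tuple


def neighbor_attributes(neighbors: Iterable[str], bridge_users: Set[str]) -> Tuple[int, int]:
    """A(u) = (B_U(u), N_U(u)): counts of Bridge User and ordinary neighbors (Eq. 8)."""
    neighbor_list = list(neighbors)
    bridge = sum(1 for n in neighbor_list if n in bridge_users)
    return bridge, len(neighbor_list) - bridge


def attribute_distance(a_u: Tuple[int, int], a_v: Tuple[int, int]) -> float:
    """ED_A(u, v): Euclidean distance between two attribute vectors (Eq. 9)."""
    return math.sqrt((a_u[0] - a_v[0]) ** 2 + (a_u[1] - a_v[1]) ** 2)


def attribute_scores(
    a_u: Tuple[int, int], candidates: Dict[str, Tuple[int, int]]
) -> Dict[str, float]:
    """AS(u, v) for every candidate v, normalized as written in Eq. (10).

    Assumption: min and max are taken over the given candidates.
    """
    dist = {v: attribute_distance(a_u, a_v) for v, a_v in candidates.items()}
    d_min, d_max = min(dist.values()), max(dist.values())
    if d_max == d_min:
        return {v: 0.0 for v in dist}
    return {v: (d - d_min) / (d_max - d_min) for v, d in dist.items()}


# Hypothetical attribute vectors: u has 2 Bridge User and 3 ordinary neighbors.
a_u = neighbor_attributes(["b1", "b2", "o1", "o2", "o3"], {"b1", "b2"})
scores = attribute_scores(a_u, {"v1": (2, 3), "v2": (5, 7)})
print(a_u, scores)  # (2, 3) {'v1': 0.0, 'v2': 1.0}
```

The combined score of Equation (11) is then the sum of the structural similarity from Equation (7) and the value returned here for the same pair.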

Algorithm 4. Similar-users-based Substitutes Mining Algorithm for SMMNIM (S_S).

Input: $G^{*}(V^{*}, E^{*})$, seed node set $S$, the subset of seed nodes that can be activated $S_{1}$, the subset of seed nodes that cannot be activated $S_{2}$;
Output: New initial seed set $S^{\prime}$;

$S^{\prime} = S_{1}$;
$R(S)$: The number of active nodes obtained by $S$;
for u in $S_{2}$
    select $v = \underset{v \in V^{*} \setminus (S \cup S^{\prime})}{\arg\max} \, Sim(u, v)$;
    $S^{\prime} = S^{\prime} \cup \{v\}$;
end for
Output $S^{\prime}$;
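The substitution loop of S_S can be sketched in Python as below. The similarity is passed in as a callable so that any implementation of $Sim(u, v)$ can be plugged in; the lookup table used in the example is hypothetical data. Excluding already chosen substitutes from the candidate pool is an assumption made here so that each uncooperative node receives a distinct substitute.

```python
from typing import Callable, Dict, Iterable, Set, Tuple


def similar_user_substitutes(
    candidates: Iterable[str],
    s1: Iterable[str],
    s2: Iterable[str],
    sim: Callable[[str, str], float],
) -> Set[str]:
    """Replace each uncooperative node in S2 by its most similar unused candidate."""
    new_seeds = set(s1)
    for u in sorted(s2):
        pool = sorted(set(candidates) - new_seeds)  # sorted for stable tie-breaking
        if not pool:
            break
        best = max(pool, key=lambda v: sim(u, v))
        new_seeds.add(best)
    return new_seeds


# Hypothetical similarity values between the uncooperative node e and candidates.
table: Dict[Tuple[str, str], float] = {
    ("e", "f"): 0.2, ("e", "x"): 0.9, ("e", "y"): 0.4,
}
substitutes = similar_user_substitutes(
    candidates={"f", "x", "y"},  # V* \ S
    s1={"a"},
    s2={"e"},
    sim=lambda u, v: table.get((u, v), 0.0),
)
print(sorted(substitutes))  # ['a', 'x']
```

No influence simulation is needed in this loop; the cost lies in evaluating the similarity for every pair of uncooperative node and candidate.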

5. Simulation

5.1. Data Description

The experiments are based on the following three social network datasets:

NetHEPT: This dataset is derived from “High Energy Physics” and contains web-based data about the authors of articles. Each node represents an author, and the number of edges between a pair of nodes equals the number of papers the two authors co-authored [7].

Epinions: This dataset comes from the social network “Epinions”. In this dataset, if a user trusts another user, there is an edge between the two users.

Slashdot: This dataset is derived from the “Slashdot” website, a technology news website whose users can post comments. In this dataset, if one user marks another as a friend or an opponent, a relationship is considered to exist between the two users and there is an edge between them.

The detailed information of nodes and edges is shown in Table 1.

5.2. Analysis of Multi-Social Networks Influence Maximization

In the simulation, CELF++ is used to solve the influence maximization problem, and the propagation model we selected is the LT model. For an edge (u, v) in the graph G, the probability $p_{uv}$ that node u activates its neighbor node v is the reciprocal of the in-degree of node v, that is, $p_{uv} = 1 / |in(v)|$, $v \in out(u)$. Each time the influence of a seed set was calculated, the propagation process was simulated with 10,000 Monte Carlo runs.
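The following Python sketch shows one way such a Monte Carlo estimate could be computed. It assumes the standard linear threshold formulation, in which $p_{uv} = 1 / |in(v)|$ acts as the edge weight, each node draws a uniform threshold in every run, and a node becomes active once the total weight of its active in-neighbors reaches that threshold. The function names, the dictionary layout (every node maps to its list of in-neighbors), and the toy chain graph are assumptions for this example, not the authors' code.

```python
import random
from typing import Dict, Iterable, List, Set


def lt_spread_once(
    in_neighbors: Dict[str, List[str]], seeds: Iterable[str], rng: random.Random
) -> int:
    """Run one linear threshold diffusion and return the number of active nodes."""
    thresholds = {v: rng.random() for v in in_neighbors}  # uniform in [0, 1)
    active: Set[str] = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, preds in in_neighbors.items():
            if v in active or not preds:
                continue
            # Every in-edge of v has weight 1 / |in(v)|.
            weight = sum(1 for u in preds if u in active) / len(preds)
            if weight > 0 and weight >= thresholds[v]:
                active.add(v)
                changed = True
    return len(active)


def estimate_spread(
    in_neighbors: Dict[str, List[str]],
    seeds: Iterable[str],
    runs: int = 10_000,
    seed: int = 0,
) -> float:
    """Average the final number of active nodes over `runs` simulations."""
    rng = random.Random(seed)
    seed_list = list(seeds)
    total = sum(lt_spread_once(in_neighbors, seed_list, rng) for _ in range(runs))
    return total / runs


# Hypothetical chain a -> b -> c: every in-degree is 1, so each edge has weight 1.
chain = {"a": [], "b": ["a"], "c": ["b"]}
estimate = estimate_spread(chain, {"a"}, runs=100)
print(estimate)  # 3.0
```

In the chain example every edge weight is 1, so the whole chain is activated in every run; on real graphs the estimate varies with the number of runs and the random seed.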

To compare with the information spread in multi-social networks, we first show the spread of influence in a single social network, as shown in Figure 4. From the experimental results we can see that, as the number of seed nodes increases, the influence increases.

Next, the experiments verify the information spread in multiple social networks. However, social network datasets with annotated Bridge Users are not available to us. In the experiments, we therefore choose the nodes with the same ID from the three social networks and assume that nodes with the same ID belong to one user, who is regarded as a Bridge User. In the following simulation, the set BU is obtained by this method. Then the above three social networks $G_{1}(V_{1}, E_{1})$, $G_{2}(V_{2}, E_{2})$, $G_{3}(V_{3}, E_{3})$ can be aggregated into one network $G(V, E)$ by Algorithm 1. During aggregation, the number of selected Bridge Users is set to 0, 1000, 2000, 3000, 4000, and 5000, respectively. The information spread with different numbers of Bridge Users in multi-social networks is shown in Figure 5.

As shown in Figure 5, the spread of influence becomes larger as the number of seed nodes increases. As discussed before, Bridge Users can spread information from social network $G_{1}$ to $G_{2}$. When comparing the influence spreading for different numbers of Bridge Users, we can see that the more Bridge Users there are, the wider the influence spreading, which indicates that the self-spread of Bridge Users expands the scope of information spreading. These results confirm that influence maximization in multi-social networks differs from the single-network problem, because the networks are connected by Bridge Users.

5.3. Comparison Results of Substitutes Mining Algorithms

Uncooperative seed nodes lead to the failure of influence maximization, so it is necessary to select appropriate substitute nodes to replace them. In this section, we assume that the cooperation rate of the seed nodes is 80%, which means that if we expect to select 50 nodes as the initial seed nodes, about 10 of them may be uncooperative. The purpose of the three algorithms designed in this paper is to find 10 substitutes to replace the uncooperative seed nodes. To evaluate the performance of each algorithm, this paper compares the algorithms in three aspects: the influence spread range of the new seed set, the loss rate of the influence spreading, and the memory/time consumption of the algorithm. Each algorithm is executed multiple times to reduce errors (five times in this experiment). In the figures of experimental results, init-0 represents the initial set of seed nodes, G_S represents the Greedy-based substitutes mining algorithm for SMMNIM, P_S represents the pre-selected-based substitutes mining algorithm for SMMNIM, and S_S represents the similar-users-based substitutes mining algorithm for SMMNIM.

Figure 6 shows the influence spreading of the new seed sets obtained by the three algorithms G_S, P_S and S_S. Compared with the influence spreading of the original initial seed set (Init_0), the three new seed sets achieve lower influence spreading. Although the proposed algorithms find substitutes, they are still unable to fully make up for the losses caused by the uncooperative nodes. This indicates that uncooperative nodes reduce the achievable influence in multiple social networks. Among the three algorithms, the new seed set obtained by G_S achieves the influence spreading closest to that of the original seed set. This is because the G_S algorithm is equivalent to searching for the best seed set among the remaining nodes. Therefore, the new seed set found by G_S obtains a wider range of influence than those of the other two algorithms. Compared with the P_S algorithm, the new seed set obtained by the S_S algorithm has a wider range of influence. This is because the S_S algorithm looks for the node most similar to each uncooperative node as its substitute, so the influence that the substitute nodes produce is similar to that of the original uncooperative nodes, whereas the P_S algorithm only pre-selects some nodes to replace the uncooperative nodes. Therefore, S_S performs better than the P_S algorithm.

Figure 7 illustrates the loss rate of influence spreading of the new seed sets obtained by the three algorithms. Because the G_S algorithm finds better substitute nodes, its loss of influence is the smallest. The S_S algorithm takes second place.
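The text does not give a formula for the loss rate. A natural reading, assumed here and not confirmed by the source, is the relative drop in influence $(R(S) - R(S^{\prime})) / R(S)$. The short sketch below computes it for hypothetical spread values that are not taken from the experiments.

```python
def loss_rate(original_spread: float, new_spread: float) -> float:
    """Relative influence loss (R(S) - R(S')) / R(S); an assumed definition."""
    if original_spread <= 0:
        raise ValueError("original_spread must be positive")
    return (original_spread - new_spread) / original_spread


# Hypothetical values: the original seed set reaches 200 nodes, the new one 180.
rate = loss_rate(200.0, 180.0)
print(f"{rate:.2f}")  # 0.10
```

Under this reading, a loss rate of 0 means the substitutes fully recover the original influence.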

In terms of memory/time cost (Figure 8), the P_S algorithm uses less memory and less time than the G_S algorithm. The main reason is that the P_S algorithm pre-selects some nodes in advance as “standby” nodes. When some nodes do not cooperate, substitute nodes can be taken from the “standby” nodes without recalculation. The G_S algorithm needs a long time to recalculate the new seed set, about twice that of the P_S algorithm, and its memory cost is also larger than that of P_S. In addition, the S_S algorithm spends a large amount of computation on user similarity, so its memory/time cost is the largest.

Therefore, all three algorithms can find substitute nodes and reduce the influence loss caused by uncooperative nodes, and one of them can be chosen according to the requirements. The substitute nodes found by the G_S algorithm achieve influence closest to that of the original seed node set, but the algorithm costs more time and memory. That is, G_S is suitable for scenarios that are not sensitive to time or memory but require a wider range of influence. The P_S algorithm can provide the substitute nodes immediately, so it is more suitable for time-sensitive or memory-sensitive scenarios. Because of the large amount of computation spent on user similarity, the S_S algorithm can be selected when the user similarity is known in advance.

6. Conclusions

In this paper, we first studied the problem of multi-social networks influence maximization. By defining the user with multiple social network accounts as a Bridge User, we discussed how a Bridge User affects the information spreading in multiple social networks. Then we considered a new and significant problem: some seed nodes may fail to be activated in the process of influence maximization. Hence, it is necessary to find substitute nodes to reduce the losses caused by these uncooperative seed nodes. This brings up the problem of Substitutes Mining for Multi-Social Networks Influence Maximization (SMMNIM). In this paper, three substitute node mining algorithms were proposed (G_S, P_S and S_S). The experimental results showed that: (1) in multi-social networks, Bridge Users can make information spread across social networks and expand the range of information influence; (2) uncooperative nodes reduce the range of information influence; (3) the three substitute node mining algorithms can find suitable substitute nodes and construct a new seed set whose information influence is as close as possible to that of the original seed node set; (4) the three algorithms can be selected for mining substitute seed nodes according to different application scenarios.

In future research, we will further consider the attributes of nodes and information, such as the node’s interests and the subject of the information, take these factors into the process of multi-social networks influence maximization, and propose more accurate and efficient substitute node mining algorithms.

Author Contributions

X.R. and J.Z. designed the proposed method and wrote the paper; J.Z. and F.L. wrote the code and performed the experiments; X.R. and J.Z. analyzed the data; Z.C. revised the paper and provided funding support.

Funding

This research was funded by Project of Fuzhou Science and Technology (No.2015-G-84) and National Natural Science Foundation of China (No. 61841701). The APC was funded by Project of Fuzhou Science and Technology (No.2015-G-84).

Acknowledgments

The authors thank the fund of Project of Fuzhou Science and Technology (No.2015-G-84) and the fund of Natural Science Foundation of China (No.61841701) for covering the costs to publish in open access and the costs when writing this study. Besides, the authors thank the anonymous reviewers for their insightful comments that helped improve the quality of this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wasserman, S. Social Network Analysis Methods and Applications. Contemp. Sociol. 1995, 91, 219–220.
2. Zhan, Q.; Zhang, J.; Philip, S.Y. Integrated anchor and social link predictions across multiple social networks. Knowl. Inf. Syst. 2018, 1–24.
3. Bakshy, E.; Rosenn, I.; Marlow, C.; Adamic, L. The role of social networks in information diffusion. In Proceedings of the 21st International Conference on World Wide Web, Lyon, France, 16–20 April 2012; ACM: New York, NY, USA, 2012; pp. 519–528.
4. Zhao, J.H.; Wan, K.W. Research on the Communication Dynamics Model of Social Network Public Opinion Based on the SIS Model. Inf. Sci. 2017, 12, 34–38.
5. Liu, S.; Jiang, C.; Lin, Z.; Ding, Y.; Duan, R.; Xu, Z. Identifying effective influencers based on trust for electronic word-of-mouth marketing. Inf. Sci. 2015, 306, 34–52.
6. Kempe, D.; Kleinberg, J. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003; ACM: New York, NY, USA, 2003; pp. 137–146.
7. Chen, W.; Wang, Y.; Yang, S. Efficient influence maximization in social networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; ACM: New York, NY, USA, 2009; pp. 199–208.
8. Du, N.; Liang, Y.; Balcan, M.F.; Gomez-Rodriguez, M.; Zha, H.; Song, L. Scalable Influence Maximization for Multiple Products in Continuous-Time Diffusion Networks. J. Mach. Learn. Res. 2017, 18, 1–45.
9. Tong, G.; Wu, W.; Tang, S.; Du, D.Z. Adaptive Influence Maximization in Dynamic Social Networks. IEEE/ACM Trans. Netw. 2017, 25, 112–125.
10. Wang, Y.; Vasilakos, A.V.; Jin, Q.; Ma, J. PPRank: Economically Selecting Initial Users for Influence Maximization in Social Networks. IEEE Syst. J. 2017, 11, 2279–2290.
11. Li, G.L.; Chu, Y.P.; Feng, H.J.; Xu, Y.Q. Influence maximization on multiple social network. Chin. J. Comput. 2016, 39, 643–656.
12. Li, X.; Zhang, X.; Shu, H.; Sun, G. Influence Maximization Across Multi-Channels in Social Network. J. Comput. Res. Dev. 2016, 53, 1709–1718.
13. Shen, Y.; Dinh, T.N.; Zhang, H.; Thai, M.T. Interest-matching information propagation in multiple online social networks. In Proceedings of the International Conference on Information and Knowledge Management, Maui, HI, USA, 29 October–2 November 2012; ACM: New York, NY, USA, 2012; pp. 1824–1828.
14. Zhang, H.; Nguyen, D.T.; Zhang, H.; Thai, M.T. Least cost influence maximization across multiple social networks. IEEE/ACM Trans. Netw. 2016, 24, 929–939.
15. Nguyen, D.T.; Das, S.; Thai, M.T. Influence maximization in multiple online social networks. In Proceedings of the International Conference on Global Communications Conference, Atlanta, GA, USA, 9–13 December 2013; pp. 3060–3065.
16. Zhao, J.; Chen, Z.; Luo, J. Influence maximization on multi-social networks based on bridge users. Comput. Syst. Appl. 2017, 26, 199–204.
17. Li, C.T.; Hsieh, H.P.; Lin, S.D.; Shan, M.K. Finding influential seed successors in social networks. In Proceedings of the 21st International Conference Companion on World Wide Web, Lyon, France, 16–20 April 2012; ACM: New York, NY, USA, 2012; pp. 557–558.
18. Ma, Q.; Ma, J. Discovering the substitutes for the seeds in influence maximization problem. Chin. J. Comput. 2017, 40, 674–686.
19. Wang, Y.; Dong, W.; Dong, X. A novel ITÖ Algorithm for influence maximization in the large-scale social networks. Future Gener. Comput. Syst. 2018, 88, 755–763.
20. Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.; Glance, N.S. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; pp. 420–429.
21. Goyal, A.; Lu, W.; Lakshmanan, L.V.S. CELF++: Optimizing the greedy algorithm for influence maximization in social networks. In Proceedings of the International Conference on World Wide Web, Hyderabad, India, 28 March–1 April 2011; pp. 47–48.

Figure 1. User with multiple accounts forwards information across social networks.

Figure 2. Information spread within a single social network.

Figure 3. Bridge User forwards the information from G₁ to G₂.

Figure 4. The spread of influence in a single social network. (a) The spread of influence in NetHEPT; (b) The spread of influence in Epinions; (c) The spread of influence in Slashdot.

Figure 5. Information spread for different numbers of Bridge Users in multi-social networks.

Figure 6. Influence spreading of the new seed set obtained by G_S, P_S and S_S.

Figure 7. Loss rate of influence spreading of the new seed set obtained by G_S, P_S and S_S.

Figure 8. Memory/time cost of G_S, P_S and S_S.

Table 1. Three social network datasets used in the simulation.

Dataset	NetHEPT	Epinions	Slashdot
Number of nodes	15,233	75,879	77,360
Number of edges	62,796	508,837	905,468

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

