Article

Multiple Benefit Thresholds Problem in Online Social Networks: An Algorithmic Approach

by Phuong N. H. Pham 1,2,*, Bich-Ngan T. Nguyen 1,2, Quy T. N. Co 1 and Václav Snášel 2
1 Faculty of Information Technology, Ho Chi Minh City University of Food Industry, 140 Le Trong Tan Street, Ho Chi Minh 700000, Vietnam
2 Department of Computer Science, Faculty of Electrical Engineering and Computer Science, VŠB-Technical University of Ostrava, 17. listopadu 15/2172, 708 33 Ostrava, Czech Republic
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(6), 876; https://doi.org/10.3390/math10060876
Submission received: 16 December 2021 / Revised: 22 February 2022 / Accepted: 4 March 2022 / Published: 9 March 2022
(This article belongs to the Special Issue Complex Network Modeling: Theory and Applications)

Abstract

An important problem in the context of viral marketing in social networks is the Influence Threshold (IT) problem, which aims to find a set of users (referred to as a seed set) who begin disseminating a product's information so that the benefit gained exceeds a predetermined threshold. However, marketing strategies differ across realistic scenarios due to market dependence or budget constraints. As a consequence, picking a seed set for one specific threshold is not enough to obtain an effective solution. To address this limitation of previous works, we study the Multiple Benefit Thresholds (MBT) problem, a generalized version of the IT problem. Given a social network under an information diffusion model and a set of thresholds $T = \{T_1, T_2, \ldots, T_k\}$, $T_i > 0$, the problem seeks seed sets $S_1, S_2, \ldots, S_k$ with the lowest possible cost so that the benefit achieved from the influence process is at least $T_1, T_2, \ldots, T_k$, respectively. The main challenges are that the problem is NP-hard and that evaluating the objective function is #P-hard under traditional information propagation models. In addition, running existing algorithms repeatedly for different thresholds can incur large computational costs. To address these challenges, we introduce Efficient Sampling for Selecting Multiple Seed Sets (ESSM), an efficient technique with theoretical guarantees. At the core of our algorithm is a novel algorithmic framework that (1) uses the solution for a smaller threshold to find the solutions for larger ones and (2) reuses the existing samples together with the current solution to find the solutions for larger ones. Extensive experiments on several real social networks were conducted to show the effectiveness and performance of our algorithm compared with current ones.
The results indicated that our algorithm outperformed other state-of-the-art ones in terms of both the total cost and running time.

1. Introduction

In recent years, Online Social Networks (OSNs) have contributed to the rapid development of the global economy by providing a powerful platform for communication and information dissemination in marketing, media, and advertising, particularly in networks with billions of users. Information diffusion models underpin the study of social influence in OSNs. Kempe et al. [1] first introduced two classic models, Independent Cascade (IC) and Linear Threshold (LT), and formulated the Influence Maximization (IM) problem, which aims to select k nodes that influence the largest number of users in a social network. This work has inspired many studies on social influence [2,3,4,5,6,7,8,9,10] and on the detection and control of misinformation and rumors [11,12,13,14,15].
In the context of viral marketing for product promotion, hosts (companies) often devise a marketing campaign that distributes product samples to selected users, expecting them to persuade their friends, friends of friends, and so on, until the number of influenced people reaches a certain level. The Influence Threshold (IT) problem, which was inspired by this phenomenon and has been studied extensively, looks for a node set of the smallest possible size such that the number of influenced nodes reaches or surpasses a predetermined threshold $\gamma$ [8,16,17]. The value of $\gamma$ determines the scale of the viral marketing campaign. However, in some realistic scenarios, persuading each user to promote a sample product has a distinct cost [4,18]. Moreover, each influenced user offers a different benefit after the marketing process. Customers with significant financial resources, for example, will be able to purchase more than others. As a result, the existing algorithms for the IT problem may return a solution that does not match the marketing purpose. In addition, marketing strategies are often adjusted because the market can change in a short time. Consequently, a solution for one particular benefit level is insufficient as an overall effective solution. This can be overcome by finding solutions for multiple thresholds and selecting the one that best suits the budget and the current market.
For instance, assume that a company wants to devise a strategy for influencing customers on an online social network. Due to budget fluctuations or market instability, it may consider spreading strategies with different numbers of influenced customers, such as 1000, 2000, 3000, or 5000. In this case, the company wants to find solutions in which the benefit of each is above the corresponding threshold, and then select a solution with a reasonable cost to execute its marketing plan.
Our goal in this study is to solve the novel Multiple Benefit Thresholds (MBT) problem, which is stated as follows. Consider a social network $G = (V, E)$ in which each user $u$ has a distinct cost $c(u) > 0$, and a set of $k$ benefit thresholds $T = \{T_1, T_2, \ldots, T_k\}$. The problem seeks seed sets $\{S_1, S_2, \ldots, S_k\}$ such that each $S_i$ has the lowest total cost $c(S_i)$ while its benefit $B(S_i)$ is at least $T_i$, for $i = 1, \ldots, k$. There are two main challenges in solving the MBT problem. First, MBT is NP-hard and calculating the benefit function is #P-hard. Second, finding seed sets for multiple thresholds needs more time and memory than other information propagation problems, including the IT problem. Running an existing single-threshold algorithm $k$ times is costly and hence not applicable to large networks. To overcome these challenges, we propose a highly efficient algorithm that not only provides a solution guarantee but also produces good results in practice. This work revises and extends our conference paper [19] by providing more detailed proofs and a more extensive experimental evaluation.
Our contributions are summarized as follows:
  • The Multiple Benefit Thresholds (MBT) problem is formulated for the first time under the Independent Cascade (IC) information diffusion model.
  • To solve the problem, we propose Efficient Sampling for Selecting Multiple Seed Sets (ESSM), an approximation algorithm with theoretical bounds. It is built on a novel algorithmic framework that uses a sampling technique to estimate the benefit function, denoted as $B(\cdot)$, and reuses the seed set and the samples obtained for a smaller benefit threshold to find the seed sets for larger ones. Accordingly, our algorithm can find multiple seed sets in only one run. As a solution guarantee, our algorithm returns seed sets $S_i$ satisfying $B(S_i) \geq \frac{1 - \epsilon}{1 + \epsilon} T_i - \epsilon$ with total cost $c(S_i) \leq \left(1 + \ln \frac{T_i - \epsilon T_i}{\epsilon}\right) c(S_i^*)$ with high probability (w.h.p.), where $\epsilon > 0$ is an input and $S_i^*$ is the optimal seed set for threshold $T_i$, for all $i = 1, 2, \ldots, k$.
  • Extensive experiments are performed on six real-world networks (Gnutella, Email-Enron, Net-Hept, Net-Phy, Amazon, and DBLP) to compare the efficiency of our algorithm with that of other state-of-the-art ones. The results indicate that our algorithm outperforms the state-of-the-art ones in both cost and running time.
Organization. The rest of the paper is structured as follows. In Section 2, we review previous work related to influence maximization. Section 3 presents the model, problem definition, and main algorithm. The experimental results are shown and explained in Section 4. Finally, Section 5 concludes the paper.

2. Related Works

In this section, we review previous studies related to our problem, covering information propagation models, Influence Maximization, and Influence Threshold.
Information propagation models and Influence Maximization. Social networks provide a convenient environment for business marketing through the word-of-mouth effect. Influence Maximization (IM) [1], which seeks $k$ nodes (a seed set) in a social network that can influence the greatest number of nodes, is one of the most important problems in social network influence. Kempe et al. originally investigated IM as an NP-hard combinatorial optimization problem under two well-known information diffusion models: Linear Threshold (LT) and Independent Cascade (IC). Furthermore, computing the influence function under these two models is #P-hard [5,6], so no polynomial-time algorithm for computing it exactly is known. However, owing to the wide application of IM in commerce, several efficient algorithms have been proposed for solving the problem in large-scale networks, including approximation algorithms [1,2,3,20,21] and heuristics without theoretical guarantees [7,22,23]. Notably, Borgs et al. [24] made a theoretical breakthrough by proposing a $(1 - 1/e - \epsilon)$-approximation algorithm that runs in $O(\epsilon^{-3} k l^2 (m + n) \log^2 n)$ time with probability at least $1 - n^{-l}$. Their algorithm rests on two ideas: a sampling technique, the Reverse Reachable (RR) set, which estimates the number of influenced nodes under stochastic information propagation models, and an algorithmic framework, known as Reverse Influence Sampling (RIS), that finds the solution on the generated samples with a theoretical bound. Tang et al. [2] proposed the TIM/TIM+ algorithms, which reduce the time complexity to $O(\epsilon^{-2} (k + l)(m + n) \log n)$ while maintaining the performance guarantee, and demonstrated the high efficiency of their algorithms on billion-scale networks. Later, several algorithms were devised to reduce the sample complexity and running time while maintaining the approximation ratio by modifying the RIS framework, including IMM [3], SSA/DSSA [21], and OPIM [25].
Recently, Akram et al. studied the detection of influential communities in a social network using the notion of fuzzy competition hypergraphs [26,27].
In other directions, numerous studies have addressed variations of IM for different viral marketing scenarios. The authors in [28,29,30] considered IM under topic queries by introducing an information diffusion model in which many topics can spread. Additionally, advances in geoposition-enabled devices and services allow OSNs to integrate a user's location. The authors in [31] investigated the location-aware influence maximization (LIM) problem, which selects nodes so as to influence the largest number of nodes within a given distance; [32] considered the role of distance among users in promoting the influence process of viral marketing. Moreover, several other variations of IM, including competitive-aware [5,33] and time-aware [34] versions, have been introduced and studied.
Nguyen et al. [35] studied IM under a budget constraint, where each node has a cost for adopting a sample product and the total cost is limited by a budget. That paper showed that the greedy algorithm can achieve an approximation ratio of $1 - 1/e$ and further proposed efficient heuristic algorithms without any performance guarantees. Later, Nguyen et al. [4] studied the Cost-aware Targeted Viral Marketing (CTVM) problem, a generalization of IM. In this problem, each node $u$ has an arbitrary cost $c(u)$ and a benefit $b(u)$. The goal of CTVM is to select a seed set within a given budget $B$ so that the total benefit is maximized. They proposed a benefit sampling technique and the BCT algorithm, a $(1 - 1/e - \epsilon)$-approximation algorithm with probability at least $1 - \delta$ that runs in $O(\epsilon^{-2} n \log(\binom{n}{k}) / \delta)$. In this study, the sampling technique in [4] is adapted to estimate the benefit function. However, BCT cannot be applied directly to our problem because of the differences between MBT and CTVM.
Influence Threshold. The problem closest to ours is Influence Threshold (IT), which seeks the smallest seed set $S$ such that the influence spread, denoted as $\sigma(S)$, is at least a specified threshold $\gamma$. Goyal et al. [36] were the first to investigate the IT problem under the IC model. Using the monotone submodular property of the influence function, they proposed a greedy algorithm combined with the Monte Carlo simulation method [1] to estimate $\sigma(S)$. The algorithm returns a seed set $S$ satisfying $\sigma(S) \geq \gamma - \epsilon$ and $|S| \leq |S^*| \cdot (1 + \ln \frac{\gamma}{\epsilon})$ in $O(n^2 R)$ time, where $\epsilon > 0$ is an input, $S^*$ is the optimal solution, and $R$ is the number of Monte Carlo simulations, set to $R = 10{,}000$. Due to its high time complexity, it is difficult to apply this algorithm to large networks. We call an algorithm an $(\alpha, \beta)$-bicriteria approximation for the IT problem if it returns a solution $S$ satisfying $\sigma(S) \geq \alpha \cdot \gamma$ and $|S| \leq \beta \cdot |S^*|$, where $\alpha, \beta > 0$ and $S^*$ is the optimal solution. By utilizing the sampling technique in [37], Kuhnle et al. [8] developed a $(1 - 2\alpha, 1 + 4\alpha\gamma + \log \gamma)$-bicriteria approximation algorithm for the special case of IT in which all vertices have the same cost. It runs in $O(\alpha^{-2} (m + n) \log(n) |S|)$ time, where $\alpha \in (0, 1)$ is an input and $n$ and $m$ are the numbers of nodes and edges in the network.
The authors of [17] recently explored IT in a noisy model resembling a real-world situation, where the influence spread function can only be estimated within an error bound. They proposed a greedy algorithm under noise with a theoretical bound, but it retains the time complexity of [38]. These studies ignored the fact that each influenced user provides a different benefit. In our MBT problem, both the benefits of the nodes and the different benefit thresholds are considered when identifying the appropriate seed sets. When all nodes have the same benefit, the above algorithms can be used for each threshold $T_i$, but they must be run $k$ times to find the $k$ seed sets. In contrast, our proposed algorithm not only provides theoretical bounds but also returns the seed sets for all benefit thresholds in a single run.

3. Methodology

In this section, we present the Independent Cascade (IC) model, the well-known model underlying IM problems [1,2,3,4,20,21]. Our notations and symbols are summarized in Table 1.

3.1. Independent Cascade Model

In this work, a social network is abstracted as a directed graph $G = (V, E)$, where $V$ and $E$ represent the set of users and the set of links in the network, respectively. In this model, each edge $e = (u, v) \in E$ has a probability $p(u, v) \in (0, 1)$ representing the influence transmission from $u$ to $v$. Given a seed set $S \subseteq V$, each node is in one of two states, active or inactive, reflecting whether it has been influenced by the seed set. The diffusion process starts from $S$ and works as follows:
  • At the beginning (step $t = 0$), all nodes in the seed set are active.
  • At each subsequent step $t \geq 1$, a node $u$ that was activated in step $t - 1$ has a single chance to influence each of its inactive neighbors $v$, succeeding with probability $p(u, v)$.
  • All active nodes retain their status until the end of the diffusion process, and the process ends at step $t$ if no node is newly activated in that step.
Kempe et al. [1] showed that the IC model is equivalent to the live-edge (sample graph) model, defined as follows. The live-edge model first generates a sample graph $g = (V_g, E_g)$ by selecting each edge $e = (u, v) \in E$ with probability $p(e) = p(u, v)$ and discarding it with probability $1 - p(u, v)$. The sample graph $g$ is generated with probability
$$\Pr[g \sim G] = \prod_{e \in E_g} p(e) \cdot \prod_{e \in E \setminus E_g} (1 - p(e))$$
In our model setting, we gain a benefit $b(u) \geq 0$ if the node $u$ becomes active, as in [4]. The benefit function $B(S)$, defined as the expected total benefit over all influenced nodes, is calculated as follows:
$$B(S) = \sum_{g \sim G} \Pr[g \sim G] \sum_{u \in R(g, S)} b(u)$$
where $R(g, S)$ is the set of nodes reachable from any node in $S$ in graph $g$. In addition, each node $u \in V$ has a cost $c(u) > 0$, which we have to pay to user $u$ to initiate the influence process from $u$, and $c(S) = \sum_{u \in S} c(u)$.
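The definition of $B(S)$ can be illustrated with a plain Monte Carlo simulation of the IC process. The following is a minimal sketch rather than the authors' implementation; the adjacency format `out_edges` and the function names `simulate_ic` and `estimate_benefit_mc` are assumptions made for this illustration.

```python
import random
from collections import deque


def simulate_ic(out_edges, seeds, rng):
    """Run one IC cascade and return the set of active nodes.

    out_edges[u] is a list of (v, p_uv) pairs (hypothetical input format).
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v, p in out_edges.get(u, []):
            # A newly activated u gets exactly one chance to activate v.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


def estimate_benefit_mc(out_edges, benefit, seeds, runs=1000, seed=0):
    """Average the total benefit b(u) of active nodes over `runs` cascades."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        active = simulate_ic(out_edges, seeds, rng)
        total += sum(benefit.get(u, 0.0) for u in active)
    return total / runs
```

Each cascade corresponds to one sample graph $g$, so the average converges to $B(S)$ as the number of runs grows. This direct approach is costly on large graphs, which is the motivation for the sampling technique used later in the paper.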

3.2. Problem Definition

We formally introduce our studied problem, Multiple Benefit Thresholds (MBT), as follows:
Definition 1 (MBT). Given a graph $G = (V, E)$ under the IC model and a set of benefit thresholds $T = \{T_1, T_2, \ldots, T_k\}$, the problem asks, for each $T_i \in T$, to find $S_i \subseteq V$ with the smallest cost $c(S_i)$ such that $B(S_i) \geq T_i$.
In the case when $b(u) = 1, \forall u \in V$, the benefit function $B(\cdot)$ becomes the influence spread function [1]. Ref. [6] showed that it is #P-hard to compute the number of influenced nodes (the influence spread function) exactly, so calculating $B(\cdot)$ is also #P-hard. Moreover, the IT problem [8,17,38], a special case of the MBT problem with $b(u) = c(u) = 1, \forall u \in V$ and $k = 1$, is NP-hard, which implies that MBT is also NP-hard.

3.3. Our Proposed Algorithm

In this section, we introduce Efficient Sampling for Selecting Multiple Seed Sets (ESSM), an efficient algorithm for the MBT problem with a theoretical guarantee. Our technique combines the following two ideas: (1) it finds the candidate seed set for each threshold via benefit sampling; (2) it uses the seed set found for a smaller threshold to find the seed sets for larger ones, which improves both running time and memory usage. Moreover, the sampling technique is combined with martingale theory to estimate the benefit function effectively.

3.3.1. Benefit Sampling

We first recap the concept of a Benefit Sample (BS) from [4], which is used to estimate $B(\cdot)$.
Definition 2 (Benefit Sample). A BS is generated from $G = (V, E)$ under the IC model by the following steps: (1) choose a source node $u$ with probability $b(u)/\Gamma$, where $\Gamma = \sum_{u \in V} b(u)$; (2) create a sample graph $g$ from $G$; and (3) return $R_j$, the set of nodes that can reach node $u$ in $g$.
Algorithm 1, which follows [4], can be used to generate a BS under the IC model.
Algorithm 1: An algorithm for generating a BS under the IC model.
Input: Graph $G = (V, E)$ under the IC model
Output: A BS $R_j$
1: Choose a source node $u$ with probability $b(u)/\Gamma$
2: Initialize a queue $Q = \{u\}$ and $R_j = \{u\}$
3: while $Q$ is not empty do
4:    $v \leftarrow Q.pop()$
5:   for $w \in N_{in}(v) \setminus (R_j \cup Q)$ do
6:    With probability $p(w, v)$ do: $Q.push(w)$, $R_j \leftarrow R_j \cup \{w\}$
7:   end for
8: end while
9: return $R_j$
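The steps of Algorithm 1 can be sketched in Python as a reverse breadth-first search that flips a coin for each incoming edge. This is an illustrative sketch under assumptions, not the authors' C/C++ code: the reverse adjacency format `in_edges` and the function name `generate_benefit_sample` are hypothetical.

```python
import random
from collections import deque


def generate_benefit_sample(in_edges, benefit, rng):
    """Generate one Benefit Sample (BS) under the IC model.

    in_edges[v] is a list of (w, p_wv) pairs for edges w -> v, and
    benefit[u] = b(u) >= 0 (both are hypothetical input formats).
    """
    # Step 1: pick the source with probability b(u) / Gamma.
    nodes = [u for u, b in benefit.items() if b > 0]
    weights = [benefit[u] for u in nodes]
    source = rng.choices(nodes, weights=weights, k=1)[0]

    # Steps 2-8: reverse BFS, keeping each incoming edge with probability p.
    sample = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w, p in in_edges.get(v, []):
            # Every queued node is already in `sample`, so one membership
            # test covers the exclusion of both R_j and Q.
            if w not in sample and rng.random() < p:
                sample.add(w)
                queue.append(w)
    return sample
```

Because edges are sampled lazily during the search, the sample graph $g$ never has to be materialized in full.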
Given a collection $\mathcal{R}$ of BSes and a seed set $S$, we define a random variable $X_j(S)$ as follows:
$$X_j(S) = \begin{cases} 1, & \text{if } R_j \cap S \neq \emptyset \\ 0, & \text{otherwise} \end{cases}$$
We can estimate the benefit function $B(S)$ using the following lemma from [4].
Lemma 1 (Lemma 2, [4]). For any set of nodes $S \subseteq V$, we have $B(S) = \Gamma \cdot \mathbb{E}[X_j(S)]$.
The function $B(\cdot)$ is monotone and submodular [4], i.e., for any $S \subseteq T \subseteq V$ and $v \notin T$, we have
$$B(T) \geq B(S)$$
$$B(S + \{v\}) - B(S) \geq B(T + \{v\}) - B(T)$$
We can calculate an estimate $\hat{B}(S)$ of $B(S)$ via a collection $\mathcal{R}$ of BSes as follows:
$$\hat{B}(S) = \frac{\Gamma}{|\mathcal{R}|} \sum_{R_j \in \mathcal{R}} X_j(S) \tag{6}$$
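The estimator $\hat{B}(S)$ is a coverage count scaled by $\Gamma$. A minimal sketch follows, assuming each BS is stored as a Python set; the function name `estimate_benefit` is hypothetical.

```python
def estimate_benefit(samples, seeds, gamma):
    """Return Gamma times the fraction of samples that intersect `seeds`.

    samples: non-empty list of sets of nodes (the collection of BSes)
    seeds:   iterable of seed nodes
    gamma:   total benefit Gamma, the sum of b(u) over all nodes
    """
    seed_set = set(seeds)
    # X_j(S) = 1 exactly when the sample R_j shares a node with S.
    covered = sum(1 for r in samples if not seed_set.isdisjoint(r))
    return gamma * covered / len(samples)
```

For example, with four samples `[{"a", "b"}, {"c"}, {"b"}, {"d"}]` and $\Gamma = 8$, the seed set `{"b"}` covers two samples, which gives $\hat{B} = 8 \cdot 2/4 = 4$.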
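It can be seen that $X_j(S) \in [0, 1]$. We define the random variables $Y_i = \sum_{j=1}^{i} (X_j(S) - \mu)$ for $i \geq 1$, where $\mu = \mathbb{E}[X_j(S)]$. For the sequence $Y_1, Y_2, \ldots$, we have
$$\mathbb{E}[Y_i \mid Y_1, \ldots, Y_{i-1}] = Y_{i-1} + \mathbb{E}[X_i(S) - \mu] = Y_{i-1}$$
Therefore, $Y_1, Y_2, \ldots$ form a martingale [39]. Thus, we have the following lemma [39].
Lemma 2 ([39]). Given a collection $\mathcal{R}$ with $T = |\mathcal{R}|$ and $\lambda > 0$, we have
$$\Pr\left[\sum_{j=1}^{T} X_j(S) - T \cdot \mu \geq \lambda\right] \leq \exp\left(-\frac{\lambda^2}{\frac{2}{3}\lambda + 2\mu T}\right)$$
$$\Pr\left[\sum_{j=1}^{T} X_j(S) - T \cdot \mu \leq -\lambda\right] \leq \exp\left(-\frac{\lambda^2}{2\mu T}\right)$$
Setting $\lambda = \epsilon T \mu$ in Lemma 2, we obtain
$$\Pr[\hat{B}(S) \geq (1 + \epsilon) B(S)] \leq \exp\left(-\frac{\epsilon^2 \mu T}{2 + \frac{2}{3}\epsilon}\right)$$
$$\Pr[\hat{B}(S) \leq (1 - \epsilon) B(S)] \leq \exp\left(-\frac{\epsilon^2 \mu T}{2}\right)$$
If the number of BSes satisfies $T \geq \left(2 + \frac{2}{3}\epsilon\right) \frac{1}{\mu} \frac{1}{\epsilon^2} \ln\left(\frac{1}{\delta}\right)$ for $\delta \in (0, 1)$, then $\hat{B}(S)$ is an $(\epsilon, \delta)$-approximation of $B(S)$, i.e.,
$$\Pr[(1 - \epsilon) B(S) \leq \hat{B}(S) \leq (1 + \epsilon) B(S)] \geq 1 - \delta$$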
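To get a feel for the sample-size condition above, the bound can be evaluated numerically. The helper below is a sketch only; the name `required_samples` is hypothetical, and it assumes $\mu = B(S)/\Gamma$ is known, whereas in practice it must itself be bounded or estimated.

```python
import math


def required_samples(mu, eps, delta):
    """Smallest integer T with T >= (2 + 2*eps/3) * ln(1/delta) / (mu * eps^2)."""
    return math.ceil((2 + 2 * eps / 3) * math.log(1 / delta) / (mu * eps ** 2))
```

With $\mu = 0.1$, $\epsilon = 0.1$, and $\delta = 0.01$, the bound evaluates to roughly 9518 samples. The count scales with $1/\epsilon^2$ and $1/\mu$, so seed sets with a small expected coverage need proportionally more samples.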
The characteristics of the martingale sequence play an important role in devising our algorithm in the next subsection.

3.3.2. ESSM Algorithm

Our proposed algorithm is now described. At a high level, it combines three ideas: (1) we provide an $(\epsilon, \delta)$-approximation of the benefit function via martingale theory; (2) in each iteration, we use an algorithmic framework that finds candidate seed sets for a threshold and then chooses the final seed set, whose solution quality is guaranteed by checking statistical evidence; (3) we reuse the seed set found for a smaller threshold to find the seed sets for larger thresholds. Our proposed algorithm is presented in Algorithm 2.
Algorithm 2: ESSM algorithm.
Input: A graph $G = (V, E)$, $T = \{T_1, \ldots, T_k\}$, $\epsilon, \delta \in (0, 1)$
Output: $S_1, S_2, \ldots, S_k$
1: Generate $\mathcal{R}_0$ containing $\left(2 + \frac{2}{3}\epsilon\right) \frac{\Gamma}{\epsilon^2 (T_1 - \epsilon T_1)} (\ln n + \ln(1/\delta))$ BSes by using Algorithm 1
2:  $S_0 \leftarrow \emptyset$
3: for $i = 1$ to $k$ do
4:    $\mathcal{R}_i \leftarrow \mathcal{R}_{i-1}$
5:    $S_i \leftarrow S_{i-1}$
6:    Calculate $\hat{B}(S_i)$ by Equation (6)
7:   while $\hat{B}(S_i) < T_i - \epsilon T_i - \epsilon$ do
8:     $u \leftarrow \arg\max_{v \in V \setminus S_i} \frac{\min(\hat{B}(S_i \cup \{v\}),\, T_i - \epsilon T_i - \epsilon) - \hat{B}(S_i)}{c(v)}$
9:     $S_i \leftarrow S_i \cup \{u\}$
10:     $j \leftarrow |S_i|$
11:     $N(i, j) \leftarrow \left(2 + \frac{2}{3}\epsilon\right) \frac{\Gamma}{\epsilon^2 (T_i - \epsilon T_i)} \ln\left(\binom{n}{j} / \delta\right)$
12:    if $|\mathcal{R}_i| < N(i, j)$ then
13:     Generate $N(i, j) - |\mathcal{R}_i|$ more BSes and add them into $\mathcal{R}_i$
14:      $N \leftarrow N(i, j)$
15:      $S_i \leftarrow \emptyset$
16:    end if
17:   end while
18: end for
19: return $S_1, S_2, \ldots, S_k$
At the beginning, the algorithm generates a collection $\mathcal{R}_0$ that contains $\left(2 + \frac{2}{3}\epsilon\right) \frac{\Gamma}{\epsilon^2 (T_1 - \epsilon T_1)} (\ln n + \ln(1/\delta))$ BSes by using Algorithm 1 and initializes the seed set $S_0$ as empty.
In each iteration $i$ of the first loop (lines 3–18), it finds the seed set with respect to threshold $T_i$. Denote $f(S_i) = \min(\hat{B}(S_i), T_i - \epsilon T_i - \epsilon)$. In each iteration of the second loop (lines 7–17), the algorithm builds the seed set $S_i$ by (1) selecting a node $u$ with the maximum marginal gain of the estimation function $f$ per unit cost, i.e., $(f(S_i \cup \{u\}) - f(S_i)) / c(u)$, and (2) checking the condition on the number of samples (line 12). If the number of samples is sufficient to give an $(\epsilon, \delta)$-approximation (by Lemma 3), the algorithm moves to the next iteration and keeps the current seed set $S_i$; otherwise, the algorithm generates more samples (line 13) so that the number of samples is $N(i, j)$ and adds them into $\mathcal{R}_i$. In this case, the seed set $S_i$ is reset (line 15) and rebuilt so that it suits the new collection $\mathcal{R}_i$. The second loop terminates when $\hat{B}(S_i) \geq T_i - \epsilon T_i - \epsilon$. Next, the algorithm reuses the current samples and seed set to find the seed set for the next larger threshold (lines 4–5) by following the same steps as in the previous iteration.
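The control flow described above can be summarized in a compact Python sketch. It is a simplified reading of Algorithm 2, not the authors' C/C++ implementation: the names `essm`, `generate_bs`, `b_hat`, and `num_samples` and the input formats are assumptions, thresholds are sorted in ascending order, the garbled line 15 is read as a reset of $S_i$ to the empty set, and a guard stops the inner loop once every node has been selected.

```python
import math
import random
from collections import deque


def generate_bs(in_edges, nodes, weights, rng):
    """One Benefit Sample via reverse BFS (Algorithm 1)."""
    source = rng.choices(nodes, weights=weights, k=1)[0]
    sample, queue = {source}, deque([source])
    while queue:
        v = queue.popleft()
        for w, p in in_edges.get(v, []):
            if w not in sample and rng.random() < p:
                sample.add(w)
                queue.append(w)
    return sample


def b_hat(samples, seeds, gamma):
    """Estimate B(S) as Gamma times the fraction of samples hit by S."""
    hit = sum(1 for r in samples if not seeds.isdisjoint(r))
    return gamma * hit / len(samples)


def num_samples(gamma, eps, threshold, log_term):
    """Sample count used in lines 1 and 11 of Algorithm 2."""
    scale = (2 + 2 * eps / 3) * gamma / (eps ** 2 * (threshold - eps * threshold))
    return math.ceil(scale * log_term)


def essm(in_edges, benefit, cost, thresholds, eps, delta, seed=0):
    rng = random.Random(seed)
    nodes = [u for u in benefit if benefit[u] > 0]
    weights = [benefit[u] for u in nodes]
    gamma = float(sum(weights))
    n = len(cost)
    thresholds = sorted(thresholds)  # assumption: process thresholds in ascending order
    n0 = num_samples(gamma, eps, thresholds[0], math.log(n) - math.log(delta))
    samples = [generate_bs(in_edges, nodes, weights, rng) for _ in range(n0)]
    seeds, result = set(), []
    for t in thresholds:
        target = t - eps * t - eps
        current = b_hat(samples, seeds, gamma)  # samples and seeds are reused
        while current < target and len(seeds) < n:
            def ratio(v):
                gain = min(b_hat(samples, seeds | {v}, gamma), target) - current
                return gain / cost[v]
            u = max((v for v in cost if v not in seeds), key=ratio)
            seeds.add(u)
            j = len(seeds)
            log_term = math.log(math.comb(n, j)) - math.log(delta)
            missing = num_samples(gamma, eps, t, log_term) - len(samples)
            if missing > 0:
                samples.extend(generate_bs(in_edges, nodes, weights, rng)
                               for _ in range(missing))
                seeds = set()  # assumed reading of line 15: restart selection
            current = b_hat(samples, seeds, gamma)
        result.append(set(seeds))
    return result
```

The sketch recomputes $\hat{B}$ from scratch for every candidate node, which is simple but slow; a practical implementation would maintain per-node coverage counts over the samples and update them incrementally.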
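The theoretical bounds of the algorithm are now analyzed. First, Lemma 3 gives a number of BSes that is sufficient to estimate $B(\cdot)$.
Lemma 3. If $|\mathcal{R}| \geq \left(2 + \frac{2}{3}\epsilon\right) \frac{\Gamma}{\epsilon^2 (T_i - \epsilon T_i)} \left(\ln n + \ln \frac{1}{\delta}\right)$, then $\Pr[\hat{B}(S_i^*) \geq T_i - T_i \epsilon] \geq 1 - \delta$.
Proof. 
Denote $\mu = B(S_i^*) / \Gamma$ and $\hat{\mu} = \hat{B}(S_i^*) / \Gamma$. We have
$$\begin{aligned} \Pr[\hat{B}(S_i^*) < T_i - T_i \epsilon] &\leq \Pr[\hat{B}(S_i^*) \leq (1 - \epsilon) B(S_i^*)] = \Pr[\hat{\mu} \leq (1 - \epsilon) \mu] \\ &\leq \exp\left(-\frac{\epsilon^2 |\mathcal{R}| \mu}{2}\right) \quad (\text{by applying (10)}) \\ &\leq \exp\left(-\frac{\epsilon^2 |\mathcal{R}| \hat{\mu}}{2 (1 - \epsilon)}\right) \quad (\text{due to } \mu \geq \hat{\mu} / (1 - \epsilon)) \\ &\leq \exp\left(-\frac{\left(2 + \frac{2}{3}\epsilon\right) \hat{B}(S_i^*)}{2 (1 - \epsilon)(T_i - \epsilon T_i)} \ln \frac{1}{\delta}\right) \leq \delta \end{aligned}$$
which completes the proof. □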
The theoretical guarantee of Algorithm 2 is stated as follows.
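Theorem 1. For any inputs $\epsilon, \delta \in (0, 1)$, Algorithm 2 returns a collection of seed sets $\mathcal{S} = \{S_1, S_2, \ldots, S_k\}$ satisfying
(a) $\Pr\left[c(S_i) \leq \left(1 + \ln \frac{T_i - \epsilon T_i}{\epsilon}\right) c(S_i^*)\right] \geq 1 - \delta / n$.
(b) $\Pr\left[B(S_i) \geq T_i \cdot \frac{1 - \epsilon}{1 + \epsilon} - \epsilon\right] \geq 1 - \delta$.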
Proof. 
In the $i$-th iteration of the first loop (lines 3–18) of Algorithm 2, denote by $S_i = S_i^t = \{s_i^1, s_i^2, \ldots, s_i^t\}$ the solution of the algorithm with respect to the threshold $T_i$, by $P_i = \{v_1^i, v_2^i, \ldots, v_l^i\}$ a set of nodes with minimum cost satisfying $\hat{B}(P_i) \geq T_i - \epsilon T_i$, and let $C_i = c(P_i)$. Due to the check in line 12, the number of BSes at the end of iteration $i$ is at least
$$N_{min}^i = \left(2 + \frac{2}{3}\epsilon\right) \frac{\Gamma}{\epsilon^2 (T_i - \epsilon T_i)} \ln\left(\binom{n}{|S_i|} / \delta\right)$$
and at most
$$N_{max}^i = \max_{j:\, 1 \leq j \leq |S_i|} \left(2 + \frac{2}{3}\epsilon\right) \frac{\Gamma}{\epsilon^2 (T_i - \epsilon T_i)} \ln\left(\binom{n}{j} / \delta\right)$$
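Proof of (a). As $\hat{B}(\cdot)$ is monotone and submodular, we have
$$\begin{aligned} T_i - \epsilon T_i - \hat{B}(S_i^{t-1}) &\leq \hat{B}(P_i) - \hat{B}(S_i^{t-1}) \leq \hat{B}(P_i \cup S_i^{t-1}) - \hat{B}(S_i^{t-1}) \\ &\leq \sum_{v \in P_i \setminus S_i^{t-1}} \left(\hat{B}(S_i^{t-1} \cup \{v\}) - \hat{B}(S_i^{t-1})\right) \\ &\leq \frac{C_i}{c(P_i \setminus S_i^{t-1})} \sum_{v \in P_i \setminus S_i^{t-1}} \left(\hat{B}(S_i^{t-1} \cup \{v\}) - \hat{B}(S_i^{t-1})\right) \end{aligned}$$
where the last step holds because $c(P_i \setminus S_i^{t-1}) \leq c(P_i) = C_i$. For any positive numbers $a_1, \ldots, a_l$ and $b_1, \ldots, b_l$, according to [40], we have
$$\min_{i = 1, \ldots, l} \frac{a_i}{b_i} \leq \frac{\sum_{i=1}^{l} a_i}{\sum_{i=1}^{l} b_i} \leq \max_{i = 1, \ldots, l} \frac{a_i}{b_i}$$
Applying the above inequality together with the greedy choice of $s_i^t$, we obtain
$$T_i - \epsilon T_i - \hat{B}(S_i^{t-1}) \leq \frac{C_i}{c(s_i^t)} \left(\hat{B}(S_i^t) - \hat{B}(S_i^{t-1})\right)$$
which, after rearranging, gives
$$T_i - \epsilon T_i - \hat{B}(S_i^t) \leq \left(1 - \frac{c(s_i^t)}{C_i}\right) \left(T_i - \epsilon T_i - \hat{B}(S_i^{t-1})\right)$$
$$\leq e^{-\frac{c(s_i^t)}{C_i}} \left(T_i - \epsilon T_i - \hat{B}(S_i^{t-1})\right)$$
The last step uses the inequality $1 + x \leq e^x$, which holds for any real $x$ (here with $x = -c(s_i^t) / C_i$). Applying this bound recursively, we obtain
$$T_i - \epsilon T_i - \hat{B}(S_i^t) \leq e^{-\frac{1}{C_i} \sum_{j=1}^{t} c(s_i^j)} (T_i - \epsilon T_i)$$
$$= e^{-\frac{1}{C_i} c(S_i^t)} (T_i - \epsilon T_i)$$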
By the definition of $S_i^t$ and because $S_i$ satisfies the condition in line 7, we have $\hat{B}(S_i^{t-1}) < T_i - \epsilon T_i - \epsilon$ and $\hat{B}(S_i^t) \geq T_i - \epsilon T_i - \epsilon$. Combining with (19), we have
$$(T_i - \epsilon T_i)\, e^{-\frac{1}{C_i} c(S_i^{t-1})} \geq T_i - \epsilon T_i - \hat{B}(S_i^{t-1}) > T_i - \epsilon T_i - (T_i - \epsilon T_i - \epsilon) = \epsilon$$
implying that $c(S_i^{t-1}) < C_i \ln \frac{T_i - \epsilon T_i}{\epsilon}$. On the other hand, from (17), we obtain
$$\frac{c(s_i^t)}{C_i} \leq \ln \frac{T_i - \epsilon T_i - \hat{B}(S_i^{t-1})}{T_i - \epsilon T_i - \hat{B}(S_i^t)} \leq 1$$
Thus, $c(S_i^t) = c(S_i^{t-1}) + c(s_i^t) \leq C_i \left(1 + \ln \frac{T_i - \epsilon T_i}{\epsilon}\right)$, where $S_i$ is the candidate solution for threshold $T_i$. After the $i$-th iteration of the first loop, $|\mathcal{R}_i| \geq N(i, j) = \left(2 + \frac{2}{3}\epsilon\right) \frac{\Gamma}{\epsilon^2 (T_i - \epsilon T_i)} \ln\left(\binom{n}{j} / \delta\right)$. By applying Lemma 3, after iteration $i$ we have $\Pr[\hat{B}(S_i^*) \geq T_i - \epsilon T_i] \geq 1 - \delta / \binom{n}{j}$. Combining this with the definition of $P_i$, the following holds with probability at least $1 - \delta / \binom{n}{t} \geq 1 - \delta / n$:
$$c(S_i) \leq C_i \left(1 + \ln \frac{T_i - \epsilon T_i}{\epsilon}\right)$$
$$\leq c(S_i^*) \left(1 + \ln \frac{T_i - \epsilon T_i}{\epsilon}\right)$$
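Proof of (b). The $i$-th iteration of the first loop ends when $\hat{B}(S_i) \geq T_i - T_i \epsilon - \epsilon$, so we obtain
$$\begin{aligned} \Pr\left[B(S_i) < T_i \frac{1 - \epsilon}{1 + \epsilon} - \epsilon\right] &\leq \Pr\left[B(S_i) < \frac{T_i - T_i \epsilon - \epsilon}{1 + \epsilon}\right] \leq \Pr\left[B(S_i) \leq \frac{\hat{B}(S_i)}{1 + \epsilon}\right] \\ &\leq \exp\left(-\frac{\epsilon^2 |\mathcal{R}_i| \hat{B}(S_i)}{2 \Gamma (1 + \epsilon)}\right) \quad (\text{by applying (10)}) \\ &\leq e^{-\ln\left(\binom{n}{j} / \delta\right)} = \frac{\delta}{\binom{n}{j}} \end{aligned}$$
Since $|S_i| = j$, there are at most $\binom{n}{j}$ possible solutions $S_i$. By applying the union bound, we have $\Pr\left[\exists S_i : B(S_i) < T_i \cdot \frac{1 - \epsilon}{1 + \epsilon} - \epsilon\right] \leq \delta$. Hence, $\Pr\left[B(S_i) \geq T_i \cdot \frac{1 - \epsilon}{1 + \epsilon} - \epsilon\right] \geq 1 - \delta$. The proof is completed. □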
Theorem 2 (Number of required BSes). For any $\epsilon, \delta \in (0, 1)$, the sample complexity of ESSM is $O\left(\epsilon^{-2} n \ln\left(\binom{n}{i_{max}} / \delta\right)\right)$, where $i_{max} = \arg\max_{i = 1, \ldots, |S_k|} \ln \binom{n}{i}$.
Proof. 
The number of BSes for finding seed set $S_i$ is at most $N_{max}^i$. The algorithm reuses the BSes of the current seed set in the next iteration, so the number of BSes generated by the algorithm is at most $N_{max}^k$. On the other hand, $\Gamma = \sum_{u \in V} b(u) \leq b_{max}\, n = O(n)$. Therefore, the number of samples used in the algorithm is
$$\left(2 + \frac{2}{3}\epsilon\right) \frac{\Gamma}{\epsilon^2 (T_1 - \epsilon T_1)} \ln\left(\binom{n}{i_{max}} / \delta\right) = O\left(\epsilon^{-2} n \ln\left(\binom{n}{i_{max}} / \delta\right)\right)$$
which completes the proof. □
Let $M$ ($M \leq n$) denote the expected running time for generating one BS, and let $j_{max}$ be the largest number of iterations needed to select a seed set. The time complexity of the algorithm is $O\left(\epsilon^{-2} n k j_{max} M \ln\left(\binom{n}{N_{max}^k} / \delta\right)\right)$.

4. Experiments and Discussion

In this section, some extensive experiments are carried out to show the performance of the ESSM algorithm in comparison with other state-of-the-art algorithms on three important metrics: running time, cost of seed sets, and memory usage.

4.1. Experiment Settings

4.1.1. Datasets

For a comprehensive experiment, we select six networks of different sizes that are commonly used for information propagation problems [1,2,3,4,5,21]. The datasets are described in Table 2.
  • Gnutella [41] represents the Gnutella peer-to-peer file sharing network in August 2002. In this network, 20,777 edges among 6301 nodes represent connections among hosts in the Gnutella network topology.
  • Email-Enron [42] covers all the email communication within a dataset of around half a million emails. These data were originally made public and posted on the web by the Federal Energy Regulatory Commission during its investigation. Nodes of the network are email addresses, and if an address i has sent at least one email to address j, the graph contains an undirected edge between i and j. Note that non-Enron email addresses act as sinks and sources in the network, since only their communication with the Enron email addresses is observed. The Enron email data were originally released by William Cohen at CMU.
  • Net-Hept [43] and Net-Phy [5] are collaboration networks from the “high-energy physics theory” section and the “physics” section, respectively, in which nodes represent authors and undirected edges represent co-authorship of papers.
  • Amazon [44] was collected on 2 March 2003 by crawling the Amazon website. It is based on the “Customers Who Bought This Item Also Bought” feature of the Amazon website. If a product i is frequently copurchased with product j, the graph contains a directed edge from i to j.
  • DBLP [45], the computer science bibliography, provides a comprehensive list of research papers in computer science. In the coauthorship network, two authors are connected if they have published at least one paper together.

4.1.2. Algorithms Compared

Since IT [36] and CTVM [4] are the problems most closely related to the MBT problem, ESSM is compared with their algorithms, with some modifications, in our experiments. In addition, we use the DEGREE algorithm, a popular baseline for information propagation problems [1,2,5,6]. The compared algorithms are listed below.
  • BCT is an algorithm for the CTVM problem [4]. BCT is included in the comparison because the CTVM problem, like MBT, considers both the costs and the benefits of the nodes. However, due to the differences between MBT and CTVM, BCT is adapted as follows: for each threshold $T_i$, we run a binary search on the cost over the range $[0, \sum_{u} c(u)]$ until the achieved benefit falls in $[T_i (1 - \epsilon), T_i]$, where $\epsilon = 0.1$, and return the seed set with the minimum cost.
  • IT is the greedy algorithm for the Influence Threshold problem in [36]. To adapt the IT algorithm to the MBT problem, Monte Carlo simulation with 10,000 runs is used to estimate the benefit function, as in [1,5].
  • DEGREE is a common baseline algorithm for influence problems [1,4,22], which selects nodes in decreasing order of degree until the benefit of the selected set exceeds the threshold.
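The binary-search adaptation described for BCT can be sketched as follows. This is an illustration under assumptions, not the code used in the experiments: `solve_with_budget` is a hypothetical stand-in for a budgeted solver such as BCT that returns a seed set and its estimated benefit for a given budget, and the iteration cap `max_iter` is added here to guarantee termination.

```python
def adapt_budgeted_solver(solve_with_budget, total_cost, threshold,
                          eps=0.1, max_iter=50):
    """Binary-search the budget until the benefit lands in [T(1 - eps), T].

    solve_with_budget(budget) -> (seed_set, benefit) is a hypothetical oracle.
    Returns (seed_set, benefit, budget) for the cheapest feasible budget tried,
    or None if no tried budget reached T(1 - eps).
    """
    low, high = 0.0, float(total_cost)
    best = None
    for _ in range(max_iter):
        budget = (low + high) / 2.0
        seeds, benefit = solve_with_budget(budget)
        if benefit < threshold * (1.0 - eps):
            low = budget  # too little benefit: raise the budget
            continue
        best = (seeds, benefit, budget)  # feasible; each later one is cheaper
        if benefit <= threshold:
            break  # benefit is inside the target window
        high = budget  # overshoot: try a smaller budget
    return best
```

Each probe runs the budgeted solver from scratch, which is one reason why this adaptation is more expensive than reusing samples and seed sets across thresholds.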

4.1.3. Parameter Settings

For the transmission probability in the IC model, we follow the conventional computation in [1,2,3,4] and set $p(u, v) = \frac{1}{|N_{in}(v)|}$. We set the cost $c(u) = \frac{n \cdot |N_{out}(u)|}{\sum_{v \in V} |N_{out}(v)|}$. Following [4], we randomly choose 20% of the nodes in each network and set their benefit to 1, and assign a benefit of 0 to the rest. Finally, $\epsilon = 0.1$ and $\delta = 1/n$ are the default settings [2,3,4] in all experiments.
We use a Linux machine with 2 × Intel(R) Xeon(R) CPU E5-2630 v4 processors running at 2.20 GHz and 64 GB of DDR4 RAM at 2400 MHz. Our algorithms are implemented in C/C++ using the g++11 compiler.
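
As an illustration of the parameter settings described above (transmission probability, cost, and benefit), the following sketch computes them for a small directed graph. It is written in Python for brevity rather than in the C/C++ used for the experiments. The function name, the edge-list input format, and the fixed random seed are assumptions made for this example; the graph is assumed to have no duplicate edges and at least one edge.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def experiment_parameters(nodes: List[int], edges: List[Edge],
                          benefit_fraction: float = 0.2, seed: int = 0):
    """Return (p, c, b) for a directed graph given as an edge list."""
    in_deg: Dict[int, int] = defaultdict(int)
    out_deg: Dict[int, int] = defaultdict(int)
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1

    # Transmission probability p(u, v) = 1 / |N_in(v)|.
    p = {(u, v): 1.0 / in_deg[v] for u, v in edges}

    # Cost c(u) = n * |N_out(u)| / sum over v of |N_out(v)|.
    n = len(nodes)
    total_out = sum(out_deg[v] for v in nodes)
    c = {u: n * out_deg[u] / total_out for u in nodes}

    # Benefit 1 for a random 20% of the nodes, 0 for the rest.
    rng = random.Random(seed)
    chosen = set(rng.sample(nodes, int(benefit_fraction * n)))
    b = {u: 1 if u in chosen else 0 for u in nodes}
    return p, c, b
```

Under this cost definition the costs sum to $n$, and a node's cost grows linearly with its out-degree, so high-degree nodes are the most expensive seeds.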

4.2. Experimental Results

4.2.1. Comparison of the Cost

Figure 1 shows the costs of the seed sets returned by the algorithms; a smaller cost is better. Our algorithm ESSM outperformed the other algorithms by a large margin on most datasets, the exception being the Gnutella network. In particular, ESSM returned seed sets whose costs were 1875 to 116,000 times lower than those of the other algorithms. The results also confirm that our algorithmic framework is more efficient than the others.
The IT algorithm produced good results only on the Gnutella dataset and produced worse results than ESSM on the rest. However, it delivered better results than the remaining algorithms because, like our algorithm, it always finds important seed nodes with low and reasonable cost. On the large datasets (Amazon and DBLP), IT did not finish within the time limit. This shows that the Monte Carlo method is not suitable for large networks due to its high complexity. The DEGREE algorithm prioritizes the nodes with the highest out-degree as seed nodes; because the cost formula grows with out-degree, this led to a considerable increase in cost even when the number of seed nodes found was small. On the Email-Enron dataset in particular, at the first threshold $T_i$, where the node with the highest out-degree was selected as a seed, DEGREE produced a high cost, even higher than that of the BCT algorithm. BCT, although it is also based on BS samples, produced worse results than ESSM because it used binary search, which could return solutions much larger than the optimal one.
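
To make the cost of Monte Carlo estimation concrete, the sketch below estimates the benefit of a seed set under the Independent Cascade model by averaging repeated simulations. It is a minimal illustration, not the code used in the experiments; the adjacency format (`adj[u]` as a list of `(v, probability)` pairs) and the function names are assumptions made for this example.

```python
import random
from typing import Dict, Iterable, List, Tuple


def simulate_ic_benefit(adj: Dict[int, List[Tuple[int, float]]],
                        benefit: Dict[int, float],
                        seeds: Iterable[int],
                        rng: random.Random) -> float:
    """Run one Independent Cascade and return the total benefit of active nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, prob in adj.get(u, []):
                # Each newly active node u gets a single chance to activate v.
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return sum(benefit.get(v, 0.0) for v in active)


def estimate_benefit(adj, benefit, seeds, runs: int = 10_000, seed: int = 0) -> float:
    """Average the benefit over `runs` independent cascades."""
    rng = random.Random(seed)
    total = sum(simulate_ic_benefit(adj, benefit, seeds, rng) for _ in range(runs))
    return total / runs
```

A greedy method that evaluates many candidate nodes per step has to pay for `runs` cascades on every evaluation, which is consistent with IT not finishing on the larger networks.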

4.2.2. Comparison of Running Time

The running times of the algorithms are shown in Figure 2. ESSM was significantly faster than IT and BCT on all datasets: it was 6900 to 127,710 times faster than IT and 39 to 2120 times faster than BCT. The running time of IT was the longest, and it could not finish within the time limit on the Amazon and DBLP networks. This resulted from the long time IT spent calling Monte Carlo simulation to estimate the benefit function.
BCT was significantly faster than IT, even though it used many iterations of binary search, because it used BS samples to estimate the benefit function instead of Monte Carlo simulation. However, BCT was significantly slower than our algorithm because it has no mechanism for reusing a seed set when finding seed sets for larger benefit thresholds. The larger the number of vertices in a dataset, the more time BCT took to find a solution. These results are consistent with our assessment that the seed selection strategy of reusing seed sets in our algorithm shortens the time needed to find a solution.
Like the IT algorithm, the DEGREE algorithm relied on Monte Carlo simulation. However, it simply added nodes to the existing seed set by degree, without evaluating candidates for the next seed node. As a consequence, DEGREE ran for only a few seconds and was 4 to 54 times faster than our algorithm.
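
The DEGREE baseline can be sketched in a few lines, which helps explain both its speed and its high cost. This is an illustrative sketch only: `benefit_of` is a hypothetical callable that returns the (estimated) benefit of a seed set, for example a Monte Carlo estimator, and ties in out-degree are broken by the iteration order of the input dictionary.

```python
from typing import Callable, Dict, List, Set


def degree_baseline(out_neighbors: Dict[int, List[int]],
                    benefit_of: Callable[[Set[int]], float],
                    threshold: float) -> Set[int]:
    """Add nodes in decreasing out-degree order until the benefit reaches the threshold."""
    order = sorted(out_neighbors, key=lambda u: len(out_neighbors[u]), reverse=True)
    seeds: Set[int] = set()
    for u in order:
        if benefit_of(seeds) >= threshold:
            break
        seeds.add(u)
    return seeds
```

The benefit is evaluated once per added node rather than once per candidate, so the loop is cheap, but a node may be added even when it contributes no new benefit, and high out-degree nodes are the most expensive ones under a cost that grows with out-degree.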

4.2.3. Comparison of Memory Usage

The memory usage of the algorithms is reported in Table 3. ESSM did not use the least memory on the small and medium datasets, depending on the characteristics of the data, but the difference was not significant. On the remaining medium and large datasets, ESSM clearly showed its advantages, with a reduction in memory usage of more than 1000 times compared with the BCT algorithm on the DBLP dataset (Table 3). ESSM is therefore better suited to larger datasets than the BCT and IT algorithms.
The BCT algorithm does not inherit the sample set across thresholds; it regenerates the sample set independently at each threshold, which costs both time and memory. With a lower threshold, its sample-size formula requires a large number of samples, whereas the total number of required samples decreases as the threshold increases. As a result, memory usage should decrease as the threshold value increases. Moreover, BCT’s sampling algorithm does not guarantee a consistent number of samples at a given threshold, leading to unusual variation in memory usage among close thresholds $T_i$, which is clearly visible on the small datasets with closely spaced thresholds in the experiments, such as Gnutella and Net-Hept.
During the experiments, the IT algorithm always had the longest running time among the algorithms, caused by its use of the classical Monte Carlo sampling algorithm, which consumes memory as well as running time. The two large datasets, Amazon and DBLP, could not be run with the IT algorithm, partly because the algorithm exhausted memory during the sampling process. This exhibits the disadvantage of the IT algorithm compared with the other algorithms. On the other hand, IT used less memory than BCT and ESSM in some cases because it does not need to store BS samples as the other two algorithms do. Finally, similar to IT, DEGREE used the least memory because of its simplicity, with no inheritance in building solutions.

4.3. Discussions

The primary difference between our algorithm and the other algorithms is that it reuses solutions at lower thresholds for higher thresholds while still ensuring solution quality. To ensure an approximation guarantee for the MBT problem, current state-of-the-art algorithms must be run once for each threshold $T_i$. Our algorithm needs to be run only once for all thresholds of the problem. Consequently, it saves time and performs well on large networks. This is consistent with our experimental results. For most datasets, our algorithm guarantees solution quality (the total cost of reaching the thresholds) while being significantly faster than IT and BCT. Furthermore, our algorithm also offers significantly better solution quality than the other algorithms. The reason is that the candidate solutions are still checked by the sampling method with an appropriate number of samples.
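
The reuse idea discussed above can be illustrated with a simplified greedy sketch in which the seed set found for one threshold is the starting point for the next larger threshold. This is not ESSM itself: the sample reuse and the sample-count checks that give ESSM its guarantees are not reproduced here. `benefit_of` is a hypothetical benefit oracle, the selection rule (largest benefit gain per unit cost) is an assumption made for this example, and all costs are assumed to be positive.

```python
from typing import Callable, Dict, List, Set


def seeds_for_thresholds(nodes: List[int],
                         cost: Dict[int, float],
                         benefit_of: Callable[[Set[int]], float],
                         thresholds: List[float]) -> List[Set[int]]:
    """Return one seed set per threshold (in ascending threshold order),
    extending the previous seed set instead of restarting from scratch."""
    solutions: List[Set[int]] = []
    current: Set[int] = set()
    current_benefit = benefit_of(current)
    for t in sorted(thresholds):
        while current_benefit < t:
            best_node, best_ratio, best_value = None, 0.0, current_benefit
            for u in nodes:
                if u in current:
                    continue
                value = benefit_of(current | {u})
                ratio = (value - current_benefit) / cost[u]
                if ratio > best_ratio:
                    best_node, best_ratio, best_value = u, ratio, value
            if best_node is None:
                break  # no remaining node adds benefit; threshold unreachable
            current.add(best_node)
            current_benefit = best_value
        solutions.append(set(current))  # snapshot for this threshold
    return solutions
```

Because the work done for a smaller threshold is carried over, the selection loop runs once across all thresholds instead of once per threshold, which is the source of the time savings described above.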

5. Conclusions and Future Work

In this paper, motivated by applications in viral marketing, we investigate the MBT problem, which finds seed sets $S_1, S_2, \ldots, S_k$ such that their influence benefits are at least the given thresholds $T_1, T_2, \ldots, T_k$, respectively, under the well-known Independent Cascade model. In this model, the relationships among users in a social network are represented by a propagation (transmission) probability.
The problem we study generalizes the IT problem by considering the benefit of each node and by finding multiple seed sets for multiple thresholds. Although the current IT algorithms are applicable to our problem, they must be repeated for every threshold, which makes them expensive and time-consuming.
To address this challenge, we devise ESSM, an efficient algorithm that not only provides solutions with theoretical bounds but also finds multiple seed sets at once. The experimental results confirm the effectiveness of our algorithm and indicate that it substantially outperforms the state-of-the-art algorithms in terms of both solution quality and running time.
One question that arises is whether our algorithm can keep its solution guarantees and performance under other information propagation models. In future work, we will investigate the MBT problem under other information diffusion models and propose efficient algorithms for them.
Another interesting question is whether our algorithm remains efficient when each user relationship is affected by different topics. In the future, we will study this issue thoroughly and propose an algorithm appropriate for that context.

Author Contributions

Conceptualization, P.N.H.P.; methodology, P.N.H.P. and Q.T.N.C.; software, B.-N.T.N.; validation, Q.T.N.C.; formal analysis, B.-N.T.N.; investigation, P.N.H.P.; resources, Q.T.N.C.; writing—original draft preparation, P.N.H.P.; writing—review and editing, P.N.H.P., B.-N.T.N., Q.T.N.C. and V.S.; supervision, V.S.; project administration, P.N.H.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Ho Chi Minh City University of Food Industry (HUFI).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All real-world social network datasets used in the experiment can be downloaded at http://snap.stanford.edu/data/ (accessed on 15 September 2021).

Acknowledgments

This work was supported by Ho Chi Minh City University of Food Industry (HUFI).

Conflicts of Interest

The authors declare that there is no conflict of interest. The funders have no role in the research process and the writing of the manuscript.

References

  1. Kempe, D.; Kleinberg, J.M.; Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003; pp. 137–146.
  2. Tang, Y.; Xiao, X.; Shi, Y. Influence maximization: Near-optimal time complexity meets practical efficiency. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 22–27 June 2014; pp. 75–86.
  3. Tang, Y.; Shi, Y.; Xiao, X. Influence Maximization in Near-Linear Time: A Martingale Approach. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Australia, 31 May–4 June 2015; pp. 1539–1554.
  4. Nguyen, H.T.; Thai, M.T.; Dinh, T.N. A Billion-Scale Approximation Algorithm for Maximizing Benefit in Viral Marketing. IEEE ACM Trans. Netw. 2017, 25, 2419–2429.
  5. Chen, W.; Lakshmanan, L.V.S.; Castillo, C. Information and Influence Propagation in Social Networks; Synthesis Lectures on Data Management; Morgan & Claypool Publishers: San Rafael, CA, USA, 2013.
  6. Chen, W.; Wang, C.; Wang, Y. Scalable Influence Maximization for Prevalent Viral Marketing in Large-Scale Social Networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 25–28 July 2010; pp. 1029–1038.
  7. Chen, W.; Collins, A.; Cummings, R.; Ke, T.; Liu, Z.; Rincón, D.; Sun, X.; Wang, Y.; Wei, W.; Yuan, Y. Influence Maximization in Social Networks When Negative Opinions May Emerge and Propagate. In Proceedings of the Eleventh SIAM International Conference on Data Mining, Mesa, AZ, USA, 28–30 April 2011; pp. 379–390.
  8. Kuhnle, A.; Pan, T.; Alim, M.A.; Thai, M.T. Scalable Bicriteria Algorithms for the Threshold Activation Problem in Online Social Networks. In Proceedings of the IEEE Conference on Computer Communications, Atlanta, GA, USA, 1–4 May 2017.
  9. Pham, C.V.; Duong, H.V.; Bui, B.Q.; Thai, M.T. Budgeted Competitive Influence Maximization on Online Social Networks. In Lecture Notes in Computer Science, Proceedings of the Computational Data and Social Networks—7th International Conference, CSoNet 2018, Shanghai, China, 18–20 December 2018; Chen, X., Sen, A., Li, W.W., Thai, M.T., Eds.; Springer: Cham, Switzerland, 2018; Volume 11280, pp. 13–24.
  10. Pham, C.V.; Thai, M.T.; Ha, D.K.; Ngo, D.Q.; Hoang, H.X. Time-Critical Viral Marketing Strategy with the Competition on Online Social Networks. In Lecture Notes in Computer Science, Proceedings of the Computational Social Networks—5th International Conference, CSoNet 2016, Ho Chi Minh City, Vietnam, 2–4 August 2016; Nguyen, H.T., Snásel, V., Eds.; Springer: Cham, Switzerland, 2016; Volume 9795, pp. 111–122.
  11. Pham, C.V.; Dinh, H.M.; Nguyen, H.D.; Xuan, H.H.; Dang, H.T. Limiting the Spread of Epidemics within Time Constraint on Online Social Networks. In Proceedings of the Eighth International Symposium on Information and Communication Technology, Nha Trang City, Vietnam, 7–8 December 2017; pp. 262–269.
  12. Pham, C.V.; Phu, Q.V.; Hoang, H.X.; Pei, J.; Thai, M.T. Minimum budget for misinformation blocking in online social networks. J. Comb. Optim. 2019, 38, 1101–1127.
  13. Budak, C.; Agrawal, D.; El Abbadi, A. Limiting the spread of misinformation in social networks. In Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, 28 March–1 April 2011; pp. 665–674.
  14. Zhang, H.; Alim, M.A.; Li, X.; Thai, M.T.; Nguyen, H.T. Misinformation in Online Social Networks: Detect Them All with a Limited Budget. ACM Trans. Inf. Syst. 2016, 34, 1–24.
  15. Pham, C.V.; Pham, D.V.; Bui, B.Q.; Nguyen, A.V. Minimum budget for misinformation detection in online social networks with provable guarantees. Optim. Lett. 2022, 16, 515–544.
  16. Goyal, A.; Lu, W.; Lakshmanan, L.V. Simpath: An Efficient Algorithm for Influence Maximization under the Linear Threshold Model. In Proceedings of the 11th IEEE International Conference on Data Mining, ICDM 2011, Vancouver, BC, Canada, 11–14 December 2011; pp. 211–220.
  17. Crawford, V.G.; Kuhnle, A.; Thai, M.T. Submodular Cost Submodular Cover with an Approximate Oracle. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA, 9–15 June 2019; Chaudhuri, K., Salakhutdinov, R., Eds.; PMLR: Mountain View, CA, USA, 2019; Volume 97, pp. 1426–1435.
  18. Pham, C.V.; Duong, H.V.; Thai, M.T. Importance Sample-Based Approximation Algorithm for Cost-Aware Targeted Viral Marketing. In Proceedings of the Computational Data and Social Networks—8th International Conference, Ho Chi Minh City, Vietnam, 18–20 November 2019; pp. 120–132.
  19. Pham, P.N.H.; Nguyen, B.T.; Pham, C.V.; Nghia, N.D.; Snásel, V. Efficient Algorithm for Multiple Benefit Thresholds Problem in Online Social Networks. In Proceedings of the 15th IEEE-RIVF International Conference on Computing and Communication Technologies, Hanoi, Vietnam, 19–21 August 2021; pp. 1–6.
  20. Borgs, C.; Brautbar, M.; Chayes, J.T.; Lucier, B. Maximizing Social Influence in Nearly Optimal Time. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, OR, USA, 5–7 January 2014; pp. 946–957.
  21. Nguyen, H.T.; Thai, M.T.; Dinh, T.N. Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, 26 June–1 July 2016; pp. 695–710.
  22. Chen, W.; Yuan, Y.; Zhang, L. Scalable Influence Maximization in Social Networks under the Linear Threshold Model. In Proceedings of the ICDM 2010, the 10th IEEE International Conference on Data Mining, Sydney, Australia, 14–17 December 2010; pp. 88–97.
  23. Bozorgi, A.; Samet, S.; Kwisthout, J.; Wareham, T. Community-based influence maximization in social networks under a competitive linear threshold model. Knowl.-Based Syst. 2017, 134, 149–158.
  24. Borodin, A.; Filmus, Y.; Oren, J. Threshold Models for Competitive Influence in Social Networks. In Proceedings of the Internet and Network Economics—6th International Workshop, WINE 2010, Stanford, CA, USA, 13–17 December 2010; pp. 539–550.
  25. Tang, J.; Tang, X.; Xiao, X.; Yuan, J. Online Processing Algorithms for Influence Maximization. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, 10–15 June 2018; Das, G., Jermaine, C.M., Bernstein, P.A., Eds.; pp. 991–1005.
  26. Akram, M.; Zafar, F. Hybrid Soft Computing Models Applied to Graph Theory. In Studies in Fuzziness and Soft Computing; Springer: Cham, Switzerland, 2020; Volume 380.
  27. Akram, M.; Luqman, A. Fuzzy Hypergraphs and Related Extensions. In Studies in Fuzziness and Soft Computing; Springer: Singapore, 2020; Volume 390.
  28. Li, Y.; Zhang, D.; Tan, K. Targeted Influence Maximization for Online Advertisements. PVLDB 2015, 8, 1070–1081.
  29. Barbieri, N.; Bonchi, F.; Manco, G. Topic-aware social influence propagation models. Knowl. Inf. Syst. 2013, 37, 555–584.
  30. Chen, S.; Fan, J.; Li, G.; Feng, J.; Tan, K.; Tang, J. Online Topic-Aware Influence Maximization. PVLDB 2015, 8, 666–677.
  31. Li, G.; Chen, S.; Feng, J.; Tan, K.L.; Li, W.-S. Efficient Location-Aware Influence Maximization. In Proceedings of the 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, 16–19 April 2018; pp. 1569–1572.
  31. Li, G.; Chen, S.; Feng, J.; Tan, K.L.; Li, W.-S. Efficient Location-Aware Influence Maximization. In Proceedings of the 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, 16–19 April 2018; pp. 1569–1572. [Google Scholar]
  32. Wang, X.; Zhang, Y.; Zhang, W.; Lin, X. Efficient Distance-Aware Influence Maximization in Geo-Social Networks. IEEE Trans. Knowl. Data Eng. 2017, 29, 599–612. [Google Scholar] [CrossRef]
  33. Bharathi, S.; Kempe, D.; Salek, M. Competitive Influence Maximization in Social Networks. In Proceedings of the Internet and Network Economics, Third International Workshop, WINE 2007, San Diego, CA, USA, 12–14 December 2007; pp. 306–311. [Google Scholar] [CrossRef]
  34. Chen, W.; Lu, W.; Zhang, N. Time-Critical Influence Maximization in Social Networks with Time-Delayed Diffusion Process. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, Toronto, ON, Canada, 22–26 July 2012; pp. 592–598. [Google Scholar]
  35. Nguyen, H.; Zheng, R. On Budgeted Influence Maximization in Social Networks. IEEE J. Sel. Areas Commun. 2013, 31, 1084–1094. [Google Scholar] [CrossRef] [Green Version]
  36. Goyal, A.; Bonchi, F.; Lakshmanan, L.V.S.; Venkatasubramanian, S. On minimizing budget and time in influence propagation over social networks. Soc. Netw. Anal. Min. 2013, 3, 179–192. [Google Scholar] [CrossRef]
  37. Cohen, E.; Delling, D.; Pajor, T.; Werneck, R.F. Sketch-Based Influence Maximization and Computation: Scaling Up with Guarantees. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, Shangai, China, 3–7 November 2014; pp. 629–638. [Google Scholar] [CrossRef] [Green Version]
  38. Goyal, A.; Lu, W.; Lakshmanan, L.V. CELF++: Optimizing the Greedy Algorithm for Influence Maximization in Social Networks. In Proceedings of the 20th International Conference Companion on World Wide Web, New York, NY, USA, 28 March 2011; pp. 47–48. [Google Scholar]
  39. Chung, F.R.K.; Lu, L. Survey: Concentration Inequalities and Martingale Inequalities: A Survey. Internet Math. 2006, 3, 79–127. [Google Scholar] [CrossRef] [Green Version]
  40. Sachdeva, S.; Vishnoi, N.K. Approximation Theory and the Design of Fast Algorithms. arXiv 2013, arXiv:1309.4882. [Google Scholar]
  41. Leskovec, J.; Kleinberg, J.M.; Faloutsos, C. Graph evolution: Densification and shrinking diameters. TKDD 2007, 1, 2. [Google Scholar] [CrossRef]
  42. Leskovec, J.; Lang, K.J.; Dasgupta, A.; Mahoney, M.W. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Math. 2009, 6, 29–123. [Google Scholar] [CrossRef] [Green Version]
  43. Chen, W.; Wang, Y.; Yang, S. Efficient influence maximization in social networks. In Proceedings of the KDD ’09 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 199–208. [Google Scholar] [CrossRef]
  44. Leskovec, J.; Adamic, L.A.; Huberman, B.A. From Competition to Complementarity: Comparative Influence Diffusion and Maximization. arXiv 2015, arXiv:1507.00317. [Google Scholar]
  45. Yang, J.; Leskovec, J. Defining and Evaluating Network Communities based on Ground-truth. Knowl. Inf. Syst. 2015, 42, 181–213. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Comparison of the costs of seed sets between ESSM and other algorithms with threshold $T_i$ from 300 to 9000.
Figure 2. Comparison of running time between ESSM and other algorithms with threshold $T_i$ from 300 to 9000.
Table 1. Table of symbols.

| Notation | Description |
|---|---|
| $n, m$ | The number of nodes and of edges in $G$, respectively |
| $N_{in}(v), N_{out}(v)$ | The sets of incoming and outgoing neighbor nodes of $v$ |
| $S_i$ | The solution returned by our algorithm for threshold $T_i$ |
| $B(S), \hat{B}(S)$ | The benefit function and an estimate of the benefit function |
| $\Gamma$ | $\Gamma = \sum_{u \in V} b(u)$ |
| $S_i^*$ | The optimal seed set for threshold $T_i$ |
| $N(i, j)$ | $N(i, j) = \frac{(2 + \frac{2}{3}\epsilon)\Gamma}{\epsilon^2 (T_i - \epsilon T_i)} \ln\left(\binom{n}{j}/\delta\right)$ |
| $N_{max}^{i}$ | $N_{max}^{i} = \max_{j = 1, \ldots, \lvert S_i \rvert} \frac{(2 + \frac{2}{3}\epsilon)\Gamma}{\epsilon^2 (T_i - \epsilon T_i)} \ln\left(\binom{n}{j}/\delta\right)$ |
| $i_{max}$ | $i_{max} = \arg\max_{i = 1, \ldots, \lvert S_k \rvert} \ln \binom{n}{i}$ |
Table 2. Datasets.

| Dataset | #Nodes | #Edges | Avg. Degree | Source |
|---|---|---|---|---|
| Gnutella | 6301 | 20,777 | 3.3 | [41] |
| Enron | 36,692 | 183,831 | 5.0 | [42] |
| Net-Hept | 15,233 | 58,891 | 5.5 | [43] |
| Net-Phy | 37,154 | 231,584 | 13.4 | [5] |
| Amazon | 262,111 | 1,234,877 | 9.4 | [44] |
| DBLP | 317,080 | 1,049,866 | 6.6 | [45] |
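Table 3. Memory usage of compared algorithms.

| Dataset | Threshold | BCT | IT | DEGREE | ESSM |
|---|---|---|---|---|---|
| Gnutella | 300 | 0.758 | 0.77 | 0.855 | 1.02 |
| Gnutella | 540 | 0.805 | 0.75 | 0.852 | 1.02 |
| Gnutella | 780 | 0.805 | 0.758 | 0.719 | 1.02 |
| Gnutella | 1020 | 0.758 | 0.789 | 0.75 | 1.02 |
| Gnutella | 1260 | 0.758 | 0.789 | 0.723 | 1.02 |
| Gnutella | 1500 | 0.809 | 0.816 | 0.723 | 1.02 |
| Gnutella | 1740 | 0.824 | 0.855 | 0.785 | 1.02 |
| Gnutella | 1980 | 0.824 | 0.813 | 0.746 | 1.02 |
| Email-Enron | 3300 | 4859.98 | 1.051 | 0.746 | 0.813 |
| Email-Enron | 3350 | 4874.96 | 1.051 | 0.855 | 0.809 |
| Email-Enron | 3400 | 4841.89 | 1.051 | 0.77 | 0.715 |
| Email-Enron | 3450 | 4863.27 | 1.051 | 0.75 | 0.855 |
| Email-Enron | 3500 | 4839.59 | 2.328 | 0.809 | 0.75 |
| Email-Enron | 3550 | 4856.99 | 2.582 | 0.711 | 0.816 |
| Email-Enron | 3600 | 4858.67 | 2.582 | 0.746 | 0.7 |
| Email-Enron | 3650 | 4835.6 | 2.582 | 0.855 | 0.715 |
| Net-Hept | 2800 | 0.723 | 0.711 | 0.711 | 0.77 |
| Net-Hept | 2850 | 0.723 | 0.742 | 0.855 | 0.77 |
| Net-Hept | 2900 | 0.723 | 0.75 | 0.754 | 0.77 |
| Net-Hept | 2950 | 0.77 | 0.75 | 0.855 | 0.77 |
| Net-Hept | 3000 | 0.805 | 0.77 | 0.754 | 0.77 |
| Net-Hept | 3050 | 0.746 | 0.809 | 0.809 | 0.77 |
| Net-Hept | 3100 | 0.75 | 0.715 | 0.75 | 0.77 |
| Net-Hept | 3150 | 0.75 | 0.754 | 0.809 | 0.77 |
| Net-Phy | 1700 | 2800.25 | 0.805 | 20.66 | 1.117 |
| Net-Phy | 1900 | 1444.21 | 0.723 | 20.66 | 1.117 |
| Net-Phy | 2100 | 1446.43 | 0.82 | 20.66 | 1.117 |
| Net-Phy | 2300 | 1442.56 | 0.82 | 20.66 | 1.117 |
| Net-Phy | 2500 | 1434.55 | 0.867 | 20.66 | 1.117 |
| Net-Phy | 2700 | 1429.17 | 0.75 | 20.66 | 1.117 |
| Net-Phy | 2900 | 1426.53 | 0.758 | 20.66 | 1.117 |
| Net-Phy | 3100 | 1437.99 | 0.793 | 20.66 | 1.117 |
| Amazon | 600 | 0.195 | N/A | 0.723 | 12.453 |
| Amazon | 1800 | 0.742 | N/A | 0.789 | 12.453 |
| Amazon | 3000 | 0.742 | N/A | 0.809 | 12.512 |
| Amazon | 4200 | 0.746 | N/A | 0.719 | 12.512 |
| Amazon | 5400 | 0.715 | N/A | 0.813 | 12.512 |
| Amazon | 6600 | 0.715 | N/A | 0.746 | 12.512 |
| Amazon | 7800 | 0.805 | N/A | 0.758 | 12.512 |
| Amazon | 9000 | 0.715 | N/A | 0.742 | 12.512 |
| DBLP | 1280 | 26,316.8 | N/A | 0.711 | 19.121 |
| DBLP | 1400 | 41,369.6 | N/A | 0.715 | 19.227 |
| DBLP | 1520 | 26,009.6 | N/A | 0.813 | 19.227 |
| DBLP | 1640 | 24,883.2 | N/A | 0.816 | 19.227 |
| DBLP | 1760 | 24,883.2 | N/A | 0.719 | 19.227 |
| DBLP | 1880 | 24,883.2 | N/A | 0.711 | 19.227 |
| DBLP | 2000 | 24,883.2 | N/A | 0.711 | 19.227 |
| DBLP | 2120 | 24,883.2 | N/A | 0.754 | 19.227 |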
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Pham, P.N.H.; Nguyen, B.-N.T.; Co, Q.T.N.; Snášel, V. Multiple Benefit Thresholds Problem in Online Social Networks: An Algorithmic Approach. Mathematics 2022, 10, 876. https://doi.org/10.3390/math10060876