Article

Influence Maximization under Fairness Budget Distribution in Online Social Networks

1 Faculty of Information Technology, Ho Chi Minh City University of Food Industry, 140 Le Trong Tan Street, Ho Chi Minh City 700000, Vietnam
2 Department of Computer Science, Faculty of Electrical Engineering and Computer Science, VŠB-Technical University of Ostrava, 17.listopadu 15/2172, 708 33 Ostrava, Czech Republic
3 Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(22), 4185; https://doi.org/10.3390/math10224185
Submission received: 16 August 2022 / Revised: 29 September 2022 / Accepted: 2 November 2022 / Published: 9 November 2022
(This article belongs to the Special Issue Complex Network Modeling: Theory and Applications)

Abstract

In social influence analysis, viral marketing, and other fields, influence maximization is a fundamental problem with critical applications that has attracted many researchers over the last decades. The problem asks for a seed set of size k with the largest expected influence spread. Our paper studies fairness budget distribution in influence maximization, aiming to find a seed set of size k that is fairly distributed over target communities. Each community has lower and upper bounded budgets, and the number of its elements selected into the seed set must respect these bounds. Resolving this problem encounters two main challenges: strongly influential seed sets might not adhere to the fairness constraint, and the problem is NP-hard. To address these shortcomings, we propose three algorithms (FBIM1, FBIM2, and FBIM3). These algorithms combine an improved greedy strategy for selecting seeds, ensuring maximum coverage under the fairness constraints, with sampling through a Reverse Influence Sampling framework. Our algorithms provide a (1/2 − ϵ)-approximation of the optimal solution and require O(kT log((8 + 2ϵ)n(ln(2/δ) + ln(n choose k))/ϵ²)), O(kT log(n/(ϵ²k))), and O((T/ϵ) log(k/ϵ) log(n/(ϵ²k))) complexity, respectively. We conducted experiments on real social networks. The results show that our proposed algorithms are highly scalable while satisfying the theoretical guarantees, and that their coverage ratios with respect to the target communities are larger than those of the state-of-the-art alternatives; there are even cases in which our algorithms reach 100% coverage of the target communities. In addition, our algorithms are feasible and effective even on big data; in particular, their results are guaranteed to satisfy the fairness constraints.

1. Introduction

In the digital information age, online social networks (OSNs) have become indispensable and widespread for many people. Current OSNs, such as Facebook, Twitter, Instagram, LinkedIn, YouTube, and others, have millions or even billions of users. OSNs can rapidly influence people through the sharing of behavior, content, or messages between one person and another [1]. This propagation resembles the exponential spread of viruses. Exploiting this powerful feature of OSNs, brands and organizations use an online marketing tactic known as viral marketing. Because this tactic can rapidly spread information on a large scale, whether to promote products or to support candidates in elections, it often achieves strong results with modest investment costs [2]. However, an effective viral marketing campaign must seed its content with groups of influential people in OSNs. The process of identifying such a group of individuals is referred to as the Influence Maximization (IM) problem [3].
Influence Maximization problem. Given a ground set V, which is the set of users in the social network, and an information diffusion model, let k ∈ Z+ be a global cardinality constraint, S ⊆ V with |S| ≤ k be the set of influential users to be determined, and f(S) be the influence function measuring the expected number of users in V that can be affected by the members of S under the given information diffusion model. The problem is to find

max_{S ⊆ V, |S| ≤ k} f(S)
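To make the objective concrete, the following minimal Python sketch shows the classical greedy heuristic for this cardinality-constrained problem. The tiny graph and the coverage-style stand-in for f(S) are toy assumptions for illustration, not part of the paper:

```python
def greedy_im(V, f, k):
    """Greedy for max_{S subset of V, |S| <= k} f(S):
    repeatedly add the element with the largest marginal gain."""
    S = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for v in V - S:
            gain = f(S | {v}) - f(S)  # marginal gain f(v | S)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # no element improves f any further
            break
        S.add(best)
    return S

# Toy stand-in for the influence function: a node "influences" itself
# and its out-neighbors in a small hypothetical graph.
neighbors = {1: {2, 3}, 2: {3}, 3: {4}, 4: set(), 5: {4}}

def f(S):
    covered = set(S)
    for v in S:
        covered |= neighbors[v]
    return len(covered)

print(greedy_im(set(neighbors), f, 2))  # a 2-node seed set
```

Note that this plain greedy ignores any fairness constraint; the rest of the paper is precisely about adding community budget bounds to this selection step.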
IM is a celebrated topic that has attracted the interest of many experts during the last decade. Although Kempe et al. first introduced this problem under the name influence maximization in 2003 [3], Richardson et al. in 2002 were the first to study the problem as one of maximizing an advertiser’s earnings on a social network [4]. IM is crucial in a wide variety of applications, including viral marketing [5,6], social network analysis [7,8], social problems such as financial inclusion [9], HIV prevention for homeless youth [10], propagation of information for disease spread [11], and other issues.
There are many efficient approaches to this problem that assess the spread of influence: formulating it as a discrete optimization problem, which is NP-hard [3]; using continuous-time models [12]; ranking and score-based heuristics [13]; or an exciting new approach using sketches and the Reverse Influence Sampling (RIS) framework, as suggested by Borgs et al. [14]. Numerous publications have used RIS to solve the IM problem, with positive results [6,15,16,17].
However, most of these existing solutions to the IM problem focus on the most influential nodes in order to maximize the total number of affected nodes. Such approaches are not concerned with whether the influenced people are fairly distributed over the network. Thus, there is a high probability that users in groups containing no seeded users will not be influenced or will not receive the information, even though it is precisely these users that should be reached.
For example, viral marketing is a significant and standard application of the IM problem. The objective is to maximize the profits of advertisers by promoting products to users on OSNs [4]. OSNs are fertile ground for the advertising field; they have millions or billions of users, and the number of users continuously increases every day. However, it is troubling that users of OSNs often encounter advertising content that they have viewed too many times, or for products they have already purchased, and thus do not care about. In contrast, users who are the right customers for these ads do not receive them. This is tedious and frustrating for users. Therefore, the challenge is how to spread advertising content to the right customers and other potential customer groups in OSN communities.
In recent years, to overcome the above weakness of IM, a new variant has been developed that has attracted the attention of researchers. This is the problem of fair influence maximization (FIM), which aims to ensure a fair distribution over the groups in the final set of selected nodes [18,19]. In other words, it guarantees coverage propagation in the target communities. However, each of the existing methods offers a unique perspective on fairness. To the best of our knowledge, no publication has considered both minimum and maximum budget constraints for each group to guarantee equitable distribution.
Fueled by this challenge, in this paper we study FIM under budget threshold constraint, setting upper and lower bounded budgets to choose seeds in each community in order to guarantee fairness (termed FBIM ; see Definition 5). Specifically, our contributions are as follows.
  • We propose three algorithms to solve the FBIM problem. These algorithms provide a (1/2 − ϵ)-approximation of the optimal solution and work efficiently with big data.
(1) The first algorithm is FBIM1, which uses a combination of several methods. First, it generates samples through an RIS framework with the dynamic stop-and-stare algorithm, known as DSSA [6], and adds a fairness constraint during seed set selection. Our algorithm has O(kT log((8 + 2ϵ)n(ln(2/δ) + ln(n choose k))/ϵ²)) complexity. The results show that this algorithm achieves a runtime and objective function value that can equal or better those of DSSA, depending on the adjustment of the dependent parameters. In particular, this method resolves the fairness constraint, which DSSA lacks [20].
(2) The second algorithm is FBIM2, which combines seed selection with the online processing influence maximization algorithm known as OPIMC (see [15]) to ensure maximum coverage while guaranteeing the fairness constraint. FBIM2 has O(kT log(n/(ϵ²k))) complexity. Similar to DSSA, OPIMC does not handle the fairness constraint.
(3) The last is the FBIM3 algorithm, which improves on FBIM2 using a greedy technique with a threshold criterion for selecting the seed set. FBIM3 has O((T/ϵ) log(k/ϵ) log(n/(ϵ²k))) complexity. Significantly, the distribution of the seed set guarantees a high coverage ratio, which expresses the fairness constraint.
  • We further investigate the performance of our algorithms by conducting experiments for FBIM under both well-known diffusion models, namely Linear Threshold and Independent Cascade [3], on real social networks. The results indicate that the seed sets of our algorithms have a coverage ratio with respect to online communities that is greater than that of OPIMC; specifically, it is 2 to 10 times larger, and there are even cases in which FBIM reaches 100% coverage of the target communities. This depends on appropriate parameter selection for each dataset. An extensive coverage ratio signifies that the chosen seeds cover the target communities, ensuring that the impacted individuals are the ones we want to influence. In addition, the efficiency of the FBIM algorithms is inevitably affected by the implementation method and the fairness constraints, which leads to a higher cost and a lower objective function value than OPIMC. Nevertheless, the results guarantee the theoretically optimal approximation and, especially, the fairness constraint.
Organization. The rest of this article comprises six sections. Related work is presented in Section 2. Definitions and descriptions related to FBIM are presented in Section 3. Section 4 devises the Threshold Greedy method for the Fair Submodular Maximization problem, which is the generalization of FBIM. Section 5 contains the proposed algorithms and their theoretical analysis. We evaluate the experimental results in Section 6 and conclude the paper in Section 7.

2. Related Works

According to earlier research [3,5,8,21], the IM problem is often addressed using a greedy technique with an approximation ratio of (1 − 1/e). Although the greedy strategy is quite successful for IM, computing the influence function f(S) remains challenging, as it is #P-hard. Existing approaches for IM may be classified into three primary classes based on how the influence function is calculated [7].
(1) Simulation-based approaches, such as Greedy [3], CELF [22], CELF++ [23,24], and UBLF [25], calculate the influence function using Monte Carlo sampling. To achieve highly scalable algorithms for IM, heuristic techniques can be combined with the greedy algorithm. These algorithms aim to produce a (1 − 1/e)-approximate solution. These methods have the advantage of being applicable to general diffusion models. However, their disadvantage is that many samples must be generated in order to estimate the objective function with only minor errors; hence, this approach significantly increases computational costs.
(2) Proxy-based approaches, such as SimPath [26], Degree [23], PageRank [27,28,29], and EasyIM [30], approximate the influence function f(S) to overcome its #P-hardness by devising proxy models. The approximate solution obtained is (1 − 1/e − ϵ) for any ϵ > 0. Many algorithms have demonstrated that the proxy-based strategy is efficient in practice; however, it does not provide theoretical guarantees.
(3) Sketch-based approaches, such as TIM, TIM+ [31], and IMM [32], use a novel RIS sampling method. The goal of these techniques is to produce a (1 − 1/e − ϵ)-approximate solution with a minimal number of RIS samples [14]. The drawback of these approaches is that the required lower bound on the number of samples is unknown, and the number of generated samples can be rather large. On the other hand, these algorithms guarantee theoretical efficiency thanks to their rigorously bounded solutions and minimal time complexity. However, because they must ensure the approximation ratio in the worst case, the practical efficiency of sketch-based strategies may be lower than that of proxy-based approaches.
Nguyen et al. [6] proposed two new sampling techniques, SSA and DSSA, which attempt to obtain a small number of RIS samples while ensuring (1 − 1/e − ϵ)-approximations. Despite this, Huang et al. [33] discovered that SSA/DSSA have certain flaws. They provided SSA-Fix, an updated version of SSA. Afterwards, Nguyen et al. presented D-SSA-Fix [34], a significantly updated version of DSSA that produces (1 − 1/e − ϵ)-approximations.
In addition, many studies have sought to resolve other variants of IM [16,31,32,34]. However, almost all of these methods focus on offline processing. This means that the user does not receive any output until the final result is obtained. Thus, the user cannot terminate the algorithm early to trade solution quality for efficiency. Motivated by this phenomenon, Tang et al. proposed the OPIMC algorithm for online processing of influence maximization in [15]. This method allows the user to terminate at any timestamp, then obtain a seed set S with an approximation guarantee of (1 − 1/e − ϵ), such that S is the IM problem's approximate solution with at least (1 − α) probability, where α ∈ (0, 1) is a user-specified parameter.
Unfortunately, virtually all of the above methods have focused on identifying the most influential nodes to increase the number of nodes affected. This fails to ensure that chosen nodes are evenly distributed across the partitions of the dataset. This shortcoming is a driving force that has attracted the attention of researchers. Recent publications have proposed multiple definitions of fairness and explicitly integrated fairness into the IM problem. One such fairness concern, known as group fairness, is meant to ensure that each group receives a fair share of resources and that the distribution of those resources respects the varied makeup of the groups. These studies of group fairness constraints in the IM problem have obtained positive results.
Tsang et al. (2019) [35] proposed the issue of optimizing the dissemination of a strategy while keeping a group fairness restriction in mind. The authors developed two fairness measures, maximin fairness and group rationality, to assess group fairness in IM . Maximin fairness measures the smallest number of nodes within each group for whom the influence must be maximized. Meanwhile, the main principle of group rationality is that no group may increase its influence by withdrawing from IM with its proportional allocation of resources and distributing those resources internally. Both measures aim to ensure that each group receives an equitable share of resources.
Stoica et al. (2020) [36] studied the problem of fair resource allocation in influence maximization. The authors provided an algorithmic framework for locating solutions that satisfy fairness constraints for multi-objective submodular maximization problems. This method increases the diversity of nodes in the seed set, and may have an impact on the effectiveness and fairness of the information diffusion process. The authors demonstrated that seeding methods that consider the diversity of nodes in the seed set are more effective and fair in certain circumstances.
Halabi et al. (2020) [37] worked on the problem of maximizing fair submodular functions, including monotone and non-monotone submodular functions, proposing streaming algorithms for this problem. For the monotone case, the authors achieved two results, a (1/2)-approximation algorithm and a (1/4)-approximation algorithm, each of which uses O(log k) time. For the non-monotone case, they achieved a (q + ϵ)-approximation with q an input excess ratio, requiring O(k) time. These approaches apply to the creation of fair summaries for massive datasets.
Next, Khajehnejad et al. (2020) [18] studied fair influence maximization in an effort to more fairly reach minority groups. The authors used machine learning approaches to pick a seed set using an adversarial graph embedding technique, which allows for strong impact propagation as well as fairness among communities.
Rahmattalabi et al. (2021) [38] resolved the problem that Tsang et al. studied in [35]. This is the problem of group fairness in the influence maximization problem. However, Rahmattalabi et al. took a different approach, which is to offer a principled characterization of the properties that a fair influence maximization algorithm must meet. The authors designed a framework founded on the social welfare theory that aggregates the cardinal utilities each community derives using isoelastic social welfare functions. In this framework, the trade-off between fairness and efficiency can be handled by a single inequality aversion design parameter.
More recently, Becker et al. (2022) [19] considered the same problem as Tsang et al. [35], proposing a new approach. The authors modeled the problem on the basis of the probabilistic techniques used to choose seed sets, rather than purely deterministic ones. They provided two variations of this probabilistic problem: the node-based problem, which uses probabilistic strategies over nodes, and the set-based problem, which uses probabilistic strategies over sets of nodes. After examining the relationship between the two probabilistic problems, the authors demonstrated that both probabilistic variants provide approximation algorithms that achieve a constant multiplicative factor of ( 1 1 / e ) , minus an arbitrarily small additive error caused by the simulation of the information spread.
Ali et al. (2022) [39] solved the fairness of the spread process in various groups. They focused on the time-critical aspect of IM , examining the number of affected people and the time step at which they are influenced. Subsequently, they found that maximizing the expected number of individuals reached by selecting a fixed cardinality seed set and minimizing the amount of seeds needed to affect a given portion of the network can lead to unfair solutions. The authors proposed an objective function that balances two goals, such as maximizing the expected number of nodes reached and minimizing the maximum disparity in influence between any two communities.
Razaghi et al. (2022) [40] worked on the same group fairness-aware influence maximization problem in social networks as Tsang et al. [35]. However, they fixed an important omission that Tsang et al. overlooked by assessing the time required to receive resources by different nodes of groups. The authors expanded the concept of group fairness in the IM problem by examining the speed of node activation in different social network groups. They proposed a multi-objective meta-heuristic (SetMOGWO) founded on the multi-objective gray wolf optimizer in order to increase the fair propagation of information in the IM problem as it relates to various fairness measures.
Based on the available literature, the above publications show that previous studies have been concerned with particular aspects of or constraints on the problem of group fairness in IM. As far as we know, no single study has addressed fairness constraints on both the lower and upper bounded budgets of the target communities in the dataset, which is the reason and impetus for our research and our attempt to find a solution to this shortcoming.

3. Preliminaries

In this section, we first introduce the definition of the Fair Submodular Maximization (FSM) problem, which was studied by Halabi et al. [37], generalize it to the present problem, and recap the Fair Greedy algorithm, which returns a (1/2)-approximation within O(nk) queries. We then introduce the fairness influence maximization problem studied here and recap certain properties of the problem. Table 1 summarizes the notations commonly used in this paper.
Consider a ground set V of n elements and a submodular function f : 2^V → R+; given two sets S, T ⊆ V, the marginal gain of S in relation to T is defined as follows:

f(S | T) = f(S ∪ T) − f(T)

For two sets S ⊆ T ⊆ V and an element v ∈ V \ T, the function f is submodular if

f(v | S) ≥ f(v | T),

while f is monotone if f(S) ≤ f(T). The FSM problem is defined as follows:
Definition 1
(FSM problem). Assume that V is divided into K disjoint subsets V_1, V_2, …, V_K with V_i ∩ V_j = ∅ for i ≠ j. Each subset V_i has a lower bound k_i^l and an upper bound k_i^u on the number of its elements that may be selected, which represents the fairness constraint. Let a positive integer k be a global cardinality constraint. FSM then seeks to find

max f(S)
subject to: |S| ≤ k
k_i^l ≤ |S ∩ V_i| ≤ k_i^u, ∀i = 1, …, K
We define an instance of the FSM problem as a tuple (f, V, V_1, …, V_K, k). The FSM problem can then be reduced to a submodular maximization under a matroid constraint (SMM) problem [37], defined as follows:
Definition 2
(SMM problem). The problem seeks to select a set S ⊆ V with S ∈ M such that it maximizes f(S), where M is a matroid. A family of sets M ⊆ 2^V is referred to as a matroid if it satisfies the following conditions:
1. ∅ ∈ M;
2. downward-closedness, i.e., if A ⊆ B and B ∈ M, then A ∈ M;
3. augmentation, i.e., if A, B ∈ M with |A| < |B|, then there exists s ∈ B \ A such that A ∪ {s} ∈ M.
Fair Greedy Algorithm. We now recap the Fair Greedy algorithm (Algorithm 1) [37]. The operation of this algorithm is based on the notion of an extendable set.
Algorithm 1: Fair Greedy Algorithm
(Pseudocode for Algorithm 1 is given as an image in the original article.)
Definition 3
(An extendable set S [37]). A set S ⊆ V is extendable if and only if

|S ∩ C_i| ≤ k_i^u, ∀i = 1, …, K

and

Σ_{i=1}^{K} max(|S ∩ C_i|, k_i^l) ≤ k
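Definition 3 translates directly into a small feasibility check: condition (i) caps each community's seeds, and condition (ii) verifies that the remaining lower bounds can still be met within the global budget. A Python sketch, with hypothetical community lists and bounds:

```python
def is_extendable(S, communities, lower, upper, k):
    """Definition 3 check: S is extendable iff
    (i)  |S ∩ C_i| <= k_i^u for every community i, and
    (ii) sum_i max(|S ∩ C_i|, k_i^l) <= k,
    i.e. the lower bounds can still be met within the global budget k."""
    counts = [len(S & C) for C in communities]
    if any(c > u for c, u in zip(counts, upper)):
        return False
    return sum(max(c, l) for c, l in zip(counts, lower)) <= k

# Two hypothetical communities, global budget k = 3, each needing 1-2 seeds.
C = [{1, 2, 3}, {4, 5, 6}]
print(is_extendable({1, 2}, C, lower=[1, 1], upper=[2, 2], k=3))     # still room for C_2's lower bound
print(is_extendable({1, 2, 3}, C, lower=[1, 1], upper=[2, 2], k=3))  # upper bound of C_1 violated
```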
Fairness Budget Influence Maximization Problem.
In this section, we introduce the FBIM problem, which is the focus of this study, and its related definitions. FBIM inherits two issues: fair submodular maximization ( FSM ) [37] and fair influence maximization ( FIM ) [38].
Definition 4
(FIM problem). Consider a graph G = (V, E), where V is the set of vertices, |V| = n, E is the set of edges, |E| = m, and k ∈ Z+ is a global cardinality constraint. Let C be a set of disjoint communities (with empty pairwise intersections) of the graph. Each vertex v of V belongs to one of the communities C_i ∈ C, i ∈ {1, …, N}, such that V_1 ∪ … ∪ V_N = V, where V_i denotes the set of vertices of the community C_i. Furthermore, communities may be disconnected; that is, for C_i, C_j ∈ C and v ∈ V_i, u ∈ V_j, there may be no edge between v and u. Here, A denotes the initial set of vertices (referred to as influencer vertices), and we define A := {A ⊆ V | |A| ≤ k} as the set of feasible budget influencers. Lastly, for any choice of A ∈ A, we let h_{C_i}(A) denote the expected fraction of the influenced vertices of a community C_i, where the expectation in terms of the spread of influence is taken under a diffusion model. The fair influence maximization problem solves the optimization problem

maximize_{A ∈ A} Σ_{C_i ∈ C} |V_i| · h_{C_i}(A)
Definition 5
(FBIM problem). Consider a social network G = (V, E, w), where |V| = n, and a set C = {C_1, C_2, …, C_K}, where C_i ⊆ G and C_i ∩ C_j = ∅ for i ≠ j. Each community C_i has an upper budget bound k_i^u and a lower budget bound k_i^l with k_i^l ≤ k_i^u. For a given total budget k, the problem seeks to find a seed set S, |S| ≤ k, which satisfies k_i^l ≤ |S ∩ C_i| ≤ k_i^u for every i, such that f(S) is maximal, where f(S) is the influence function. Here, f(S) measures the expected number of users in V that can be influenced by the elements of S under a diffusion model.
In this paper, we solve the FBIM problem under both the Linear Threshold ( LT ) and Independent Cascade ( IC ) models, which are the well-known and standard diffusion models for the IM problem as well as other problems related to the analysis of social networks [6,15,41]. These models were first proposed by Kempe et al. in [3].
Definition 6
(LT model). Consider a directed graph G = (V, E, w), where V is the set of vertices, |V| = n, and E is the set of edges, |E| = m. A vertex v in G is influenced by each of its neighbors u with probability p(u, v) ∈ w, where Σ_{u neighbor of v} p(u, v) ≤ 1. Every vertex v in V is assigned a threshold Λ_v drawn uniformly at random from the interval [0, 1]. This threshold is the weighted proportion of v's neighbors (active vertices) required for v to become active. The process spreads as follows. Initially, we draw a random set of threshold values and fix a seed set S of active vertices, while the rest are inactive. In the t-th step, all vertices active in the (t − 1)-th step remain active, and the process activates any vertex v whose total weight of active neighbors is at least Λ_v:

Σ_{u active neighbor of v} p(u, v) ≥ Λ_v
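A single Monte Carlo realization of the LT process described above can be sketched as follows. The graph encoding (in-neighbor lists plus an edge-weight dictionary) is an illustrative assumption, not the paper's implementation:

```python
import random

def simulate_lt(in_neighbors, w, seeds, rng=None):
    """One Monte Carlo run of the LT model. in_neighbors[v] lists the
    neighbors that can influence v; w[(u, v)] are edge weights with
    sum_u w[(u, v)] <= 1. Each v draws a threshold uniformly from [0, 1]
    and activates once its active neighbors' total weight reaches it."""
    rng = rng or random.Random()
    threshold = {v: rng.random() for v in in_neighbors}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in in_neighbors:
            if v in active:
                continue
            weight = sum(w[(u, v)] for u in in_neighbors[v] if u in active)
            if weight >= threshold[v]:  # LT activation condition
                active.add(v)
                changed = True
    return active

# a -> b with full weight: b always activates, since 1.0 >= Λ_b in [0, 1)
in_neighbors = {"a": [], "b": ["a"]}
print(simulate_lt(in_neighbors, {("a", "b"): 1.0}, {"a"}))
```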
Definition 7
(IC model). Consider a directed graph G = (V, E, w), where V is the set of vertices, |V| = n, and E is the set of edges, |E| = m. Initially, the vertices in the seed set S are active, while all remaining vertices are inactive. The process spreads according to the following rule. At the t-th step, when a vertex v first becomes active, it has only one chance to activate each inactive neighbor u; the probability of success is p(v, u) ∈ w. If u has multiple newly active neighbors, their attempts are sequenced in random order. If vertex v succeeds, vertex u becomes active in the (t + 1)-th step. Regardless of whether v succeeds, it cannot make any further attempts to activate u in subsequent rounds. The process continues until no more activations are feasible.
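The IC process admits a similarly compact Monte Carlo sketch; again, the adjacency encoding is an illustrative assumption:

```python
import random

def simulate_ic(out_neighbors, w, seeds, rng=None):
    """One Monte Carlo run of the IC model. Each newly activated vertex v
    gets exactly one chance to activate each inactive out-neighbor u,
    succeeding with probability w[(v, u)]; v never retries u afterwards."""
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)          # vertices activated in the current step
    while frontier:
        nxt = []
        for v in frontier:
            for u in out_neighbors.get(v, []):
                if u not in active and rng.random() < w[(v, u)]:
                    active.add(u)   # u becomes active in step t + 1
                    nxt.append(u)
        frontier = nxt
    return active

# a -> b -> c with success probability 1: the cascade always reaches c
out_neighbors = {"a": ["b"], "b": ["c"]}
weights = {("a", "b"): 1.0, ("b", "c"): 1.0}
print(simulate_ic(out_neighbors, weights, {"a"}))
```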
For the FBIM problem, according to the proposition of Halabi et al. in [37], the greedy method selects the element with the highest marginal gain, where the marginal gain of an element s is the value of f(S ∪ {s}) − f(S), while meeting the constraints. We note that the greedy approach might not produce a feasible solution if elements were only required to satisfy the cardinality and upper bound constraints, as it may exhaust the global cardinality budget without meeting the lower bounds. As a result, more careful selection of elements is required. Thus, the seed set S must be an extendable set.
In addition, there are several definitions that are typically used in this paper. These are Reverse Influence Sampling ( RIS ), Reachable Reverse Sets ( RR set), and the Coverage of seed set S on the set of Reachable Reverse Sets ( Cov R ( S ) ).
Definition 8
(RIS [14]). Given a graph G = (V, E, w), RIS captures the influence landscape of G by generating a set R of random RR sets. An RR set contains the nodes that can reach v in g, where v is a random node in V and g is a sample graph of G.
Definition 9
( RR set [14]). Consider a graph G = ( V , E , w ) under the IC model; a random RR set R i is generated from G according to the following steps:
  • Step 1. Choose a random source vertex v ∈ V;
  • Step 2. Generate a sample graph g from G;
  • Step 3. Return R i such that it contains vertices that can be reached from v in g.
We refer to the set of random RR sets as R. According to [6], finding a seed set S and its influence spread f(S) is based on computing the coverage of S over the RR sets. Because R contains many random RR sets, influential vertices tend to occur in a large fraction of them; hence, we can add the vertices that occur in the majority of RR sets to the seed set S. Moreover, a seed set S covers an RR set R_i if S ∩ R_i ≠ ∅. For simplicity, we refer to the coverage of S on R as Cov_R(S), computed as follows:

Cov_R(S) = Σ_{R_i ∈ R} min{1, |S ∩ R_i|}

Furthermore, the influence spread f(S) is proportional to the probability that S intersects a random RR set R_i. Borgs et al. [14] proposed the following calculation for the influence spread function f(·):

f(S) = n · E[min{1, |S ∩ R_i|}]

and for the estimation of f(S) over a collection of RR sets R,

f̂(S) = n · Cov_R(S) / |R|
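Definitions 8 and 9 together with the estimator f̂(S) can be sketched as follows. The two-node graph is an illustrative assumption, and the edge sampling follows the IC-style construction of Definition 9:

```python
import random

def sample_rr_set(in_neighbors, w, nodes, rng):
    """One random RR set under the IC model (Definition 9): pick a random
    target v, then collect every vertex that reaches v in a sampled
    subgraph where each edge (u, x) survives with probability w[(u, x)]."""
    v = rng.choice(nodes)
    rr, stack = {v}, [v]
    while stack:
        x = stack.pop()
        for u in in_neighbors.get(x, []):
            if u not in rr and rng.random() < w[(u, x)]:
                rr.add(u)
                stack.append(u)
    return rr

def estimate_influence(S, rr_sets, n):
    """f̂(S) = n * Cov_R(S) / |R|; Cov_R(S) counts RR sets intersecting S."""
    cov = sum(1 for rr in rr_sets if S & rr)
    return n * cov / len(rr_sets)

# a -> b with probability 1 (n = 2): 'a' appears in every RR set,
# so its estimated influence is exactly n = 2.
rng = random.Random(7)
in_neighbors = {"a": [], "b": ["a"]}
R = [sample_rr_set(in_neighbors, {("a", "b"): 1.0}, ["a", "b"], rng)
     for _ in range(1000)]
print(estimate_influence({"a"}, R, 2))
```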

4. Threshold Greedy for FSM

In this section, we devise the Threshold Greedy algorithm, which applies the decreasing threshold greedy strategy for the FSM problem. This algorithm improves on Algorithm 1 by using a decreasing threshold to reduce the number of data traversals while continuing to ensure the approximate solution.

4.1. Algorithm Description

The Threshold Greedy algorithm operates as follows. At the beginning, it initializes the threshold t to M, where M = max_{u ∈ V} f({u}). Each pass of the inner "for" loop, called an iteration, scans all elements of V. At each iteration, if the current element s satisfies two conditions, namely that S ∪ {s} is extendable and f(s | S) ≥ t, then s is added to S. After each pass of the "while" loop, t is multiplied by (1 − ϵ); when t < ϵM/k, the algorithm terminates and returns S. A detailed description of this algorithm is provided in Algorithm 2.
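The loop structure of the decreasing-threshold strategy can be sketched as follows. The modular toy objective and the plain cardinality stand-in for extendability are illustrative assumptions, not the paper's actual Algorithm 2:

```python
def threshold_greedy(V, f, k, eps, is_extendable):
    """Sketch of the decreasing-threshold greedy for FSM.
    t starts at M (max single-element value); each 'while' pass scans
    the ground set and takes any s whose addition keeps S extendable
    and whose marginal gain f(s | S) is at least t. t shrinks by a
    (1 - eps) factor per pass until t < eps * M / k."""
    M = max(f({u}) for u in V)
    S, t = set(), float(M)
    while t >= eps * M / k:
        for s in V - S:
            if is_extendable(S | {s}) and f(S | {s}) - f(S) >= t:
                S.add(s)
        t *= 1 - eps  # decrease the threshold geometrically
    return S

# Toy modular objective with a plain cardinality constraint standing in
# for extendability; the two heaviest elements end up selected.
weights = {1: 10.0, 2: 5.0, 3: 1.0}
f = lambda S: sum(weights[v] for v in S)
print(threshold_greedy(set(weights), f, k=2, eps=0.1,
                       is_extendable=lambda S: len(S) <= 2))
```

Because each element is taken as soon as its gain clears the current threshold, the algorithm makes only O((1/ϵ) log(k/ϵ)) passes over the ground set instead of one full scan per selected element.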

4.2. Theoretical Analysis

The following theoretical analysis, along with the proofs of Lemma 1 and Theorem 1, demonstrates the feasibility and efficiency of this algorithm in guaranteeing the approximation.
Algorithm 2: Threshold Greedy Algorithm
(Pseudocode for Algorithm 2 is given as an image in the original article.)
Lemma 1.
Denote by t_i the threshold t at the i-th iteration, and let S_i be S at the beginning of the i-th iteration. We first show that

t_i ≥ (1 − ϵ) max_{s ∈ V : S_i ∪ {s} is extendable} f(s | S_i)
Proof of Lemma 1.
We prove Lemma 1 by induction. If i = 1, then S_1 = ∅ and t_1 = M ≥ (1 − ϵ) max_{s ∈ V : S_1 ∪ {s} is extendable} f(s). Assume that the lemma holds for some i ≥ 1. Then we have

t_{i+1} = (1 − ϵ) t_i ≥ (1 − ϵ) max_{s ∈ V : S_{i+1} ∪ {s} is extendable} f(s | S_{i+1})

The inequality is due to the fact that the element

s_max = arg max_{s ∈ V : S_{i+1} ∪ {s} is extendable} f(s | S_{i+1})

was not added to S in iteration i. The lemma is proved. □
Theorem 1.
Algorithm 2 requires O((n/ϵ) log(k/ϵ)) runtime and returns a (1/2 − ϵ)-approximation solution for the FSM problem.
Proof of Theorem 1.
The computational complexity can be easily proven. Assume that there are a total of x iterations of the "while" loop of Algorithm 2. Then we have (1 − ϵ)^x = ϵ/k. Solving this equation yields

x = log(k/ϵ) / log(1/(1 − ϵ)) ≤ (1/ϵ) log(k/ϵ)

The "for" loop carries out n iterations. Thus, the time complexity of this algorithm is O((n/ϵ) log(k/ϵ)).
In addition, assuming that s_max = arg max_{s ∈ V : S_i ∪ {s} is extendable} f(s | S_i), that S_i = S_{i−1} ∪ {s_1, s_2, …, s_l}, that S_s is S immediately before s is processed, and that S_s ∪ {s} is extendable, we have the following:

t_i ≥ f(s_max | S_s) ≥ f(s_max | S_i)
Now, assume that S = {s_1, s_2, …, s_u}, u ≤ k, is S after the main loop of the algorithm ends. We can consider the following two cases:
Case 1.
With |S| = u and u = k, assume that O = {o_1, o_2, …, o_k} is the optimal solution, ordered such that {s_1, s_2, …, s_{i−1}, o_i} is extendable; such an ordering exists because extendability forms a matroid [37] and therefore satisfies the augmentation property. Denote by S_i the set S immediately before s_i is added to S, and let t(s_i) be the value of t at the iteration in which s_i is added to S. We then have the following:
f(S) = Σ_{i=1}^{u} f(s_i | {s_1, s_2, …, s_{i−1}})
   ≥ Σ_{i=1}^{u} t(s_i)
   ≥ Σ_{i=1}^{u} (1 − ϵ) · max_{s ∈ V : S_i ∪ {s} is extendable} f(s | S_i)   (by Lemma 1)
   ≥ Σ_{i=1}^{u} (1 − ϵ) f(o_i | S_i)
   = Σ_{i=1}^{u} (1 − ϵ) f(o_i | {s_1, s_2, …, s_{i−1}})
   ≥ Σ_{i=1}^{u} (1 − ϵ) f(o_i | {s_1, s_2, …, s_u} ∪ {o_1, …, o_{i−1}})   (due to the submodularity of f)
   = Σ_{i=1}^{u} (1 − ϵ) f(o_i | S ∪ {o_1, …, o_{i−1}})
   = (1 − ϵ) · (f(O ∪ S) − f(S))   (by equality (2))
and thus,
f(S) ≥ ((1 − ϵ)/(2 − ϵ)) f(O) ≥ ((1 − 2ϵ)/2) f(O) = (1/2 − ϵ) f(O)
Case 2.
With |S| = u, u < k, and S = {s_1, s_2, …, s_u}, assume that O is the optimal solution, let S′ = {s′_{u+1}, …, s′_k}, and let S_k = {s_1, s_2, …, s_u} ∪ {s′_{u+1}, …, s′_k}, meaning that S_k = S ∪ S′. Denote by t_last the value of t after the main loop of the algorithm ends. Then, we have the following:
Σ_{i=u+1}^{k} f(s′_i | {s_1, s_2, …, s_{i−1}}) ≤ k · t_last   (since f(s′_i | S) ≤ t_last for i = u+1, …, k)
   ≤ k · (ϵM/k) = ϵM ≤ ϵ f(O)
and thus,
f(S_k) − f(S) ≤ ϵ f(O)
We now have the equivalent inequality, as follows:
f(S) ≥ f(S_k) − ϵ f(O) ≥ (1/2 − ϵ) f(O) − ϵ f(O)   (by the proof of Case 1)
   = (1/2 − 2ϵ) f(O) = (1/2 − ϵ′) f(O)   (with ϵ′ = 2ϵ)
The proof is completed.    □
As mentioned above, FSM is a generalization of the FBIM problem. The Threshold Greedy algorithm is therefore a key building block for designing an efficient algorithm for the FBIM problem in the next section, namely, the Threshold Greedy Algorithm for Fairness Max Cover, which finds a seed set S that maximizes the coverage while satisfying the fairness constraint.

5. Proposed Algorithms for FBIM

This section introduces three main algorithms for the FBIM problem (called FBIM 1 , FBIM 2 , and FBIM 3 ) and two auxiliary procedures for these main algorithms (the Fairness-Max-Coverage procedure and the Threshold Greedy Algorithm for Fairness Max Cover procedure). The details of these algorithms are fully presented in Algorithms 3, 4, 5, 6, and 7. Because our proposed algorithms are based on the DSSA [6] and OPIMC [15] methods, we do not repeat the proofs provided in the original works.

5.1. FBIM1—An Algorithm Based on the Stop-and-Stare Method

This method combines an improved greedy strategy for selecting seeds that satisfy the fairness constraint with the sampling generation of DSSA . The FBIM 1 algorithm’s fundamental principle is to (1) choose k nodes that appear in the majority of communities and add them to S so as to maximize the coverage of S on the set of RR sets; and (2) compute f(S) using the stop-and-stare technique. If the result does not satisfy the threshold condition of the algorithm, the process repeats the search for a new S on a new set of RR sets with double the number of elements. The details of this algorithm are fully presented in Algorithms 3 and 4.
Algorithm 3: Fairness-Max-Coverage ( FMC ) procedure
Mathematics 10 04185 i003

5.1.1. Algorithm Description

The FBIM 1 algorithm takes as input a set C of K communities, a budget k, and parameters ϵ and δ (0 ≤ ϵ, δ ≤ 1) that are related to the solution’s quality and the algorithm’s runtime. This allows us to adjust the trade-off between solution quality and runtime. In particular, ϵ and δ guarantee the size bound of S, while k and the set C guarantee the value of the objective function f(S) and the coverage ratio of S over the communities in C , which is considered to satisfy the fairness constraint.
In the initial step, the algorithm generates a set R containing Γ random RR sets via RIS , with the value of Γ initialized in line 1. The formula for Γ was proven in [6]. Subsequently, based on this R , Algorithm 3 returns a seed set S.
Algorithm 3 receives the set C of K communities, a budget k, and a set R , and produces a k-size seed set S that satisfies the fairness constraint. It initializes an empty set S and performs a loop of k iterations; each iteration finds an element s among all C_j of C , excluding those already in S, such that S ∪ {s} is extendable and the coverage of s on the RR sets of R is maximal. This element s is added to S. After the loop ends, the algorithm returns the seed set S.
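The selection loop just described can be sketched in a few lines of Python. This is an illustrative reading, not the paper's exact procedure: the per-community budget check below stands in for the extendability test, and all names and data are hypothetical.

```python
def fairness_max_coverage(rr_sets, communities, upper, k):
    """Greedy sketch of Algorithm 3's idea: k rounds, each adding the
    element with maximal marginal RR-set coverage whose community has
    not yet exhausted its budget (a simplified extendability test)."""
    S, covered = set(), set()
    taken = {j: 0 for j in communities}            # seeds taken per community
    home = {v: j for j, C in communities.items() for v in C}
    for _ in range(k):
        best, best_gain = None, -1
        for j, C in communities.items():
            if taken[j] >= upper[j]:               # budget exhausted: skip C_j
                continue
            for v in C:
                if v in S:
                    continue
                gain = sum(1 for i, rr in enumerate(rr_sets)
                           if i not in covered and v in rr)
                if gain > best_gain:
                    best, best_gain = v, gain
        if best is None:                           # no extendable element left
            break
        S.add(best)
        taken[home[best]] += 1
        covered |= {i for i, rr in enumerate(rr_sets) if best in rr}
    return S
```

For instance, with RR sets [{1,2}, {2,3}, {3}, {4}], communities {0: [1,2], 1: [3,4]}, a budget of one seed per community, and k = 2, the sketch returns {2, 3}: the budget forces the second seed out of community 0 even though node 3's remaining gain is small.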
Algorithm 4: FBIM 1 algorithm
Mathematics 10 04185 i004
We now turn our attention to Algorithm 4. This algorithm executes an indefinite loop; in each iteration, it evaluates the efficiency of S by computing f(S) on the set R and checks whether the stopping conditions in lines 9 and 14 are met. If so, the algorithm stops. Otherwise, R is doubled in size to find a new S, and the algorithm proceeds to the next iteration. The loop stops when | R | is at least (8 + 2ϵ) · n · (ln(2/δ) + ln (n choose k)) / ϵ².
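The outer doubling loop can be sketched as follows; `generate_rr`, `find_seeds`, and `estimate` are placeholders for the paper's RIS sampler, Algorithm 3, and the coverage-based estimator of f(S), and the simple agreement test stands in for the exact conditions of lines 9 and 14.

```python
def stop_and_stare(generate_rr, find_seeds, estimate, eps, cap, start=16):
    """Sketch of FBIM 1's outer loop: select S on R, then re-estimate f(S)
    on an independent collection; stop when the two estimates agree, or
    when |R| reaches the cap implied by the theoretical analysis."""
    R = [generate_rr() for _ in range(start)]
    while True:
        S = find_seeds(R)
        R_check = [generate_rr() for _ in range(len(R))]   # fresh samples
        if estimate(S, R_check) >= (1 - eps) * estimate(S, R):
            return S, len(R)                    # estimates agree: stop
        if len(R) >= cap:
            return S, len(R)                    # hard cap from the analysis
        R += [generate_rr() for _ in range(len(R))]        # double |R|
```

The key point the analysis exploits is that the number of doublings, and hence of calls to the seed-selection procedure, is logarithmic in the cap.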

5.1.2. Theoretical Analysis

Theorem 2.
Algorithm 3 is an improved greedy algorithm; it has O(kT) complexity, where T = Σ_{C_i ∈ C} |C_i| and T ≤ n.
Proof of Theorem 2.
Algorithm 3 iterates k times to select seeds for the seed set S of size k. Each iteration scans the majority of the elements of each C_i ∈ C to select the element with the greatest coverage on R . As a result, the complexity of the algorithm is O(kT).    □
Theorem 3.
The complexity of Algorithm 4 is O(kT · log((8 + 2ϵ) · n · (ln(2/δ) + ln (n choose k)) / ϵ²)).
Proof of Theorem 3.
Algorithm 4 repeatedly generates the set R of random RR sets, with source elements randomly chosen from the C_i, and finds S and f(S) on the basis of R using Algorithm 3. At lines 14 and 19, Algorithm 4 has two conditions for breaking the loop. In each iteration, | R | is doubled in size and Algorithm 3 is called to find a new S; thus, in the worst case, the algorithm stops when it meets the condition in line 19, which indicates that the maximum number of iterations is log((8 + 2ϵ) · n · (ln(2/δ) + ln (n choose k)) / ϵ²). Consequently, the complexity of this method is O(kT · log((8 + 2ϵ) · n · (ln(2/δ) + ln (n choose k)) / ϵ²)).    □

5.2. FBIM2 & FBIM3—Algorithms Based on the Online Processing of Influence Maximization Method

5.2.1. Algorithm Description

The FBIM 2 and FBIM 3 algorithms share the same main idea and differ only in how they find the seed set S. The main idea of these algorithms is to perform a finite loop with the following steps: (1) generate two sets of RR sets, R_1 and R_2; (2) find a seed set S based on R_1 such that S satisfies the fairness constraint; and (3) compute f_l(S) based on R_2 and f_u(S) based on R_1. If the ratio f_l(S)/f_u(S) satisfies the approximation guarantee (for FBIM 2 , at least 1/2 − ϵ; for FBIM 3 , at least 1/2 − 2ϵ), the algorithm stops before the iteration limit. Otherwise, the algorithm repeats the step in which it finds S, now with R_1 and R_2 doubled in size.
As mentioned above, FBIM 2 and FBIM 3 are inherited from the OPIMC method of Tang et al. [15]; thus, we have the expressions f l ( S ) and f u ( S ) as follows:
f_l(S) = [ ( √(Cov_{R_2}(S) + (2 ln(1/δ_2))/9) − √(ln(1/δ_2)/2) )² − ln(1/δ_2)/18 ] · n/Γ_0
f_u(S) = ( √(Cov_{R_1}(S)/(1/2) + ln(1/δ_1)/2) + √(ln(1/δ_1)/2) )² · n/Γ_0
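In Python, the two bounds above can be evaluated directly from the coverage counts. This is a sketch of the reconstructed formulas (function names and sample values are illustrative), with the 1/2 approximation ratio of the selection step passed explicitly.

```python
import math

def f_lower(cov_r2, n, gamma0, delta2):
    """f_l(S): pessimistic estimate of the spread from Cov_{R_2}(S)."""
    ln_d = math.log(1 / delta2)
    a = math.sqrt(cov_r2 + 2 * ln_d / 9) - math.sqrt(ln_d / 2)
    return (a * a - ln_d / 18) * n / gamma0

def f_upper(cov_r1, n, gamma0, delta1, ratio=0.5):
    """f_u(S): optimistic estimate from Cov_{R_1}(S), scaled by the
    approximation ratio of the selection step (1/2 here)."""
    ln_d = math.log(1 / delta1)
    a = math.sqrt(cov_r1 / ratio + ln_d / 2) + math.sqrt(ln_d / 2)
    return a * a * n / gamma0

# FBIM 2's stopping test checks f_lower(...) / f_upper(...) >= 1/2 - eps.
lb = f_lower(cov_r2=500, n=1000, gamma0=1000, delta2=0.01)
ub = f_upper(cov_r1=500, n=1000, gamma0=1000, delta1=0.01)
print(0 < lb < ub)  # → True
```

As the coverage counts grow with larger sample collections, the two bounds tighten and their ratio approaches the approximation guarantee, which is what eventually triggers the stopping condition.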
For FBIM 2 , finding the seed set S proceeds as in FBIM 1 (Algorithm 3); that is, k elements are selected across the communities C_i of C , each chosen as the element appearing most across the RR sets, while S is guaranteed to remain extendable after each addition, which is known as the mandatory condition of the fairness constraint. The details of this algorithm are fully presented in Algorithm 5.
For FBIM 3 , finding the seed set S is improved by reducing the number of search iterations. Instead of performing k iterations as FBIM 2 does, FBIM 3 requires at most (1/ϵ) · log(k/ϵ) iterations. The details of this process are fully presented in the Threshold Greedy Algorithm for Fairness Max Cover procedure (Algorithm 6). The main idea of this algorithm is to find elements s in C_j (C_j ∈ C ) such that S ∪ {s} is extendable while maximizing the gain in the coverage ratio of s when added to S on R (lines 4 and 5). This idea is based on the principle of a simple near-linear-time algorithm for maximizing monotone submodular functions, which was studied by Badanidiyuru and Vondrák [42].
Algorithm 5: FBIM 2 algorithm
Mathematics 10 04185 i005
Algorithm 6: Threshold Greedy Algorithm for Fairness Max Cover (ThresholdGreedy ( R , ϵ )) procedure
Mathematics 10 04185 i006

5.2.2. Theoretical Analysis

Theorem 4.
Algorithm 5 has O(kT · log(n/(ϵ²k))) complexity.
Proof of Theorem 4.
When Algorithm 5 iterates, it generates two collections of random RR sets, R_1 and R_2, each of size Γ_0. The algorithm then finds a seed set S based on R_1 through Algorithm 3. Next, it calculates the approximation guarantee α of S, with α = f_l(S)/f_u(S). The algorithm has two conditions for stopping and returning the result at line 11. In each iteration, Γ_0 doubles in size and Algorithm 3 is called to find a new S; thus, in the worst case, the algorithm stops when it has not found an S satisfying α ≥ 1/2 − ϵ and the number of iterations has reached i_max = log(Γ_max/Γ_0) in line 5. In this case, we have Γ_max = 2n · ( √((1/2) · ln(6/δ)) + √((1/2) · (ln (n choose k) + ln(6/δ))) )² / (ϵ² · k) (at line 1) and Γ_0 = Γ_max · ϵ² · k/n (at line 2).
Since Γ_0 = Γ_max · ϵ² · k/n, we have Γ_max/Γ_0 = n/(ϵ²k) and hence i_max = log(n/(ϵ²k)). As a result, this algorithm has O(kT · log(n/(ϵ²k))) complexity.    □
Theorem 5.
Algorithm 6 has O((T/ϵ) · log(k/ϵ)) complexity.
Proof of Theorem 5.
Similar to Algorithm 2, in Algorithm 6 the outer “for” loop requires at most (1/ϵ) · log(k/ϵ) iterations, while the inner “for” loop requires a maximum of T iterations. As a result, this algorithm has O((T/ϵ) · log(k/ϵ)) complexity.    □
Theorem 6.
Algorithm 7 has O((T/ϵ) · log(k/ϵ) · log(n/(ϵ²k))) complexity.
Proof of Theorem 6.
Algorithm 7 operates similarly to Algorithm 5, differing only in the step in which it finds the set S (line 7); that is, Algorithm 7 uses Algorithm 6. Therefore, as with Algorithm 5, we can prove that Algorithm 7 has O((T/ϵ) · log(k/ϵ) · log(n/(ϵ²k))) complexity.    □
Algorithm 7: FBIM 3 algorithm
Mathematics 10 04185 i007

6. Experiment

We conducted a number of experiments on the FBIM problem in the course of our work. This section describes the experimental process, including the datasets, the algorithms for comparison, parameter setting, evaluation of the results, and discussion.

6.1. Experiment Settings

6.1.1. Datasets

For a comprehensive experiment, we chose four datasets from SNAP [43]. These datasets range from medium to large in size and have diverse numbers of edges, nodes, and communities: the Epinions social network (Epinions), the Pokec social network (Pokec), the LiveJournal social network and ground-truth communities (Live-journal), and the Orkut social network and ground-truth communities (Orkut). These datasets are commonly used to find seed sets in Influence Maximization. Table 2 presents information on the datasets.

6.1.2. Environment

We conducted our experiments on a Linux machine with Intel Xeon Gold 6154 (720) @ 3.700 GHz CPUs and 3 TB of RAM. Our implementation was written in C/C++ and compiled with g++ 11.

6.1.3. Algorithm Comparison

To the best of our knowledge, no existing algorithm solves the problem of fairness influence maximization for communities under constraints with a pair of lower- and upper-bound budgets. Therefore, we compared our three algorithms with OPIMC , as they are similar in terms of implementation. As such, this experiment involves four algorithms, namely, OPIMC , FBIM 1 , FBIM 2 , and FBIM 3 , with different sets of input parameters used to analyze and evaluate their effectiveness. For brevity, we refer to our three proposed algorithms as the FBIM algorithms. The experimental results are shown in Figure 1, Figure 2, Figure 3 and Figure 4.
In this paper, we do not compare FBIM 1 with DSSA , as we demonstrated the efficiency of FBIM 1 and compared it to DSSA in our previous study [20]. FBIM 1 has little advantage over FBIM 2 and FBIM 3 because, as demonstrated by Tang et al. [15], OPIMC is more efficient than DSSA . Nonetheless, we include FBIM 1 in this paper as part of a synthesis of algorithms for our research on the FBIM problem. At the same time, we wanted to compare the performance of FBIM 1 with FBIM 2 , FBIM 3 , and OPIMC under the same set of parameters and data; with specific appropriate datasets and parameters, FBIM 1 has a conspicuously better influence and coverage ratio than FBIM 2 , FBIM 3 , and even OPIMC (see Figure 3 and Figure 4 for the Pokec and Orkut datasets).
As already indicated, these algorithms obtain S with |S| nearly equal to k, allowing f(S) to be maximized. Additionally, whereas OPIMC does not satisfy the fairness constraint, our methods of discovering S do. As such, we compared metrics such as the influence f(S), memory usage, runtime, approximation ratio f_l(S)/f_u(S), and coverage ratio of S across the target communities. The coverage ratio is the number of communities whose elements selected into S satisfy the lower- and upper-bound constraints, compared with the original total number of communities (K).
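Our reading of the coverage-ratio metric can be made precise with a short sketch; the function name, bound fractions, and data below are illustrative, not taken from the paper's implementation.

```python
def coverage_ratio(S, communities, bounds, k):
    """Fraction of the K target communities whose number of seeds in S
    falls within that community's [k_i^l * k, k_i^u * k] budget window."""
    ok = 0
    for j, C in communities.items():
        picked = len(set(S) & set(C))
        lo, hi = bounds[j]
        if lo * k <= picked <= hi * k:
            ok += 1
    return ok / len(communities)

# Three communities, bounds (0.1, 0.5) of k = 4 => between 0.4 and 2 seeds each.
communities = {0: [1, 2], 1: [3, 4], 2: [5]}
bounds = {j: (0.1, 0.5) for j in communities}
print(round(coverage_ratio([1, 3, 4], communities, bounds, k=4), 3))  # → 0.667
```

In the example, community 2 receives no seed, so only two of the three communities satisfy their budget window.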

6.1.4. Parameter Setting

We conducted experiments with two sets of parameter settings:
  • We set k ∈ [1000, 10000] and | C | = K with K = 20% · k. With the pair (k_i^l; k_i^u) of each C_i set to (0.3; 0.5), the resulting runs are called A l g o r i t h m N a m e . 1 (such as FBIM 1 .1, FBIM 2 .1, FBIM 3 .1); with (k_i^l; k_i^u) set to (0.1; 0.9), the runs are called A l g o r i t h m N a m e . 2 , analogously to the first case. The experiment with these settings is called Experiment 1.
  • Here, we used the same k as in Experiment 1, with K = k. In this case, K no longer limits the problem, because it is already set to its maximum value; in the extreme case where exactly K communities are covered (with K = k), each of the K communities contributes precisely one element to S. For the other parameters, the upper bound k_i^u was assigned a fixed value of 0.5, while the lower bound k_i^l was varied over 0.1, 0.01, and 0.001 for A l g o r i t h m N a m e . 3 , A l g o r i t h m N a m e . 4 , and A l g o r i t h m N a m e . 5 , respectively. We explain these changes in the discussion of the experimental results. This case is referred to as Experiment 2.
As mentioned above, we executed these algorithms under both information diffusion models, that is, LT and IC . Finally, we used the parameters ϵ = 0.1 and δ = 1/n as the default setting.

6.2. Discussion and Evaluation of Experimental Results

In this section, we discuss and evaluate the experimental results in order to clarify the strengths and weaknesses of the different algorithms. The results are clearly shown in Figure 1, Figure 2, Figure 3 and Figure 4.

6.2.1. Objective Factors Affecting the Efficiency of Algorithms

As clarified in the theoretical analysis of the algorithms, our algorithms guarantee the theoretical optimization probability. However, in the experimental process, a number of objective factors affect the performance of the algorithms.
1.
Data preprocessing
(a)
Community detection strategy. We use the Directed Louvain method [44] to detect communities in the datasets. This method incorporates Monte Carlo randomness into its approach. It has been shown to achieve more promising results when extracting communities from a directed graph, although it does have flaws. In short, our algorithms’ efficiency is affected by the random factor during the community detection stage.
(b)
Community selection strategy. To ensure objectivity, the K communities for the input of the FBIM problem are chosen randomly, provided that a valid result satisfying the upper/lower bounds of the fairness constraint remains possible. The selection of the combination of communities is entirely random, and is repeated many times if the previous combinations do not satisfy the conditions of the FBIM problem. In short, our algorithms are affected by the randomness in choosing the initial K communities. However, if these algorithms are applied to real problems, the community selection stage could depend on a constraint; thus, a user can improve the selection of the elements of S with respect to the communities in order to achieve a better result.
2.
RIS framework
Our algorithms depend on the generation of sample graphs by the RIS framework. In the first few iterations of the algorithms, if the intersection between the vertices of the sample graphs and the selected communities is empty or small, the algorithm must generate more sample graphs to increase this value. After this necessary condition is satisfied, the algorithm finds a seed set S that meets the fairness constraint. Therefore, our algorithms may take extra time to generate sample graphs. Briefly, our algorithms depend on the random factor in creating the sample graphs and on the number of sample graphs.
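For reference, one RIS sample under the IC model can be sketched as a reverse breadth-first search that keeps each incoming edge independently with its propagation probability. The uniform edge probability and the toy graph below are illustrative simplifications, not the paper's exact sampler.

```python
import random

def generate_rr_set(n, in_neighbors, p, rng):
    """One reverse-reachable (RR) set: pick a random target node, then walk
    edges backwards, keeping each incoming edge with probability p."""
    target = rng.randrange(n)
    rr, stack = {target}, [target]
    while stack:
        v = stack.pop()
        for u in in_neighbors.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr

# Tiny 3-node chain 0 -> 1 -> 2; with p = 1 the RR set of any target is
# exactly the set of nodes that can reach it.
rr = generate_rr_set(3, {1: [0], 2: [1]}, p=1.0, rng=random.Random(7))
print(sorted(rr))
```

If the communities of interest rarely intersect such samples, many more samples are needed before any fair seed set can cover them, which is the extra cost described above.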
In summary, if these factors are unfavorable, the algorithms take more time to run before choosing a set S that meets the requirements of the problem. Addressing these weaknesses is one of our directions for improvement in future studies.

6.2.2. Experimental Result Evaluation

Experiment 1. In this setting, the goal of the experiment is to evaluate the algorithms’ performance by choosing both a narrow and a wide bound range along with a small number of communities. This setting simulates the general requirement of selecting a seed set from several specific finite communities. Experiment 1 evaluates each algorithm’s feasibility and performance relative to the other algorithms in terms of how disparate they are in runtime and resources.
Hence, Experiment 1 is intended to show how the FBIM algorithms perform in a general setting, including a large range [k_i^l, k_i^u] and a small K. The results in Figure 1 and Figure 2 show that most of the FBIM algorithms run faster than OPIMC ; in certain cases, the FBIM algorithms are 4× to 12× faster than OPIMC , especially as k increases. The essential difference is that OPIMC uses the greedy strategy, which scans all elements of V to find S on the set R , while FBIM 2 and FBIM 3 use an improved greedy approach that scans only the elements in the communities of C (with T = Σ_{C_i ∈ C} |C_i| ≤ n). However, there are cases in which the runtime of the FBIM algorithms exceeds that of OPIMC , as they are affected by the aforementioned objective factors. Intuitively, instances of the FBIM algorithms can run longer than OPIMC when k is small. Furthermore, FBIM 1 sometimes runs longer than OPIMC because DSSA (the base method of FBIM 1 ) generates more sample graphs than OPIMC , as Tang et al. made clear in [15].
Though the FBIM algorithms typically run faster than OPIMC , their influence f(S) is less than that of OPIMC ; the difference varies from 1.5× to 10×. The reason is the small value of K: when K is small, there are fewer communities to cover, which leads to faster calculations while satisfying the fairness constraint. Nevertheless, this is a double-edged sword; because the set of K communities is chosen at random, there is a smaller domain from which to select the seeds, although less time is required to guarantee fairness. Consequently, the algorithm may not always choose the nodes with the largest influence in accordance with the improved greedy strategy.
Experiment 2. The goal of this experiment is to evaluate the theoretical potential of the algorithm with the requirement that it cover as many communities as possible; as such, K is not explicitly set, and the lower bound changes while the upper bound remains the same. The lower bound determines the minimum number of elements in the seed set that must be selected from each community to satisfy the requirement that S be extendable. The upper bound is merely a constraint on the maximum number of elements selected from each community.
Thus, for this setting we want to know the potential value of K for choosing a seed set S of size k such that S covers the majority of the target communities; as a result, we do not specify a value for K. In this case, Figure 3 and Figure 4 show that the lower bound strongly affects the results of the FBIM algorithms. In these figures, the influence of A l g o r i t h m N a m e . 4 and A l g o r i t h m N a m e . 5 increases significantly compared to A l g o r i t h m N a m e . 3 , which has the same lower-bound setting as A l g o r i t h m N a m e . 2 . The coverage ratio of the FBIM algorithms is 2× to 10× greater than that of OPIMC . In specific cases, the coverage ratio of OPIMC is only 1.5% (for Pokec), while the FBIM algorithms achieve 33.9–43.9%. For the Orkut dataset, the largest of the four, FBIM 1 achieves 100%. Furthermore, the coverage ratio and the disparity of these ratios between the FBIM algorithms and OPIMC are proportional to k. In particular, because A l g o r i t h m N a m e . 5 has the smallest lower bound among the experiments, its coverage ratios are better and the gap between its f(S) value and that of OPIMC is narrowed.
Conversely, the set S covers the desired communities, meaning that the influenced individuals are the intended targets. In conclusion, the quality of the seed set of the FBIM algorithms can be controlled through the lower bound k_i^l and k. Nevertheless, a lengthy runtime is a disadvantage. The reason is that in order to find S, OPIMC only needs to take the first k seeds that meet its requirements, whereas the FBIM algorithms need to find k seeds such that (1) the same requirements as for OPIMC are satisfied, and (2) S is extendable. Therefore, the more communities S covers, the more vertex lists must be examined to ensure the fairness constraint. This is clearly shown in the experiments with the algorithms in both the LT and IC propagation models.
Considering memory usage, in both parameter settings and under both the IC and LT models, the FBIM algorithms almost always require more memory, as the datasets contain large communities (especially Epinions, Live-journal, and Orkut). This is inevitable, as the FBIM algorithms need an additional step to store and process information about communities. Briefly, although the FBIM algorithms obtain S with an influence spread f(S) smaller than that of OPIMC , their running time is usually shorter, and most importantly, they satisfy the fairness constraint. For the convenience of readers, we summarize the results of our experiments with the FBIM algorithms and the OPIMC algorithm in Table 3.

7. Conclusions and Future Work

In this paper, we propose three algorithms to resolve the FBIM problem, which is the FIM problem under a budget threshold constraint with upper and lower bounds on the seeds chosen from each community in order to ensure fairness. The main result of this study is that our approximation algorithms achieve a (1/2 − ϵ)-approximation of the optimal solution and require O(kT · log((8 + 2ϵ) · n · (ln(2/δ) + ln (n choose k)) / ϵ²)), O(kT · log(n/(ϵ²k))), and O((T/ϵ) · log(k/ϵ) · log(n/(ϵ²k))) complexity, respectively. We compared our algorithms with the state-of-the-art OPIMC algorithm by conducting experiments under both the LT and IC information diffusion models. The experiments confirm our proven theoretical results. At the same time, we present our algorithms’ advantages and disadvantages through the analysis and evaluation of our experimental results. The results indicate that our algorithms are highly scalable and achieve results that satisfy both the theoretical assurances and the approximation guarantees. In addition, these algorithms are feasible and effective even with big data. In future work, we plan to improve the objective factors that affect the efficiency of these algorithms in order to obtain shorter runtimes and a more significant influence spread on larger and more varied datasets.

Author Contributions

Conceptualization, B.-N.T.N. and V.-V.L.; Formal analysis, B.-N.T.N.; Investigation, P.N.H.P.; Methodology, B.-N.T.N., P.N.H.P. and V.-V.L.; Project administration, B.-N.T.N.; Resources, B.-N.T.N.; Software, B.-N.T.N. and V.-V.L.; Supervision, V.S.; Validation, V.S.; Writing—original draft, B.-N.T.N.; Writing—review and editing, B.-N.T.N., P.N.H.P., V.-V.L. and V.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Ho Chi Minh city University of Food Industry (HUFI), Ton Duc Thang University (TDTU), and VŠB-Technical University of Ostrava (VŠB-TUO).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All real-world social network datasets used in the experiments can be downloaded at http://snap.stanford.edu/data/ (accessed on 15 August 2022).

Acknowledgments

The authors are thankful for the support of Ton Duc Thang University (TDTU), Ho Chi Minh City University of Food Industry (HUFI), and VŠB-Technical University of Ostrava (VŠB-TUO).

Conflicts of Interest

The authors declare that they have no competing interests. The study’s design, data collection, analysis, and interpretation, the preparation of the paper, and the choice to publish the findings were all made independently of the funders.

References

  1. Heidemann, J.; Klier, M.; Probst, F. Online social networks: A survey of a global phenomenon. Comput. Netw. 2012, 56, 3866–3878. [Google Scholar] [CrossRef]
  2. Banerjee, S.; Jenamani, M.; Pratihar, D.K. A survey on influence maximization in a social network. Knowl. Inf. Syst. 2020, 62, 3417–3455. [Google Scholar] [CrossRef] [Green Version]
  3. Kempe, D.; Kleinberg, J.; Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003; pp. 137–146. [Google Scholar]
  4. Richardson, M.; Domingos, P.M. Mining knowledge-sharing sites for viral marketing. In Proceedings of the Eighth SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, 23–26 July 2002; pp. 61–70. [Google Scholar]
  5. Chen, W.; Wang, C.; Wang, Y. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–28 July 2010; pp. 1029–1038. [Google Scholar]
  6. Nguyen, H.T.; Thai, M.T.; Dinh, T.N. Stop-and-stare: Optimal sampling algorithms for viral marketing in billion-scale networks. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD, San Francisco, CA, USA, 26 June–1 July 2016; pp. 695–710. [Google Scholar]
  7. Li, Y.; Fan, J.; Wang, Y.; Tan, K.L. Influence maximization on social graphs: A survey. IEEE Trans. Knowl. Data Eng. 2018, 30, 1852–1872. [Google Scholar] [CrossRef]
  8. Kempe, D.; Kleinberg, J.; Tardos, É. Maximizing the spread of influence through a social network. Theory Comput. 2015, 11, 105–147. [Google Scholar] [CrossRef]
  9. Banerjee, A.; Chandrasekhar, A.G.; Duflo, E.; Jackson, M.O. The diffusion of microfinance. Science 2013, 341, 1236498. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  10. Yadav, A.; Wilder, B.; Rice, E.; Petering, R.; Craddock, J.; Yoshioka-Maxwell, A.; Hemler, M.; Onasch-Vera, L.; Tambe, M.; Woo, D. Bridging the gap between theory and practice in influence maximization: Raising awareness about hiv among homeless youth. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, Stockholm, Sweden, 13–19 July 2018; pp. 5399–5403. [Google Scholar]
  11. Mirzasoleiman, B.; Babaei, M.; Jalili, M. Immunizing complex networks with limited budget. EPL (Europhys. Lett.) 2012, 98, 38004. [Google Scholar] [CrossRef]
  12. Du, N.; Song, L.; Gomez Rodriguez, M.; Zha, H. Scalable influence estimation in continuous-time diffusion networks. In Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information, Lake Tahoe, NV, USA, 5–8 December 2013; Volume 26. [Google Scholar]
  13. Li, J.; Cai, T.; Mian, A.; Li, R.H.; Sellis, T.; Yu, J.X. Holistic influence maximization for targeted advertisements in spatial social networks. In Proceedings of the 34th International Conference on Data Engineering (ICDE), Paris, France, 16–19 April 2018; pp. 1340–1343. [Google Scholar]
  14. Borgs, C.; Brautbar, M.; Chayes, J.; Lucier, B. Maximizing social influence in nearly optimal time. In Proceedings of the Twenty-Fifth Annual Symposium on Discrete Algorithms, SODA, ACM-SIAM, Portland, OR, USA, 5–7 January 2014; pp. 946–957. [Google Scholar]
  15. Tang, J.; Tang, X.; Xiao, X.; Yuan, J. Online processing algorithms for influence maximization. In Proceedings of the International Conference on Management of Data, SIGMOD ’18, Houston, TX, USA, 10–15 June 2018; pp. 991–1005. [Google Scholar]
  16. Pham, C.V.; Ha, D.K.; Vu, Q.C.; Su, A.N.; Hoang, H.X. Influence maximization with priority in online social networks. Algorithms 2020, 13, 183. [Google Scholar] [CrossRef]
  17. Sun, G.; Chen, C.-C. Influence maximization algorithm based on reverse reachable set. Math. Probl. Eng. 2021, 2021, 5535843. [Google Scholar] [CrossRef]
  18. Khajehnejad, M.; Rezaei, A.A.; Babaei, M.; Hoffmann, J.; Jalili, M.; Weller, A. Adversarial graph embeddings for fair influence maximization over social networks. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI, ijcai.org, Yokohama, Japan, 11–17 July 2020; Bessiere, C., Ed.; pp. 4306–4312. [Google Scholar]
  19. Becker, R.; D’Angelo, G.; Ghobadi, S.; Gilbert, H. Fairness in influence maximization through randomization. J. Artif. Intell. Res. 2022, 73, 1251–1283. [Google Scholar] [CrossRef]
  20. Nguyen, B.N.T.; Pham, P.N.; Tran, L.H.; Pham, C.V.; Snášel, V. Fairness budget distribution for influence maximization in online social networks. In Proceedings of the 2021 International Conference on Artificial Intelligence and Big Data in Digital Era” (ICABDE 2021), Ho Chi Minh City, Vietnam, 18–19 December 2021. [Google Scholar]
  21. Udwani, R. Multi-objective maximization of monotone submodular functions with cardinality constraint. In Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems NeurIPS 2018, Montréal, QC, Canada, 3–8 December 2018; Bengio, S., Wallach, H.M., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R., Eds.; pp. 9513–9524. [Google Scholar]
  22. Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.; Glance, N. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; pp. 420–429. [Google Scholar]
  23. Chen, W.; Wang, Y.; Yang, S. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 199–208. [Google Scholar]
  24. Goyal, A.; Lu, W.; Lakshmanan, L.V. Celf++: Optimizing the greedy algorithm for influence maximization in social networks. In Proceedings of the 20th International Conference Companion on World Wide Web, Hyderabad, India, 28 March–1 April 2011; pp. 47–48. [Google Scholar]
  25. Zhou, C.; Zhang, P.; Zang, W.; Guo, L. On the upper bounds of spread for greedy algorithms in social network influence maximization. IEEE Trans. Knowl. Data Eng. 2015, 27, 2770–2783. [Google Scholar] [CrossRef]
  26. Goyal, A.; Lu, W.; Lakshmanan, L.V. Simpath: An efficient algorithm for influence maximization under the linear threshold model. In Proceedings of the IEEE 11th International Conference on Data Mining, Vancouver, BC, Canada, 11–14 December 2011; pp. 211–220. [Google Scholar]
  27. He, X.; Kempe, D. Stability of influence maximization. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 1256–1265. [Google Scholar]
  28. Liu, Q.; Xiang, B.; Chen, E.; Xiong, H.; Tang, F.; Yu, J.X. Influence maximization over large-scale social networks: A bounded linear approach. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, Shanghai, China, 3–7 November 2014; pp. 171–180. [Google Scholar]
29. Page, L.; Brin, S.; Motwani, R.; Winograd, T. The Pagerank Citation Ranking: Bringing Order to the Web; Technical Report 1999-66; Stanford InfoLab: Stanford, CA, USA, 1999. [Google Scholar]
30. Galhotra, S.; Arora, A.; Roy, S. Holistic influence maximization: Combining scalability and efficiency with opinion-aware models. In Proceedings of the 2016 International Conference on Management of Data, San Francisco, CA, USA, 26 June–1 July 2016; pp. 743–758. [Google Scholar]
  31. Tang, Y.; Xiao, X.; Shi, Y. Influence maximization: Near-optimal time complexity meets practical efficiency. In Proceedings of the International Conference on Management of Data, SIGMOD, Snowbird, UT, USA, 22–27 June 2014; Dyreson, C.E., Li, F., Özsu, M.T., Eds.; pp. 75–86. [Google Scholar]
32. Tang, Y.; Shi, Y.; Xiao, X. Influence maximization in near-linear time: A martingale approach. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, VIC, Australia, 31 May–4 June 2015; Sellis, T.K., Davidson, S.B., Ives, Z.G., Eds.; pp. 1539–1554. [Google Scholar]
  33. Huang, K.; Wang, S.; Bevilacqua, G.; Xiao, X.; Lakshmanan, L.V. Revisiting the stop-and-stare algorithms for influence maximization. Proc. VLDB Endow. 2017, 10, 913–924. [Google Scholar] [CrossRef] [Green Version]
34. Nguyen, H.T.; Dinh, T.N.; Thai, M.T. Revisiting of ‘revisiting the stop-and-stare algorithms for influence maximization’. In Computational Data and Social Networks; Chen, X., Sen, A., Li, W.W., Thai, M.T., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 273–285. [Google Scholar]
  35. Tsang, A.; Wilder, B.; Rice, E.; Tambe, M.; Zick, Y. Group-fairness in influence maximization. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI, ijcai.org, Macao, China, 10–16 August 2019; Kraus, S., Ed.; pp. 5997–6005. [Google Scholar]
  36. Stoica, A.-A.; Han, J.X.; Chaintreau, A. Seeding network influence in biased networks and the benefits of diversity. In Proceedings of the Web Conference, Taipei, Taiwan, 20–24 April 2020; pp. 2089–2098. [Google Scholar]
37. El Halabi, M.; Mitrović, S.; Norouzi-Fard, A.; Tardos, J.; Tarnawski, J.M. Fairness in streaming submodular maximization: Algorithms and hardness. In Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems, Online, 6–12 December 2020; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds. [Google Scholar]
38. Rahmattalabi, A.; Jabbari, S.; Lakkaraju, H.; Vayanos, P.; Izenberg, M.; Brown, R.; Rice, E.; Tambe, M. Fair influence maximization: A welfare optimization approach. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, Virtual Event, 2–9 February 2021; pp. 11630–11638. [Google Scholar]
  39. Ali, J.; Babaei, M.; Chakraborty, A.; Mirzasoleiman, B.; Gummadi, K.; Singla, A. On the fairness of time-critical influence maximization in social networks. In Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE), Kuala Lumpur, Malaysia, 9–12 May 2022; pp. 1541–1542. [Google Scholar]
  40. Razaghi, B.; Roayaei, M.; Charkari, N. On the Group-Fairness-Aware Influence Maximization in Social Networks. IEEE Trans. Comput. Soc. Syst. 2022. [Google Scholar] [CrossRef]
  41. Huang, H.; Shen, H.; Meng, Z.; Chang, H.; He, H. Community-based influence maximization for viral marketing. Appl. Intell. 2019, 49, 2137–2150. [Google Scholar] [CrossRef]
  42. Badanidiyuru, A.; Vondrák, J. Fast algorithms for maximizing submodular functions. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, OR, USA, 5–7 January 2014; pp. 1497–1514. [Google Scholar]
  43. Leskovec, J.; Krevl, A. SNAP Datasets: Stanford Large Network Dataset Collection. Available online: http://snap.stanford.edu/data (accessed on 15 August 2022).
44. Dugué, N.; Perez, A. Directed Louvain: Maximizing Modularity in Directed Networks; Research Report; Université d’Orléans: Orléans, France, 2015. [Google Scholar]
Figure 1. Running time, memory usage, and influence for Experiment 1 under the LT model.
Figure 2. Running time, memory usage, and influence for Experiment 1 under the IC model.
Figure 3. Running time, influence, and coverage ratio for Experiment 2 under the LT model.
Figure 4. Running time, influence, and coverage ratio for Experiment 2 under the IC model.
Table 1. Notation used for the submodular maximization problem under the fairness constraint.

Notation | Description
n | the number of nodes in the graph.
V | the set of nodes of graph G, |V| = n.
2^V | the family of subsets of V.
m | the number of edges in the graph.
E | the set of edges of graph G, |E| = m.
w | the set of edge weights of the graph.
v | a random node in the graph.
u | a neighbor of node v in the graph.
k | the total budget, an upper bound on |S|.
K | the number of target communities selected for the FBIM input.
𝒞 | the set of K target communities in network G.
C | a set of disjoint communities of the graph, |C| = N.
N | the size of the set C.
C_i, C_j | the i-th and j-th communities.
V_i | the set of nodes of community C_i.
k_i^l | the lower budget bound of community C_i.
k_i^u | the upper budget bound of community C_i.
S | the size-k seed set returned by the algorithms.
S* | an optimal size-k seed set.
R_i, R_j | random RR sets.
T | the number of nodes in 𝒞, T = Σ_{C_i ∈ 𝒞} |C_i|, with T ≤ n.
R, R′, R_1, R_2 | collections of random RR sets.
Cov_R(S) | the number of RR sets in R incident to some node in S.
f(S) | the influence spread of a seed set S.
f^l(S), f^u(S) | the lower and upper bounds of f(S).
f̂(S) | an estimate of f(S) over a collection R of RR sets.
E | an expected value (expectation operator).
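The RIS quantities in the notation table can be made concrete with a short sketch. The following Python snippet is illustrative only (not the authors' implementation): it computes the coverage Cov_R(S), i.e., the number of RR sets in R that contain at least one node of S, and the usual RIS influence estimate f̂(S) = n · Cov_R(S) / |R|.

```python
def coverage(rr_sets, S):
    """Cov_R(S): number of RR sets in rr_sets hit by at least one node of S."""
    S = set(S)
    return sum(1 for R in rr_sets if S & R)

def estimate_influence(rr_sets, S, n):
    """f_hat(S) = n * Cov_R(S) / |rr_sets|, the standard RIS estimator."""
    return n * coverage(rr_sets, S) / len(rr_sets)

# Toy example: a graph with n = 5 nodes and 4 pre-sampled RR sets.
rr = [{0, 1}, {2}, {1, 3}, {0, 4}]
print(coverage(rr, [0, 1]))              # 3: sets {0,1}, {1,3}, {0,4} are covered
print(estimate_influence(rr, [0, 1], 5))  # 5 * 3 / 4 = 3.75
```

For community-targeted variants such as FBIM, the scaling factor would be T (the number of nodes in the target communities) rather than n; the shape of the estimator is otherwise the same.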
Table 2. Statistics of the datasets.

Dataset | #Nodes | #Edges | #Communities | Avg. Degree | Type
Epinions | 131,828 | 841,372 | 6359 | 13.4 | Directed
Pokec | 1,632,803 | 30,622,564 | 1284 | 37.5 | Directed
Live-journal | 3,997,962 | 34,681,189 | 5000 | 28.5 | Directed
Orkut | 3,072,441 | 117,185,083 | 4745 | 76.3 | Undirected
Table 3. Statistical comparison of the experimental results of the FBIM algorithms vs. OPIMC.

Experiment | Metric | FBIM vs. OPIMC
Experiment 1 | Runtime | 4x to 12x faster
Experiment 1 | Memory | 1x to 2.7x larger
Experiment 1 | Influence | 1.5x to 10x less
Experiment 2 | Runtime | 0.5x to 6x faster
Experiment 2 | Influence | 2x to 6x less
Experiment 2 | Coverage ratio | 1x to 5x greater (for Orkut, even FBIM_1 can reach 100%; for Pokec, the FBIM algorithms achieve 33.9–43.9% while OPIMC achieves 1.5%)
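The fairness budget constraint behind the coverage-ratio comparison, i.e., k_i^l ≤ |S ∩ V_i| ≤ k_i^u for every community C_i, is straightforward to verify for a returned seed set. A minimal Python sketch (hypothetical helper name and data layout, assuming communities are given as node lists):

```python
def satisfies_fairness(S, communities, lower, upper):
    """Check lower[i] <= |S ∩ V_i| <= upper[i] for every community C_i."""
    S = set(S)
    for i, V_i in communities.items():
        picked = len(S & set(V_i))
        if not (lower[i] <= picked <= upper[i]):
            return False
    return True

# Toy example: two disjoint communities with per-community budget bounds.
comms = {0: [0, 1, 2], 1: [3, 4, 5]}
low, up = {0: 1, 1: 1}, {0: 2, 1: 2}
print(satisfies_fairness([0, 3], comms, low, up))     # True: one seed in each community
print(satisfies_fairness([0, 1, 2], comms, low, up))  # False: C_0 over budget, C_1 under
```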
Nguyen, B.-N.T.; Pham, P.N.H.; Le, V.-V.; Snášel, V. Influence Maximization under Fairness Budget Distribution in Online Social Networks. Mathematics 2022, 10, 4185. https://doi.org/10.3390/math10224185