Article

Maximizing the Influence Spread in Social Networks: A Learning-Automata-Driven Discrete Butterfly Optimization Algorithm

1 Wenzhou Engineering Institute of Pump & Valve, Lanzhou University of Technology, Wenzhou 325100, China
2 School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
* Author to whom correspondence should be addressed.
Symmetry 2023, 15(1), 117; https://doi.org/10.3390/sym15010117
Submission received: 27 November 2022 / Revised: 24 December 2022 / Accepted: 27 December 2022 / Published: 31 December 2022

Abstract
Influence maximization aims to identify a small group of individuals that can produce the widest information spread in social networks. Although greedy-based algorithms can yield reliable solutions, their computational cost is extremely expensive, especially in large-scale networks. Additionally, centrality-based heuristics tend to suffer from low accuracy. To solve the influence maximization problem efficiently, a learning-automata-driven discrete butterfly optimization algorithm (LA-DBOA) mapped into the network topology is proposed in this paper. Under the LA-DBOA framework, a novel encoding mechanism and discrete evolution rules adapted to the network topology are presented. By exploiting the asymmetry of social connections, a modified learning automaton is adopted to guide the butterfly population toward promising areas. Based on the topological features of discrete networks, a new local search strategy is conceived to enhance the search performance of the butterflies. Extensive experiments are conducted on six real networks under the independent cascade model; the results demonstrate that the proposed algorithm achieves comparable influence spread to that of CELF and outperforms other classical methods, which proves that meta-heuristics based on swarm intelligence are effective in solving the influence maximization problem.

1. Introduction

Nowadays, facilitating the exchange of information among people has become one of the primary functions of social media platforms such as Twitter, Facebook, and TikTok. An increasing amount of work is being conducted using social networks in different scenarios, for example, dynamical community detection [1], evaluation of online public opinion [2], and community partition [3]. Word of mouth is a popular mechanism widely exploited in viral marketing [4,5]. A typical example is product promotion, where the campaign manager provides the product for free or at a discount to a few influential individuals (early adopters), expecting them to recommend the product to their friends, who in turn recommend it to their own friends by word of mouth, thereby maximizing profit. This process was formulated as the influence maximization (IM) problem by Domingos and Richardson [6] from the network perspective.
IM aims to select a small group of influential nodes as the seed set so as to maximize the influence coverage in social networks. Kempe et al. [7] proved that influence maximization is an NP-hard problem and developed a greedy method based on a hill-climbing strategy to solve it. Although greedy-based algorithms can yield reliable solutions, tens of thousands of Monte Carlo simulations are required to approximate each evaluation accurately, which is highly time-consuming. Thus, many efforts have been made to enhance the efficiency of the greedy algorithm, such as the well-known CELF [8] and CELF++ [9]. However, greedy-based algorithms still require considerable computational resources, which significantly limits their scalability to large-scale networks.
In recent years, many researchers have been drawn to the problem of how to improve the efficiency of influence maximization algorithms, especially in large-scale social networks. Heuristic algorithms [10,11,12] based on network topology have been widely adopted to solve the influence maximization problem. Such approaches usually obtain solutions more efficiently than greedy algorithms, but at the cost of accuracy. Meta-heuristic algorithms [13,14,15], which simulate the foraging behavior of biological populations or the evolution of physical phenomena, have recently been utilized to tackle influence maximization. Such methods replace the time-consuming Monte Carlo mechanism with a targeted evaluation function, so their efficiency is higher than that of greedy algorithms; meanwhile, through discrete evolutionary mechanisms, their solution accuracy is significantly better than that of centrality-based heuristics. However, meta-heuristics tend to excessively pursue one metric, such as efficiency, at the expense of the other indicators, especially in large-scale networks. Furthermore, some methods [16,17] based on learning mechanisms have been proposed to guide the evolution of the algorithms and achieve enhanced performance.
The butterfly optimization algorithm (BOA) [15] is an effective swarm intelligence algorithm that has been applied in various fields. Tiwari et al. [18] combined a dynamic butterfly optimization algorithm with a mutual-information-based feature interaction maximization scheme to select the optimal feature subset in feature selection. Sundaravadivel et al. [19] designed a weighted butterfly optimization algorithm with an intuitionistic fuzzy Gaussian function based adaptive neuro-fuzzy inference system classifier to predict the number of COVID-19 cases. An adaptive sampling test and search space reduction method was developed for the structure identification problem by Zhou et al. [20]. In addition, the experimental results reported in the literature [15] show that the performance of BOA is better than that of particle swarm optimization (PSO) [21], the artificial bee colony (ABC) algorithm [22], the firefly algorithm (FA) [23], and monarch butterfly optimization (MBO) [24] when handling continuous optimization problems. In order to solve the influence maximization problem with a tradeoff between efficiency and effectiveness in real-world networks, a learning-automata-driven discrete butterfly optimization algorithm (LA-DBOA) was developed in this study. To the best of our knowledge, this is the first time that the BOA has been discretized onto a network topology to solve the IM problem. Our study’s main contributions are as follows:
  • A novel encoding method and discrete evolution rules are proposed by modifying the original butterfly optimization algorithm to solve the influence maximization problem in real-world networks;
  • A modified learning automaton is adopted to guide the butterfly population toward promising areas, and a novel local search strategy is employed to enhance the search capability of LA-DBOA;
  • Experimental results on six real networks under the independent cascade model show that LA-DBOA outperforms other algorithms and achieves comparable results to CELF in terms of the influence spread.
The rest of this paper is organized as follows: Section 2 describes the literature related to influence maximization. The definitions of the influence maximization problem, influence spreading models, and influence estimation function are given in Section 3. Details of the proposed algorithm are shown in Section 4. Section 5 provides the experimental results and analysis. Finally, the conclusions and future work are summarized in Section 6.

2. Related Work

Since Domingos and Richardson [6] formulated influence maximization as a mathematical problem, researchers have proposed a number of methods to address the problem of how to select a group of individuals whose influence in social networks is maximized. Generally, these algorithms can be classified into the following three categories: greedy-based, centrality-based heuristic, and meta-heuristic algorithms.

2.1. Greedy-Based Algorithms

Kempe et al. [7] first modeled influence maximization as an optimization problem and proved that it is NP-hard under two classical propagation models: the linear threshold (LT) model and the independent cascade (IC) model. The authors then proposed a greedy algorithm based on a hill-climbing strategy to solve this problem. However, whenever the next seed node is selected, the greedy algorithm needs to evaluate almost all the nodes in the network one by one, resulting in a time-consuming process, especially in large-scale networks. Hence, in order to improve the efficiency of the greedy method, Leskovec et al. [8] proposed a greedy algorithm based on a lazy-forward strategy called cost-effective lazy forward (CELF), which exploits the sub-modularity of the objective function; correspondingly, the computational cost is reduced by two orders of magnitude compared with that of the original greedy algorithm. Inspired by CELF, an improved algorithm named CELF++ was proposed by Goyal et al. [9], which improves the efficiency over CELF by more than 50% in some cases.
After that, by taking advantage of the high efficiency of centrality-based approaches and the satisfying reliability of the greedy strategy, Kundu and Pal [25] proposed a greedy algorithm based on the discarded strategy (DGS). The algorithm uses a centrality approach to rank the nodes in the networks. When the seed nodes are selected in each round based on the greedy strategy, the current node is marked as discarded if the actual marginal revenue of the current target node is lower than the expected revenue of its subsequent k nodes. To reduce the memory consumption when dealing with large-scale social networks, Lu et al. [26] designed a recursive equation to approximate the reachable probability of influence propagation between pairs of nodes in the network, and the node with the largest current recursive valuation is selected as the seed node based on the greedy strategy.
Although greedy algorithms [27,28] have since been improved to reduce the time consumption of the traditional greedy strategy, they are still inefficient when dealing with large-scale networks.

2.2. Centrality-Based Heuristic Algorithms

In contrast with the greedy algorithms that sequentially select the top-k most influential nodes, centrality-based heuristics directly select the seed nodes based on the network topology. Essentially, the location of a node affects the transmission of information in the network; more specifically, the more central a node, the more prominent its importance. A simple method is to select the top-k nodes as the most influential individuals based on a given centrality index, for example, the high-degree method [29] and K-shell decomposition [30]. However, the degree-centrality-based method selects the top-k nodes with the highest degree as the seed nodes, which tend to aggregate in the network and cause an overlap in influence propagation.
To reduce the aggregation of seed nodes selected by the degree centrality method, Chen et al. [31] suggested that the degree value of the direct neighbors should be reduced by a certain percentage when a node is selected as a seed node in the network. Based on this idea, the single discount and degree discount heuristics were proposed. To solve the problem of overlapping neighborhoods, Wang et al. [12] introduced a punishing strategy named degree punishment to perform sequential selection. Because existing heuristics and structural centrality methods may ignore weakly connected nodes in complex networks, Morone and Makse [32] proposed a collective optimization method by mapping influence maximization to the optimal percolation problem. Recently, Yang et al. [33] designed a novel heuristic algorithm to identify the seed nodes in social networks, which makes a trade-off between the traditional local path index and the Katz index. An efficient heuristic algorithm was proposed by Wang et al. [34] to guarantee both the total influence of the seed set and the dispersion between the selected nodes.
According to the above studies, heuristic algorithms improve the efficiency of finding the seed set. However, because these methods rely on the network topology to find the most influential individuals, and users in real-world networks tend to congregate, they suffer from the problem of overlap in the influence spread.

2.3. Meta-Heuristic Algorithms

Meta-heuristic algorithms, which model the foraging behavior of biological populations or the characteristics of physical phenomena, have been applied to influence maximization in the last few years. Jiang et al. [35] were the first to map the simulated annealing (SA) algorithm onto a network topology to optimize a functional influence evaluation model, called the expected diffusion value (EDV), to approximate the best seed nodes. The influence of the current node is approximated by calculating the expected propagation value of the candidate node within its direct neighbors, which skillfully avoids the high computational burden of the Monte Carlo simulation process. Building on the advantages of node distance and single-node propagation in the SA algorithm, Zhang et al. [36] proposed a genetic algorithm (GA) to solve the influence maximization problem. The GA utilizes its multipopulation property to maintain the diversity of solutions.
To enable algorithms to handle practical scenarios, Gong et al. [37] proposed a discrete particle swarm optimization (DPSO) algorithm based on a binary strategy to solve the IM problem. However, its local search mechanism makes the algorithm susceptible to being trapped in local minima. Based on the discrete evolutionary rules of DPSO, a discrete bat algorithm (DBA), which builds a pool of potentially influential candidate nodes depending on the topological contribution of each node, was proposed by Tang et al. [38] to speed up convergence. Nevertheless, its evolutionary mechanism based on a random selection strategy leads to instability of the optimal solution. To solve this problem, Han et al. [39] proposed a clique-based discrete bat algorithm (Clq-DBA), which is based on the clique partition of the network.
Recently, more meta-heuristic algorithms [40,41] have been proposed to considerably reduce the time consumption; however, their accuracy in finding the optimal solution is lower than that of the classical greedy algorithm. Therefore, obtaining a good compromise between efficiency and effectiveness is a problem worthy of further exploration in future work. Other works also provide inspiration for building efficient IM algorithms. Zheng et al. [42] constructed a knowledge-based graph-embedding module to extend the versatility of knowledge-based visual question-answering models. Furthermore, the authors studied a multi-layer semantic network [43] and a multiscale relational network [44], which inspired us to develop more effective strategies for identifying influential nodes.

3. Preliminaries

A social network can be modeled as a graph G = (V, E), where V is the set of all nodes in G with |V| = N, and E is the edge set with |E| = M, representing the social ties among the nodes.

3.1. Influence Maximization

Definition 1.
Given a social network G and a positive integer k (k ≤ N), the k nodes in the network are to be selected as the seed set under a specified propagation model with the purpose of maximizing the number of nodes activated after the propagation process, denoted as σ(S):
S^* = \arg\max_{S \subseteq V,\, |S| = k} \sigma(S), (1)
where σ(S) is the expected number of nodes activated by S in a given graph G, and S^* is the seed set with the maximal influence coverage.

3.2. Influence Estimating Function

Because the Monte Carlo simulation method usually runs at least tens of thousands of times, greedy-based algorithms are inevitably time-consuming. To obtain a more accurate estimate of influence diffusion, as well as to seek an effective and efficient fitness function as an alternative to the computationally expensive Monte Carlo simulation, Gong et al. [37] developed a local influence estimation (LIE) based on the two-hop theory to evaluate the influence within the two-hop area of a given set of nodes, as defined in Equation (2):
\mathrm{LIE}(S) = \sigma_0(S) + \sigma_1^*(S) + \tilde{\sigma}_2(S) = k + \sum_{u \in S,\, v \in N_u^{(0)} \setminus S} P_{u,v} + \sum_{v \in N_S^{(1)} \setminus S,\, w \in N_v^{(0)}} P(v) \cdot P_{v,w}, (2)
where σ_0(S) is the size of the seed set S; σ_1^*(S) and σ̃_2(S) are the expected influence spreads of the one- and two-hop areas of the seed set S, respectively; N_u^{(0)} and N_v^{(0)} represent the inactive out-neighbors of u and v, respectively; N_S^{(1)} represents the active out-neighbors of S; and P_{u,v} and P_{v,w} denote the probability of node u activating node v and of node v activating node w, respectively. Therefore, the problem of selecting the seed set S with the maximum influence is transformed into the problem of optimizing this fitness function.
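For concreteness, a minimal C++ sketch of how this two-hop estimator could be computed is given below. The Graph struct, the uniform edge probability p, and the approximation of P(v) as the chance that at least one seed activates v are illustrative assumptions, not the authors' implementation.

#include <cmath>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Illustrative adjacency-list graph with a uniform activation probability p (assumption).
struct Graph {
    std::vector<std::vector<int>> out;  // out[u] = out-neighbors of u
    double p;                           // uniform edge probability P(u,v)
};

// Sketch of the two-hop local influence estimator LIE(S) of Equation (2). P(v) is
// approximated here as the probability that at least one seed activates v.
double lie(const Graph& g, const std::vector<int>& seeds) {
    std::unordered_set<int> S(seeds.begin(), seeds.end());
    double sigma1 = 0.0;                     // one-hop term: sum of P(u,v)
    std::unordered_map<int, int> seedInDeg;  // one-hop neighbor -> number of seed in-neighbors
    for (int u : seeds)
        for (int v : g.out[u])
            if (!S.count(v)) {
                sigma1 += g.p;
                ++seedInDeg[v];
            }
    double sigma2 = 0.0;                     // two-hop term: sum of P(v) * P(v,w)
    for (const auto& [v, c] : seedInDeg) {
        double pv = 1.0 - std::pow(1.0 - g.p, c);
        for (int w : g.out[v])
            if (!S.count(w)) sigma2 += pv * g.p;
    }
    return static_cast<double>(seeds.size()) + sigma1 + sigma2;
}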

3.3. Influence Diffusion Model

The IC model is a widely used influence diffusion model that simulates the diffusion of information in social networks. In this model, a node has two states, active or inactive. An inactive node can be influenced to become active, and if the node is unaffected, its state remains unchanged. When a node u becomes active at time t − 1, it has only one chance to activate each of its inactive neighbors v, with activation probability p_{u,v}, at time t. Whether or not node v is successfully activated by node u, node u does not get another chance to activate node v at later time steps. If node v is successfully activated, it in turn has one chance to activate its direct inactive neighbors at time t + 1. If no new nodes are activated in a time step, the diffusion process terminates.
In this study, the IC model with propagation probabilities of p = 0.01 and p = 0.05 was used to verify the performance of the proposed algorithm.
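To make the diffusion process concrete, the following C++ sketch runs one stochastic IC cascade and averages many runs to obtain the Monte Carlo estimate of σ(S). The function names and the fixed random seed are illustrative assumptions rather than the authors' code.

#include <queue>
#include <random>
#include <vector>

// One stochastic diffusion under the IC model: every newly activated node gets a
// single chance to activate each inactive out-neighbor with probability p.
// Returns the number of nodes active when the cascade stops.
int icCascade(const std::vector<std::vector<int>>& out, double p,
              const std::vector<int>& seeds, std::mt19937& rng) {
    std::bernoulli_distribution coin(p);
    std::vector<char> active(out.size(), 0);
    std::queue<int> frontier;
    for (int s : seeds) { active[s] = 1; frontier.push(s); }
    int spread = static_cast<int>(seeds.size());
    while (!frontier.empty()) {
        int u = frontier.front(); frontier.pop();
        for (int v : out[u])
            if (!active[v] && coin(rng)) {   // u's single activation attempt on v
                active[v] = 1;
                frontier.push(v);
                ++spread;
            }
    }
    return spread;
}

// Monte Carlo estimate sigma(S): average the spread over many independent cascades.
double estimateSpread(const std::vector<std::vector<int>>& out, double p,
                      const std::vector<int>& seeds, int runs = 10000) {
    std::mt19937 rng(42);
    long long total = 0;
    for (int r = 0; r < runs; ++r) total += icCascade(out, p, seeds, rng);
    return static_cast<double>(total) / runs;
}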

3.4. Learning Automata

A learning automaton (LA) [45] can be described as a quadruple ⟨α, β, γ, P⟩ that performs continuous feedback and adjustment to find the optimal solution through the combination of these four elements, where α, β, γ, and P represent the action set, the reinforcement process, the response set, and the probability vector, respectively. The mechanism for updating the probability vector P can be described as in Equations (3) and (4). When the response is a reward (γ = 0), the probability of the selected action is reinforced and the probabilities of the other actions are reduced:
P_i(t+1) = \begin{cases} P_i(t) + a_r\,[1 - P_i(t)], & \text{if } i \text{ is the selected action} \\ (1 - a_r)\,P_i(t), & \text{otherwise} \end{cases} (3)
When the response is a penalty (γ = 1), the probabilities are updated as
P_j(t+1) = \begin{cases} (1 - b_p)\,P_j(t), & \text{if } j \text{ is the selected action} \\ \frac{b_p}{r-1} + (1 - b_p)\,P_j(t), & \text{otherwise} \end{cases} (4)
where a_r and b_p represent the reward and penalty parameters in the range [0, 1], respectively, and r denotes the number of actions.
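A minimal C++ sketch of this linear reward-penalty update, assuming a probability vector P over r actions (r = 2 in LA-DBOA), might look as follows; the function name is illustrative.

#include <vector>

// Linear reward-penalty update of an r-action learning automaton (Equations (3), (4)).
// `chosen` is the index of the action that was performed; gamma = 0 means reward,
// gamma = 1 means penalty. a_r and b_p are the reward and penalty parameters.
void updateLA(std::vector<double>& P, int chosen, int gamma, double a_r, double b_p) {
    const int r = static_cast<int>(P.size());      // number of actions (r = 2 in LA-DBOA)
    for (int j = 0; j < r; ++j) {
        if (gamma == 0) {                           // reward: reinforce the chosen action
            P[j] = (j == chosen) ? P[j] + a_r * (1.0 - P[j])
                                 : (1.0 - a_r) * P[j];
        } else {                                    // penalty: shift probability mass away
            P[j] = (j == chosen) ? (1.0 - b_p) * P[j]
                                 : b_p / (r - 1) + (1.0 - b_p) * P[j];
        }
    }
}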

3.5. Butterfly Optimization Algorithm

Butterflies have survived for millions of years thanks to their senses. They use smell, sight, taste, touch, and hearing to find food and mating partners. Inspired by this behavior, a meta-heuristic swarm intelligence algorithm called the butterfly optimization algorithm (BOA) was proposed by Arora and Singh [15]. The pseudo-code of the original BOA is shown in Algorithm 1.
Algorithm 1 Butterfly optimization algorithm.
1: Generate initial population of n butterflies
2: Initialize sensor modality c, power exponent a, and switch probability p
3: while stopping criteria not met do
4:     for each butterfly in population do
5:         Calculate fragrance using Equation (5)
6:     Find the best butterfly as g*
7:     for each butterfly in population do
8:         Generate a random number r from [0, 1]
9:         if r < p then
10:            Update position using Equation (6)
11:        else
12:            Update position using Equation (7)
13:    Update the value of c using Equation (8)
14: Output g*
In the BOA, each butterfly has its own fragrance f, which can be represented by the sensory modality c, the stimulus intensity I, and the power exponent a. The representation of fragrance and the variation in stimulus intensity are two important issues, because fragrance is relative; that is, it can be perceived by other butterflies. According to Stevens' power law, c is used to distinguish smell from other modalities. When a butterfly with less I moves toward a butterfly with more I, f increases more rapidly than I. Accordingly, f is allowed to vary with the degree of absorption, which is achieved through the power exponent a. In the BOA, the fragrance is expressed as a function of the physical intensity of the stimulus, as described in Equation (5):
f = c I^a, (5)
where f is the magnitude of the fragrance, i.e., the intensity of the fragrance perceived by other butterflies; c, I, and a denote the sensory modality, stimulus intensity, and power exponent, respectively, where c and a are drawn randomly from [0, 1].
Empirical observation shows that butterflies can judge the location of food or mates very accurately. In addition, butterflies can distinguish different fragrances and sense their intensities. A butterfly produces a fragrance whose intensity is related to its fitness; that is, when a butterfly moves from one location to another, its fitness changes accordingly. When a butterfly senses that another butterfly emits more fragrance in the area, it moves closer, and this stage is treated as a global search. In this stage, the butterflies move towards the optimal butterfly g*, which can be expressed by Equation (6) [15]:
x_i^{t+1} = x_i^t + (r^2 \times g^* - x_i^t) \times f_i, (6)
where x_i^t is the solution vector x_i of the ith butterfly in iteration t; g^* represents the global optimal solution at the current stage; f_i and r denote the fragrance of the ith butterfly and a random number, respectively, where r is drawn from [0, 1].
In another case, when the butterfly cannot perceive a scent larger than its own, it moves randomly. This stage is called local search, which can be represented as in Equation (7) [15].
x_i^{t+1} = x_i^t + (r^2 \times x_j^t - x_k^t) \times f_i, (7)
where x_j^t and x_k^t denote the solution vectors of the jth and kth butterflies, respectively. According to Equations (6) and (7), the butterfly swarm can finally converge to the global optimum.
The value of c is updated as shown in Equation (8).
c^{t+1} = c^t + \frac{0.025}{c^t \times MaxIter}, (8)
where MaxIter is the maximum number of iterations.
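To illustrate Equations (5)-(8), the following C++ sketch performs one iteration of the continuous BOA position update, taking the stimulus intensity I_i of each butterfly as given (in the original BOA it is tied to the butterfly's fitness). All names, and the caller-managed best vector, are illustrative assumptions.

#include <cmath>
#include <random>
#include <vector>

// One iteration of the continuous BOA update (Equations (5)-(8)). `intensity[i]` plays
// the role of the stimulus intensity I_i, `best` is the current global best position g*,
// and the caller refreshes `best` after each step.
void boaStep(std::vector<std::vector<double>>& x, const std::vector<double>& intensity,
             const std::vector<double>& best, double& c, double a, double p,
             int maxIter, std::mt19937& rng) {
    std::uniform_real_distribution<double> U(0.0, 1.0);
    std::uniform_int_distribution<std::size_t> pick(0, x.size() - 1);
    for (std::size_t i = 0; i < x.size(); ++i) {
        double f = c * std::pow(intensity[i], a);        // fragrance, Equation (5)
        double r = U(rng);
        if (r < p) {                                     // global search, Equation (6)
            for (std::size_t d = 0; d < x[i].size(); ++d)
                x[i][d] += (r * r * best[d] - x[i][d]) * f;
        } else {                                         // local search, Equation (7)
            std::size_t j = pick(rng), k = pick(rng);
            for (std::size_t d = 0; d < x[i].size(); ++d)
                x[i][d] += (r * r * x[j][d] - x[k][d]) * f;
        }
    }
    c += 0.025 / (c * maxIter);                          // sensory modality update, Equation (8)
}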

4. Proposed Method

According to the previous analysis, the goal of influence maximization is translated into the selection of the top-k influential nodes by optimizing the fitness function. In this study, a learning-automata-driven discrete butterfly optimization algorithm was developed to optimize the fitness function by redefining the encoding mechanism and evolution rules of the original butterfly optimization algorithm. As the solution space of the influence maximization problem is discrete, the original butterfly optimization algorithm designed for continuous problems cannot be applied directly. A degree-based initialization method and a local search strategy are used to enhance the search capability of the butterflies, and a learning mechanism is employed to guide the butterflies to promising regions. In this section, we present the discrete encoding mechanism, the discrete evolutionary rules, the overall framework, and the details of the proposed LA-DBOA for influence maximization.

4.1. Mapping BOA into Network Topology

4.1.1. Encoding the Butterfly Population

The original BOA was developed to solve continuous optimization problems. However, the targeted networks are discrete; to address the influence maximization problem in a discrete network space, the encoding mechanism of butterfly individuals was redesigned in this study according to the characteristics of the network topology. Each individual in the butterfly population is represented by k potential candidate seed nodes, which means that each butterfly is a candidate solution to the IM problem. The position of butterfly i can be encoded as a k-dimensional integer vector x_i = (x_{i1}, x_{i2}, ..., x_{ik}) (where i = 1, 2, ..., n), and each element x_{ij} of x_i (where j = 1, 2, ..., k) is a node in the network G. For example, given a seed set of size k = 5, a butterfly individual x_i = (1, 7, 17, 3, 20) means that the 5 nodes 1, 7, 17, 3, and 20 are selected as the most influential candidate nodes from the network. The position vector of each butterfly individual is updated according to the following redefined evolutionary rules until the termination condition is reached, and the best butterfly individual is taken as the target seed set.

4.1.2. Discrete Evolutionary Rules

First, the fragrance f_i is redefined as in Equation (9):
f_i = c I_i, (9)
where c is a scaling factor in the interval [0.6, 1.2], and I_i is redefined in Equation (10) as the fraction of the total degree of the candidate node set of butterfly i in the total degree over all n butterflies:
I_i = \frac{D_i}{\sum_{j=1}^{n} D_j}, (10)
where D_i and D_j represent the degree sums of the node sets of butterflies i and j, respectively, as defined in Equation (11), and n is the population size:
D_i = \sum_{l=1}^{k} D_l, (11)
where D_i represents the sum of the out-degrees of node set i, D_l represents the out-degree of node l, and k represents the size of the seed set.
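A small C++ sketch of this degree-based fragrance (Equations (9)-(11)), under the assumption that the graph is stored as an adjacency list of out-neighbors, could be written as follows; the function name is illustrative.

#include <vector>

// Fragrance of Equations (9)-(11): the stimulus intensity I_i of butterfly i is the share
// of its seed set's total out-degree in the whole population; c scales it into a fragrance.
double fragrance(const std::vector<std::vector<int>>& population,
                 const std::vector<std::vector<int>>& out, std::size_t i, double c) {
    auto degreeSum = [&](const std::vector<int>& seeds) {
        long long s = 0;
        for (int v : seeds) s += static_cast<long long>(out[v].size());
        return s;
    };
    long long total = 0;
    for (const auto& xj : population) total += degreeSum(xj);   // denominator of Equation (10)
    return c * static_cast<double>(degreeSum(population[i])) / static_cast<double>(total);
}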
Second, the position vectors corresponding to global exploration and local exploration are redefined as in Equations (12) and (13), respectively:
x_i^{t+1} = x_i^t \oplus H\big((a + R^t) \times r^2 \times f_i \times (g^* \cap x_i^t)\big), (12)
x_i^{t+1} = x_i^t \oplus H\big((a + R^t) \times r^2 \times f_i \times (x_j^t \cap x_k^t)\big), (13)
where x_i^t, x_j^t, and x_k^t represent the positions of butterflies x_i, x_j, and x_k at the tth iteration, respectively; g^* represents the global optimal solution at the current stage; a and r represent the learning factor and a random number drawn uniformly from [0, 1], respectively; and R^t denotes the reciprocal of the current iteration number, i.e., 1/g.
The symbol ∩ is a logical operator similar to the intersection operation, which is used to check whether there is a common element x_{ij} between x_i and g^*. Specifically, if an element of x_i also appears in g^*, then the element at the corresponding index of the result is set to 1; otherwise, it is set to 0. For example, assume that k = 5, x_i^t = (1, 7, 17, 3, 20), and g^* = (18, 3, 26, 7, 9); the result of the ∩ operation on the two vectors is (0, 1, 0, 1, 0).
The threshold function H(·), defined in Equation (14), is used to format the position vector. H(x_i) is denoted as H(x_i) = (h_1(x_{i1}), h_2(x_{i2}), ..., h_k(x_{ik})), where h_j(x_{ij}) (j = 1, 2, ..., k) is a threshold factor:
h_j(x_{ij}) = \begin{cases} 1, & \text{if } x_{ij} \geq 0.5 \\ 0, & \text{if } x_{ij} < 0.5 \end{cases} (14)
The logical operator ⊕ is used to determine whether each element of x_i^t is retained or updated when the position is updated. The updating formula for x_i^{t+1} is shown in Equation (15):
x_{ij}^{t+1} = \begin{cases} x_{ij}^t, & \text{if } h_j(x_{ij}) = 1 \\ \mathrm{Replace}(x_{ij}^t, V), & \text{if } h_j(x_{ij}) = 0 \end{cases} (15)
If h_j(x_{ij}) = 1, the element x_{ij} at the corresponding position is retained; otherwise, the function Replace(·) is executed on x_{ij} to find an alternative candidate node. Replace(·) in Equation (15) replaces the element x_{ij}^t with a random node from the node set V, while ensuring that there are no duplicate nodes in x_i after the replacement is finished.
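The following C++ sketch illustrates how the discrete update of Equations (12)-(15) could be implemented: the ∩ mask is computed against a guiding vector (g* for global search, another pair of butterflies for local search), the scaled mask is thresholded by H(·), and elements mapped to 0 are replaced by random non-duplicate nodes. Function and variable names are illustrative, not the authors' code.

#include <random>
#include <unordered_set>
#include <vector>

// 'cap' mask of Equations (12) and (13): mask[j] = 1 if a[j] also occurs in b, else 0.
std::vector<int> capMask(const std::vector<int>& a, const std::vector<int>& b) {
    std::unordered_set<int> inB(b.begin(), b.end());
    std::vector<int> mask(a.size());
    for (std::size_t j = 0; j < a.size(); ++j) mask[j] = inB.count(a[j]) ? 1 : 0;
    return mask;
}

// Equations (14) and (15): keep x_ij where the thresholded value is 1, otherwise replace
// it with a random node from V that does not duplicate any current element.
// `scale` stands for (a + R^t) * r^2 * f_i. Assumes numNodes is much larger than k.
void applyUpdate(std::vector<int>& xi, const std::vector<int>& mask, double scale,
                 int numNodes, std::mt19937& rng) {
    std::unordered_set<int> used(xi.begin(), xi.end());
    std::uniform_int_distribution<int> pick(0, numNodes - 1);
    for (std::size_t j = 0; j < xi.size(); ++j) {
        int h = (scale * mask[j] >= 0.5) ? 1 : 0;   // threshold function H(.)
        if (h == 0) {                               // Replace(x_ij, V)
            int candidate;
            do { candidate = pick(rng); } while (used.count(candidate));
            used.erase(xi[j]);
            used.insert(candidate);
            xi[j] = candidate;
        }
    }
}

// Usage (illustrative): global search masks against g*, local search against two peers.
//   applyUpdate(xi, capMask(xi, gBest), (a + Rt) * r * r * fi, numNodes, rng);
//   applyUpdate(xi, capMask(xj, xk),    (a + Rt) * r * r * fi, numNodes, rng);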

4.1.3. Integrating Learning Automata into BOA

As described before, the learning automaton consists of four elements. In each step, the learning automaton chooses an action from the action set α according to the probability vector P; the outcome of the action is then evaluated by the reinforcement process β, and the feedback is recorded in the response set γ.
In the proposed LA-DBOA, we incorporate the two search strategies of the original BOA, namely local search and global search, into the action set α of the LA. The performance of the butterflies is then evaluated after an action from α has been executed; this constitutes the reinforcement process β of the LA. Subsequently, the performance of each evolved butterfly is compared with that of the butterfly before evolution, and the responses are stored in the response set γ of the LA. Finally, since the two search strategies in the action set α are executed with certain probabilities, the probabilities of the two search strategies are stored in the probability vector P and adjusted through the response set γ. To exploit the asymmetry of social network connections, that is, the differences in the relationships between the butterflies in the population, the LA can identify superior solutions and lead the butterfly population toward promising areas with these four elements working together.
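In code, the action-selection and feedback steps of the LA could be sketched as below; the chosen action then drives either the global or the local discrete update, and the resulting response is fed into the probability update of Equations (3) and (4) (see the earlier updateLA sketch). Names are illustrative.

#include <random>
#include <vector>

// Roulette-wheel selection of one LA action (0 = global search, 1 = local search)
// according to the probability vector P.
int selectAction(const std::vector<double>& P, std::mt19937& rng) {
    std::discrete_distribution<int> dist(P.begin(), P.end());
    return dist(rng);
}

// Response fed back to the automaton: 0 (reward) if the evolved butterfly improved its
// fitness, 1 (penalty) otherwise.
int laResponse(double newFitness, double oldFitness) {
    return newFitness > oldFitness ? 0 : 1;
}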

4.2. Framework of LA-DBOA

According to the description in Section 3.2, the influence maximization problem is transformed into the optimization of a fitness function. Therefore, based on this fitness function, the LA-DBOA algorithm was developed in this study to find the k most influential nodes in the network. A degree-based method is applied to initialize the population, and a local search strategy is used to enhance the search ability of the proposed algorithm. The framework of LA-DBOA is shown in Algorithm 2.
The flowchart of the proposed algorithm is shown in Figure 1. In the proposed framework, each butterfly in the population is initialized by the degree-based initialization function (line 1), and the position vector g* with the highest fitness value is selected by calculating the fitness value of each butterfly (line 2). Next, the probability vector P is initialized (line 3). Then, the whole while loop is entered (lines 5–22). The position vector x_i of butterfly i is stored before the position vector is updated (line 6). The value of f_i is calculated for the butterflies at the current iteration (line 7), one action is chosen according to the probability vector P, and then the position vector is updated (lines 8–13). The fitness value of the position vector x_i of butterfly i is updated (line 14). If the updated fitness value is lower than the previous one, the previously stored position vector of butterfly i is restored, and the local search operation is performed (lines 15–17). Afterwards, the reinforcement (response) signal γ is computed using Equation (16) (line 18), and the probability vector P of the learning automaton is updated (line 19). The global best position vector g* is updated with the best position vector of the current iteration (lines 20–21). When the termination condition is satisfied, the algorithm ends. Finally, the best position vector g* is output as the seed set.
In this algorithm, two actions are used to find the optimal solution more efficiently, namely the global and local search operations, of which the LA performs only one at a time. The global search speeds up the convergence of the proposed algorithm by exploiting the best solution g* found at the current stage, and the local search increases the diversity of the population by randomly selecting two butterflies during the current iteration. Furthermore, the historical best position of butterfly i is preserved in x_i^pre to lead the evolution toward the optimum. The butterflies are then guided to fly to promising regions through the learning automaton mechanism. Based on the action α, the reinforcement signal γ is generated. The response/feedback value of the probabilistic environment is calculated by Equation (16):
\gamma = \begin{cases} 0, & \text{if } f(x_i) > f(x_i^{pre}) \\ 1, & \text{otherwise} \end{cases} (16)
Algorithm 2 Framework of LA-DBOA
Require: Graph G = (V, E), butterfly population size n, seed set size k, number of iterations g_max.
1: Initialize position vector x ← Degree-based initialization(G, n, k)
2: Select g* according to the fitness value of each position vector x_i of butterfly i
3: P ← (0.5, 0.5)
4: Initialize iterator g = 0
5: while g < g_max do
6:     x_i^pre ← x_i
7:     Calculate f_i of butterfly i
8:     for each x_i ∈ x do
9:         Select an action α using vector P
10:        if α = global search then
11:            Update vector x_i according to Equation (12)
12:        else
13:            Update vector x_i according to Equation (13)
14:        Update fitness value of x_i
15:        if f(x_i) < f(x_i^pre) then
16:            x_i ← x_i^pre
17:            x_i ← LocalSearch(x_i)
18:        γ ← Generate LA response according to Equation (16)
19:        Update probability vector P according to Equations (3) and (4)
20:     g′ ← Choose the best position vector according to fitness value
21:     g* ← Max(g*, g′)
22:     g ← g + 1
Ensure: The best position vector g*.

4.2.1. Initialization

A degree-based heuristic method is utilized to initialize the butterfly position vector to speed up the convergence of the proposed algorithm, and the detailed procedure is given in Algorithm 3.
Initially, the k nodes with the highest degrees in the graph G are selected (line 2). Then, a turbulence operation is used to randomly diversify the position: for each element of the position vector, a random number is generated from the interval (0, 1); if the generated number is greater than 0.5, x_{ij} is replaced using the function Replace(·); otherwise, x_{ij} remains unchanged, and there are no duplicate elements after the replacement (lines 3–5). Finally, the initialized position vector x is output.
Algorithm 3: Degree-based initialization(G, n, k)
Require: Graph G = (V, E), butterfly population size n, seed set size k.
1: for each x_i ∈ x do
2:     x_i ← Degree(G, k)
3:     for each x_{ij} ∈ x_i do
4:         if rand(0, 1) > 0.5 then
5:             x_{ij} ← Replace(x_{ij}, V)
Ensure: The initial position vector x.
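A C++ sketch of this degree-based initialization (Algorithm 3), assuming an adjacency-list graph and a uniform random Replace(·) over V, is given below; it is an illustration rather than the authors' implementation.

#include <algorithm>
#include <numeric>
#include <random>
#include <unordered_set>
#include <vector>

// Degree-based initialization (Algorithm 3): each butterfly starts from the k
// highest-degree nodes, and every element is replaced with probability 0.5 by a
// random non-duplicate node to diversify the population.
std::vector<std::vector<int>> degreeInit(const std::vector<std::vector<int>>& out,
                                         int n, int k, std::mt19937& rng) {
    const int N = static_cast<int>(out.size());
    std::vector<int> byDegree(N);
    std::iota(byDegree.begin(), byDegree.end(), 0);
    std::sort(byDegree.begin(), byDegree.end(),
              [&](int a, int b) { return out[a].size() > out[b].size(); });
    std::vector<int> topK(byDegree.begin(), byDegree.begin() + k);

    std::uniform_real_distribution<double> U(0.0, 1.0);
    std::uniform_int_distribution<int> pick(0, N - 1);
    std::vector<std::vector<int>> population(n, topK);
    for (auto& xi : population) {
        std::unordered_set<int> used(xi.begin(), xi.end());
        for (int& node : xi)
            if (U(rng) > 0.5) {                       // turbulence operation
                int candidate;
                do { candidate = pick(rng); } while (used.count(candidate));
                used.erase(node);
                used.insert(candidate);
                node = candidate;
            }
    }
    return population;
}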

4.2.2. Local Search Strategy

To prevent the butterflies from blindly exploring the search space, a local search strategy specifically designed for the network topology is conceived. The position vector X is evaluated according to the symmetry principle that the global connectivity between nodes in the network decreases as the number of target nodes removed from the social network increases. At each iteration of the algorithm, dominant individuals are retained and inferior individuals are eliminated in the final screening operation on the position vector X. The detailed procedure is described in Algorithm 4.
Algorithm 4: Local search(x_i)
Require: Position vector x_i.
1: X ← x_i
2: T ← {x_i | x_i ∉ g*} or {x_j | x_j ∉ x_k}
3: for each X_j ∈ T do
4:     X_j ← Replace(X_j, N(X_j))
5:     if f(X) > f(x_i) then
6:         x_i ← X
7:     else
8:         X ← x_i
9: X ← x_i
Ensure: Position vector X.
Initially, the position vector x_i is stored in X (line 1). The elements in which x_i differs from g*, or x_j from x_k, are deposited in the set T (line 2), followed by the for loop (lines 3–8). X_j is replaced by one of its neighbors (line 4), and the local position vector is updated (lines 5–8). The vector x_i is assigned to X (line 9), and finally the position vector X is returned.
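The local search of Algorithm 4 could be sketched in C++ as follows, with the fitness callback standing in for the LIE estimator and the guiding vector standing in for g* (or another butterfly); names are illustrative.

#include <algorithm>
#include <functional>
#include <random>
#include <unordered_set>
#include <vector>

// Sketch of the local search of Algorithm 4: elements of x_i that do not appear in the
// guiding vector are tentatively swapped with a random neighbor, and a swap is kept
// only if it improves the fitness (typically the LIE estimator).
void localSearch(std::vector<int>& xi, const std::vector<int>& guide,
                 const std::vector<std::vector<int>>& out,
                 const std::function<double(const std::vector<int>&)>& fitness,
                 std::mt19937& rng) {
    std::unordered_set<int> inGuide(guide.begin(), guide.end());
    double bestFit = fitness(xi);
    for (std::size_t j = 0; j < xi.size(); ++j) {
        if (inGuide.count(xi[j])) continue;          // only elements in the set T
        const auto& nbrs = out[xi[j]];
        if (nbrs.empty()) continue;
        std::vector<int> trial = xi;
        trial[j] = nbrs[rng() % nbrs.size()];        // Replace(X_j, N(X_j))
        if (std::count(trial.begin(), trial.end(), trial[j]) > 1) continue; // keep nodes unique
        double f = fitness(trial);
        if (f > bestFit) {                           // keep the dominant candidate only
            xi = trial;
            bestFit = f;
        }
    }
}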

4.3. Computational Complexity of LA-DBOA

We analyzed the time complexity of the proposed algorithm, where k, n, D̄, and g_max denote the seed set size, the population size, the average node degree, and the number of iterations, respectively. The first part is the preparation work: the complexity of the degree-based initialization is O(k · n) (line 1), and it takes O(k · D̄ + n) time to find the optimal solution g* based on the fitness values (line 2). In the while loop, the time complexity of computing f_i for each butterfly is O(k · n) (line 7), the position vector is updated in O(k · log k) time (lines 8–13), the fitness value is updated in O(k · D̄) time (line 14), the local search takes O(k · D̄) time (line 17), and the probability vector is updated in O(m) time, where m denotes a constant. The total time complexity of the while loop is O(g_max(k · n + n(k · log k + 2k · D̄ + m))), and the time complexity of the final selection of the optimal g* is O(k · D̄ + n). Therefore, the worst-case time complexity is O(g_max · k · n(log k + D̄)).
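Putting the pieces together, the following C++ outline composes the helper sketches given above (degreeInit, fragrance, capMask, applyUpdate, selectAction, laResponse, updateLA, localSearch, and the lie fitness with its Graph struct) into the main loop of Algorithm 2. The parameter values follow Section 5.2; the outline is illustrative and omits engineering details of the authors' implementation.

#include <functional>
#include <random>
#include <vector>

// Illustrative outline of the LA-DBOA main loop (Algorithm 2), built from the helper
// sketches defined earlier in this section.
std::vector<int> laDboa(const std::vector<std::vector<int>>& out, double p, int k) {
    const int n = 20, gMax = 150;                         // population size, iterations
    const double a = 2.0, ar = 0.6, bp = 0.6, c = 1.0;    // learning factor, LA parameters, scaling
    std::mt19937 rng(7);
    std::uniform_real_distribution<double> U(0.0, 1.0);
    Graph graph{out, p};
    auto fit = [&](const std::vector<int>& s) { return lie(graph, s); };

    auto pop = degreeInit(out, n, k, rng);                // Algorithm 3
    std::vector<double> P = {0.5, 0.5};                   // action probabilities (global, local)
    std::vector<int> gBest = pop[0];
    for (const auto& xi : pop) if (fit(xi) > fit(gBest)) gBest = xi;

    for (int t = 1; t <= gMax; ++t) {
        double Rt = 1.0 / t;                              // reciprocal of the iteration counter
        for (int i = 0; i < n; ++i) {
            std::vector<int> prev = pop[i];               // x_i^pre
            double fi = fragrance(pop, out, i, c);        // Equations (9)-(11)
            double r = U(rng);
            int action = selectAction(P, rng);            // 0 = global search, 1 = local search
            double scale = (a + Rt) * r * r * fi;
            if (action == 0)
                applyUpdate(pop[i], capMask(pop[i], gBest), scale,
                            static_cast<int>(out.size()), rng);
            else
                applyUpdate(pop[i], capMask(pop[rng() % n], pop[rng() % n]), scale,
                            static_cast<int>(out.size()), rng);
            double fOld = fit(prev);
            if (fit(pop[i]) < fOld) {                     // revert and refine locally
                pop[i] = prev;
                localSearch(pop[i], gBest, out, fit, rng);
            }
            int gamma = laResponse(fit(pop[i]), fOld);    // Equation (16)
            updateLA(P, action, gamma, ar, bp);           // Equations (3) and (4)
        }
        for (const auto& xi : pop) if (fit(xi) > fit(gBest)) gBest = xi;
    }
    return gBest;                                         // the seed set
}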

5. Experiments and Discussion

5.1. Preparation

To verify the efficiency and effectiveness of the proposed LA-DBOA, six real-world networks were used to test it on the influence maximization problem. Table 1 describes the statistical information of the real-world datasets, and the node degree distributions of the six networks are shown in Figure 2.
The experiments were divided into three independent phases. In the first phase, the parameter settings of LA-DBOA were tested to ensure its optimal performance. In the second phase, LA-DBOA and DPSO were compared on the six networks with respect to fitness function optimization and running time. In the third phase, the proposed LA-DBOA was compared with six other state-of-the-art algorithms in terms of influence spread, and statistical tests were performed to assess the performance of the proposed LA-DBOA and the other six algorithms on influence propagation.
All the procedures were coded in C++ and executed on an Intel® Core™ i5-9300H CPU @ 2.40 GHz with 8 GB of memory on a Windows system. The compared algorithms are as follows:
  • Discrete particle swarm optimization (DPSO) [37] is a meta-heuristic algorithm that employs a discrete mechanism to solve real-world problems and a local search strategy to accelerate the convergence of the proposed algorithm.
  • Learning-automata-based discrete particle swarm optimization (LAPSO-IM) [17] incorporates a learning mechanism based on discrete particle swarm optimization and modifies the original evolutionary rules.
  • Degree-descending search evolution (DDSE) [46] employs a degree-descending search strategy based on the differential evolutionary (DE) algorithm to seed the most influential nodes.
  • Degree centrality (DC) [29] is a local centrality method based on network topology by selecting the top-k nodes with the highest degree centrality as seed nodes.
  • Cost-effective lazy forward (CELF) [8] is a classical greedy-based algorithm that employs a lazy-forward strategy and exploits the submodularity property to improve the efficiency of the traditional greedy algorithm.
  • Greedy randomized adaptive search procedure (GRASP) [47] designs a quick solution procedure for the influence maximization problem based on the greedy randomized adaptive search procedure framework.
For the above algorithms, the parameters were set to the original values reported in the corresponding literature.

5.2. Parameter Settings

First, experiments were conducted to determine the optimal parameter values of the proposed algorithm. The seed set size was set to 50 when tuning the parameters of LA-DBOA under the IC diffusion model with propagation probability p = 0.01, and the number of Monte Carlo simulations was set to 10,000 when the influence diffusion was simulated after LA-DBOA returned the optimal seed set.

5.2.1. Size of Population and Number of Iterations

To analyze the performance of LA-DBOA under different population sizes n and numbers of iterations g_max, the learning factor a was set to 2, and the reward and penalty parameters were set to 0.6. Figure 3a,b show the fitness values of LA-DBOA under different values of n and g_max, respectively.
The variation in fitness values under different population sizes on the six real-world social networks is shown in Figure 3a. The population size n was set to 5, 10, 20, 50, or 100, and the maximum number of iterations g_max was 150. It can be observed that the fitness value increased as n increased. However, when n increased from 50 to 100, there was no significant increase in the fitness value, and with n = 20 the proposed algorithm already produced satisfactory results. Therefore, considering the balance between the efficiency and effectiveness of the proposed algorithm, we set the parameter n to 20.
As illustrated by the bar charts in Figure 3b, the fitness value increased with the number of iterations on the six real-world networks. The maximum number of iterations was set to 50, 100, 150, 200, or 250. It can also be seen that once the number of iterations reached 150, further increases had no significant effect on the fitness value. Therefore, considering the trade-off between efficiency and effectiveness of the proposed algorithm, the maximum number of iterations g_max was set to 150.

5.2.2. Learning Factor

To set the value of the learning factor, we set the population size n = 20, the number of iterations g_max = 150, and both the reward and penalty parameters to 0.6. As shown in Figure 4a, the fitness value increased with the learning factor a on the six social networks. When a > 2, the performance improvement was not significant. Accordingly, a was set to 2.

5.2.3. Reward and Penalty Parameters

To set the values of the reward parameter a_r and the penalty parameter b_p, we set the population size n = 20, the number of iterations g_max = 150, and the learning factor a = 2. As illustrated by the curves in Figure 4b, the fitness values increased as a_r and b_p increased. However, the variation in fitness values was not significant when a_r > 0.6 and b_p > 0.6. Therefore, both a_r and b_p were set to 0.6.

5.3. Experimental Comparison

To further demonstrate the efficiency and effectiveness of the proposed algorithm, we compared the fitness values of LA-DBOA, DBOA, and DPSO while varying the size of the seed set k, and compared their running times to analyze the performance of the proposed algorithm; DBOA is a variant of LA-DBOA without the learning automaton. The proposed algorithm was then compared with six other state-of-the-art algorithms in terms of influence spread.

5.3.1. Comparison on Fitness Values and Running Time

Figure 5 and Figure 6 show the fitness values of LA-DBOA compared with those of DBOA and DPSO under the IC model with propagation probabilities of p = 0.01 and p = 0.05, respectively. It can be seen that the fitness values of the three algorithms gradually increased with the size of the seed set. However, the growth achieved by DPSO was worse than that of the proposed LA-DBOA and DBOA on all six real-world networks, although Figure 7 shows that the running time of DPSO was the shortest. In contrast, the proposed algorithm performed the best on all networks, as shown in Figure 5 and Figure 6. In addition, the proposed LA-DBOA outperformed DBOA on all networks, while their running times were almost the same, as shown in Figure 7.

5.3.2. Comparison of Influence Spread and Running Time

Figure 8 and Figure 9 show the influence propagation of the proposed algorithm and the other six algorithms for spreading probabilities of p = 0.01 and p = 0.05, respectively. It can be seen that the influence spread of LA-DBOA reached satisfactory levels on the six social networks.
Figure 8 and Figure 9 show that the proposed LA-DBOA and CELF were the two best-performing algorithms. In addition, the performance of GRASP was slightly lower than that of LA-DBOA. LAPSO-IM and DC achieved better results than DPSO and DDSE, which yielded noticeably inferior results. From Figure 8a,b, it can be observed that the influence coverage of the seven algorithms was comparable. The remaining ten subfigures show that DDSE had the worst performance among the considered algorithms, which indicates that DDSE has unstable performance in terms of influence spread. The proposed LA-DBOA achieved results comparable to those of CELF and outperformed GRASP, LAPSO-IM, and DPSO. As shown by the experimental results, the proposed LA-DBOA was close to CELF and outperformed the other five algorithms. It can be seen from Figure 10 that the running time of the proposed LA-DBOA was less than that of GRASP and CELF. Although its running time was longer than that of DDSE, the solution quality of DDSE was much worse. Thus, the proposed LA-DBOA achieves superior results in the tradeoff between efficiency and effectiveness.
The experimental results in Sections 5.3.1 and 5.3.2, as shown in Figure 8, Figure 9 and Figure 10, demonstrate that the proposed evolutionary rules of LA-DBOA are feasible and can effectively identify optimal seed nodes. The learning automaton mechanism effectively guides the butterflies toward the promising region, and the convergence of the algorithm is accelerated by the local search strategy and the degree-based initialization strategy. As illustrated in the plots, the influence spread values achieved by LA-DBOA are slightly lower than those of CELF, mainly due to the selection strategy of CELF.
The reason why DDSE has the worst performance is mainly its weak local search ability, which makes it difficult to maintain the diversity of the population. In particular, the algorithm easily falls into premature convergence as the search space increases, which leads to the low-accuracy solutions of DDSE. Furthermore, DC searches for influential nodes based purely on the topology of the network; however, nodes with high degree do not necessarily have high influence, since the marginal gains can be reduced once their neighbors have been activated by other active nodes. Therefore, the seed set obtained by the DC algorithm does not achieve a good influence spread. GRASP generates fewer benefits mainly because of the randomness of selecting seed nodes in its first stage. Moreover, although the evolution rules of DPSO are feasible, its local search makes DPSO easily fall into local optima.

5.4. Statistical Tests

To independently verify the validity of LA-DBOA, rigorous statistical tests were performed to check whether the differences between the experimental results returned by the proposed algorithm and those of the other six algorithms were statistically significant on the six real-world networks. In the tests, the seed set size k for each network was set to 10, 20, 30, 40, and 50 separately, and these tests are independent of each other; thus, five hypothesis tests were performed for each network, one per value of k. Wilcoxon rank sum tests [48] were conducted using SPSS software, and the results are presented in Table 2. SPSS is the umbrella term for a series of software products and related services from the International Business Machines (IBM) corporation for statistical analysis, data mining, predictive analysis, and decision support tasks. We set the significance level α to 0.05 for each problem. From the statistical results, one can see that LA-DBOA clearly outperformed the DPSO, GRASP, LAPSO-IM, DDSE, and DC algorithms. More importantly, it achieved performance almost comparable to that of the greedy-based algorithm CELF.

6. Conclusions

In this study, a learning-automata-driven discrete butterfly optimization algorithm (LA-DBOA) was designed to solve the influence maximization problem in social networks. First, a novel encoding mechanism and evolutionary rules were proposed based on the original butterfly optimization algorithm. Second, a modified learning automaton was adopted to guide the butterfly population toward promising regions and speed up the convergence of the algorithm. Third, a novel local search strategy was proposed to enhance the search performance of LA-DBOA. Extensive experiments showed that the proposed LA-DBOA has superior performance to the DPSO, LAPSO-IM, GRASP, DDSE, and DC algorithms, while achieving almost the same results as the greedy-based CELF. The proposed LA-DBOA can thus obtain more accurate results while making the trade-off between efficiency and effectiveness.
The influence maximization problem remains a hot topic in social network analysis. It will be challenging to further develop effective and efficient algorithms to solve the problem in large-scale networks. Although the proposed LA-DBOA can accurately identify influential nodes, the local search strategy makes the algorithm still time-consuming on large-scale networks. Therefore, improving the local search strategy or proposing new exploitation strategies with high efficiency following the framework of the DBOA is one main focus of future work.

Author Contributions

Conceptualization, H.Z. and J.T.; methodology, H.Z.; software, H.Z.; validation, J.L.; formal analysis, L.Z.; investigation, S.S.; resources, J.T.; data curation, H.Z.; writing—original draft preparation, H.Z.; writing—review and editing, J.T.; visualization, H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LQ20F020011 and the National Natural Science Foundation of China under Grant No. 62162040.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Jia, T.; Cai, C.; Li, X.; Luo, X.; Zhang, Y.; Yu, X. Dynamical community detection and spatiotemporal analysis in multilayer spatial interaction networks using trajectory data. Int. J. Geogr. Inf. Sci. 2022, 36, 1719–1740.
2. Meng, F.; Xiao, X.; Wang, J. Rating the crisis of online public opinion using a multi-level index system. arXiv 2022, arXiv:2207.14740.
3. Ni, Q.; Guo, J.; Wu, W.; Wang, H.; Wu, J. Continuous influence-based community partition for social networks. IEEE Trans. Netw. Sci. Eng. 2021, 9, 1187–1197.
4. He, Q.; Wang, X.; Lei, Z.; Huang, M.; Cai, Y.; Ma, L. TIFIM: A two-stage iterative framework for influence maximization in social networks. Appl. Math. Comput. 2019, 354, 338–352.
5. Li, X.; Cheng, X.; Su, S.; Sun, C. Community-based seeds selection algorithm for location aware influence maximization. Neurocomputing 2018, 275, 1601–1613.
6. Domingos, P.; Richardson, M. Mining the network value of customers. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 26–29 August 2001; pp. 57–66.
7. Kempe, D.; Kleinberg, J.; Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003; pp. 137–146.
8. Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.; Glance, N. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–17 August 2007; pp. 420–429.
9. Goyal, A.; Lu, W.; Lakshmanan, L.V. Celf++ optimizing the greedy algorithm for influence maximization in social networks. In Proceedings of the 20th International Conference Companion on World Wide Web, Hyderabad, India, 28 March–1 April 2011; pp. 47–48.
10. Liqing, Q.; Chunmei, G.; Shuang, Z.; Xiangbo, T.; Mingjv, Z. TSIM: A two-stage selection algorithm for influence maximization in social networks. IEEE Access 2020, 8, 12084–12095.
11. Qiu, L.; Tian, X.; Sai, S.; Gu, C. LGIM: A global selection algorithm based on local influence for influence maximization in social networks. IEEE Access 2019, 8, 4318–4328.
12. Wang, X.; Su, Y.; Zhao, C.; Yi, D. Effective identification of multiple influential spreaders by DegreePunishment. Phys. A Stat. Mech. Its Appl. 2016, 461, 238–247.
13. Zareie, A.; Sheikhahmadi, A.; Jalili, M. Identification of influential users in social network using gray wolf optimization algorithm. Expert Syst. Appl. 2020, 142, 112971.
14. Cantini, R.; Marozzo, F.; Mazza, S.; Talia, D.; Trunfio, P. A weighted artificial bee colony algorithm for influence maximization. Online Soc. Netw. Media 2021, 26, 100167.
15. Arora, S.; Singh, S. Butterfly optimization algorithm: A novel approach for global optimization. Soft Comput. 2019, 23, 715–734.
16. Yang, Q.; Chen, W.N.; Da Deng, J.; Li, Y.; Gu, T.; Zhang, J. A level-based learning swarm optimizer for large-scale optimization. IEEE Trans. Evol. Comput. 2017, 22, 578–594.
17. Singh, S.S.; Kumar, A.; Singh, K.; Biswas, B. LAPSO-IM: A learning-based influence maximization approach for social networks. Appl. Soft Comput. 2019, 82, 105554.
18. Tiwari, A.; Chaturvedi, A. A hybrid feature selection approach based on information theory and dynamic butterfly optimization algorithm for data classification. Expert Syst. Appl. 2022, 196, 116621.
19. Sundaravadivel, T.; Mahalakshmi, V. Weighted butterfly optimization algorithm with intuitionistic fuzzy Gaussian function based adaptive-neuro fuzzy inference system for COVID-19 prediction. Mater. Today Proc. 2022, 56, 3317–3324.
20. Zhou, H.; Zhang, G.; Wang, X.; Ni, P.; Zhang, J. Structural identification using improved butterfly optimization algorithm with adaptive sampling test and search space reduction method. Structures 2021, 33, 2121–2139.
21. Shi, Y. Particle swarm optimization: Developments, applications and resources. In Proceedings of the 2001 Congress on Evolutionary Computation (IEEE Cat. No. 01TH8546), Seoul, Republic of Korea, 27–30 May 2001; Volume 1, pp. 81–86.
22. Karaboga, D.; Basturk, B. A powerful and efficient algorithm for numerical function optimization: Artificial bee colony (ABC) algorithm. J. Glob. Optim. 2007, 39, 459–471.
23. Yang, X.S. Firefly Algorithm, Levy Flights and Global Optimization; Springer: London, UK, 2010.
24. Feng, Y.; Wang, G.; Deb, S.; Lu, M.; Zhao, X. Monarch butterfly optimization. Neural Comput. Appl. 2019, 31, 1995–2014.
25. Kundu, S.; Pal, S.K. Deprecation based greedy strategy for target set selection in large scale social networks. Inf. Sci. 2015, 316, 107–122.
26. Lu, W.X.; Zhou, C.; Wu, J. Big social network influence maximization via recursively estimating influence spread. Knowl.-Based Syst. 2016, 113, 143–154.
27. Yu, M.; Yang, W.; Wang, W.; Shen, G.; Dong, G.; Gong, L. UGGreedy: Influence maximization for user group in microblogging. Chin. J. Electron. 2016, 25, 241–248.
28. Wu, G.; Gao, X.; Yan, G.; Chen, G. Parallel greedy algorithm to multiple influence maximization in social network. ACM Trans. Knowl. Discov. Data 2021, 15, 1–21.
29. Kundu, S.; Murthy, C.; Pal, S.K. A new centrality measure for influence maximization in social networks. In Proceedings of the International Conference on Pattern Recognition and Machine Intelligence, Moscow, Russia, 27 June–1 July 2011; pp. 242–247.
30. Zhao, Q.; Lu, H.; Gan, Z.; Ma, X. A K-shell decomposition based algorithm for influence maximization. In Proceedings of the International Conference on Web Engineering, Rotterdam, The Netherlands, 23–26 June 2015; pp. 269–283.
31. Chen, W.; Wang, Y.; Yang, S. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 199–208.
32. Morone, F.; Makse, H.A. Influence maximization in complex networks through optimal percolation. Nature 2015, 524, 65–68.
33. Yang, P.L.; Xu, G.Q.; Yu, Q.; Guo, J.W. An adaptive heuristic clustering algorithm for influence maximization in complex networks. Chaos Interdiscip. J. Nonlinear Sci. 2020, 30, 093106.
34. Wang, Y.; Zheng, Y.; Shi, X.; Liu, Y. An effective heuristic clustering algorithm for mining multiple critical nodes in complex networks. Phys. A Stat. Mech. Its Appl. 2022, 588, 126535.
35. Jiang, Q.; Song, G.; Gao, C.; Wang, Y.; Si, W.; Xie, K. Simulated annealing based influence maximization in social networks. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 7–11 August 2011.
36. Zhang, K.; Du, H.; Feldman, M.W. Maximizing influence in a social network: Improved results using a genetic algorithm. Phys. A Stat. Mech. Its Appl. 2017, 478, 20–30.
37. Gong, M.; Yan, J.; Shen, B.; Ma, L.; Cai, Q. Influence maximization in social networks based on discrete particle swarm optimization. Inf. Sci. 2016, 367, 600–614.
38. Tang, J.; Zhang, R.; Yao, Y.; Zhao, Z.; Wang, P.; Li, H.; Yuan, J. Maximizing the spread of influence via the collective intelligence of discrete bat algorithm. Knowl.-Based Syst. 2018, 160, 88–103.
39. Han, L.; Li, K.C.; Castiglione, A.; Tang, J.; Huang, H.; Zhou, Q. A clique-based discrete bat algorithm for influence maximization in identifying top-k influential nodes of social networks. Soft Comput. 2021, 25, 8223–8240.
40. Wang, L.; Ma, L.; Wang, C.; Xie, N.G.; Koh, J.M.; Cheong, K.H. Identifying influential spreaders in social networks through discrete moth-flame optimization. IEEE Trans. Evol. Comput. 2021, 25, 1091–1102.
41. Li, H.; Zhang, R.; Zhao, Z.; Liu, X.; Yuan, Y. Identification of top-k influential nodes based on discrete crow search algorithm optimization for influence maximization. Appl. Intell. 2021, 51, 7749–7765.
42. Zheng, W.; Yin, L.; Chen, X.; Ma, Z.; Liu, S.; Yang, B. Knowledge base graph embedding module design for Visual question answering model. Pattern Recognit. 2021, 120, 108153.
43. Zheng, W.; Liu, X.; Yin, L. Sentence representation method based on multi-layer semantic network. Appl. Sci. 2021, 11, 1316.
44. Zheng, W.; Tian, X.; Yang, B.; Liu, S.; Ding, Y.; Tian, J.; Yin, L. A few shot classification methods based on multiscale relational networks. Appl. Sci. 2022, 12, 4059.
45. Hashemi, A.B.; Meybodi, M.R. A note on the learning automata based algorithms for adaptive parameter selection in PSO. Appl. Soft Comput. 2011, 11, 689–705.
46. Cui, L.; Hu, H.; Yu, S.; Yan, Q.; Ming, Z.; Wen, Z.; Lu, N. DDSE: A novel evolutionary algorithm based on degree-descending search strategy for influence maximization in social networks. J. Netw. Comput. Appl. 2018, 103, 119–130.
47. Lozano-Osorio, I.; Sánchez-Oro, J.; Duarte, A.; Cordón, Ó. A quick GRASP-based method for influence maximization in social networks. J. Ambient. Intell. Humaniz. Comput. 2021, 1–13.
48. García, S.; Molina, D.; Lozano, M.; Herrera, F. A study on the use of non-parametric tests for analyzing the evolutionary algorithms’ behaviour: A case study on the CEC’2005 special session on real parameter optimization. J. Heuristics 2009, 15, 617–644.
Figure 1. The flowchart of LA-DBOA for IM.
Figure 2. Node degree distribution of the six social networks. (a) HepTh. (b) NetHEHT. (c) Europe. (d) Musae. (e) Deezer. (f) Slashdot.
Figure 3. The fitness values under different population sizes n and numbers of iterations g_max on the six social networks. (a) Population size n. (b) Number of iterations g_max.
Figure 4. Variation in fitness value under different parameters. (a) Effect of the learning factor α. (b) Effect of the reward parameter a_r and the penalty parameter b_p.
Figure 5. Comparison of fitness value optimization by LA-DBOA, DBOA and DPSO on the six social networks (IC model, p = 0.01). (a) HepTh. (b) NetHEHT. (c) Europe. (d) Musae. (e) Deezer. (f) Slashdot.
Figure 6. Comparison of fitness value optimization by LA-DBOA, DBOA and DPSO on the six social networks (IC model, p = 0.05). (a) HepTh. (b) NetHEHT. (c) Europe. (d) Musae. (e) Deezer. (f) Slashdot.
Figure 7. Comparison of running time of LA-DBOA, DBOA and DPSO on the six social networks. (a) HepTh. (b) NetHEHT. (c) Europe. (d) Musae. (e) Deezer. (f) Slashdot.
Figure 8. Comparison of influence spread of the seven algorithms on the six social networks (IC model, p = 0.01). (a) HepTh. (b) NetHEHT. (c) Europe. (d) Musae. (e) Deezer. (f) Slashdot.
Figure 9. Comparison of influence spread of the seven algorithms on the six social networks (IC model, p = 0.05). (a) HepTh. (b) NetHEHT. (c) Europe. (d) Musae. (e) Deezer. (f) Slashdot.
Figure 10. Comparison of running time of LA-DBOA, DDSE, GRASP and CELF on the six social networks. (a) HepTh. (b) NetHEHT. (c) Europe. (d) Musae. (e) Deezer. (f) Slashdot.
Table 1. Statistical characteristics of the six social networks.

ID  Networks   |V|      |E|       <k>      d̄       C
1   HepTh      9877     25,998    5.264    5.945    0.471
2   NetHEHT    15,229   31,376    4.121    5.840    0.499
3   Europe     28,281   92,752    6.559    6.450    0.141
4   Musae      37,700   289,003   15.332   3.246    0.168
5   Deezer     47,538   222,887   9.377    5.341    0.116
6   Slashdot   70,068   358,647   10.237   4.159    0.056

|V| and |E| represent the number of nodes and edges, respectively; <k> is the average degree; d̄ is the average shortest path length; C represents the average clustering coefficient of the targeted network.
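As a worked illustration of how the quantities in Table 1 can be obtained, the sketch below computes |V|, |E|, <k>, the average shortest path length, and the average clustering coefficient for one network given as an undirected edge list. It is a minimal example using Python and the networkx library; the file name is a placeholder, the average shortest path length is restricted to the largest connected component so that the call is well defined, and the original preprocessing may have used a different (e.g., sampled) estimator for that quantity.

```python
import networkx as nx

def network_statistics(edge_list_path):
    """Compute |V|, |E|, <k>, average shortest path length, and average
    clustering coefficient, mirroring the columns of Table 1."""
    G = nx.read_edgelist(edge_list_path, nodetype=int)   # undirected graph by default
    n, m = G.number_of_nodes(), G.number_of_edges()
    avg_degree = 2.0 * m / n                              # <k> for an undirected graph
    # The average shortest path length is only defined on a connected graph,
    # so it is evaluated on the largest connected component. Note that this
    # exact computation is expensive on large networks.
    giant = G.subgraph(max(nx.connected_components(G), key=len))
    avg_path = nx.average_shortest_path_length(giant)
    avg_clustering = nx.average_clustering(G)
    return n, m, avg_degree, avg_path, avg_clustering

if __name__ == "__main__":
    stats = network_statistics("HepTh.txt")               # illustrative file name
    print("|V|=%d, |E|=%d, <k>=%.3f, d_avg=%.3f, C=%.3f" % stats)
```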
Table 2. Statistical results of the Wilcoxon test of the proposed LA-DBOA and the other six algorithms at α = 0.05.

LA-DBOA vs.   k    N−   N+   Z        p-Value
DPSO          10   0    6    −2.201   0.028
              20   0    6    −2.201   0.028
              30   0    6    −2.201   0.028
              40   0    6    −2.201   0.028
              50   0    6    −2.201   0.028
LAPSO-IM      10   2    4    −1.153   0.249
              20   0    6    −2.201   0.028
              30   0    6    −2.201   0.028
              40   1    5    −1.992   0.046
              50   0    6    −2.201   0.028
DDSE          10   0    6    −2.201   0.028
              20   0    6    −2.201   0.028
              30   0    6    −2.201   0.028
              40   0    6    −2.201   0.028
              50   0    6    −2.201   0.028
DC            10   2    4    −0.943   0.345
              20   0    6    −2.201   0.028
              30   0    6    −2.201   0.028
              40   0    6    −2.201   0.028
              50   0    6    −2.201   0.028
CELF          10   3    3    −0.524   0.600
              20   4    2    −1.363   0.173
              30   4    2    −1.572   0.116
              40   5    1    −1.992   0.046
              50   4    2    −1.572   0.116
GRASP         10   1    5    −1.992   0.046
              20   2    4    −1.153   0.249
              30   0    6    −2.201   0.028
              40   0    6    −2.201   0.028
              50   0    6    −2.201   0.028
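The values in Table 2 are consistent with the normal approximation of the paired Wilcoxon signed-rank statistic computed over the six networks for each seed set size k: when all six differences favour LA-DBOA, the negative rank sum is W− = 0, so Z = (0 − 10.5)/√22.75 ≈ −2.201 and the two-sided p-value is ≈ 0.028, which are the entries that dominate the table. The following Python sketch shows one way to reproduce such statistics; it is not the authors' evaluation code, and the paired arrays at the bottom are illustrative placeholders rather than the measured influence spread values.

```python
import numpy as np
from scipy.stats import norm, rankdata

def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test with the normal approximation.
    Returns (N-, N+, Z, two-sided p-value), mirroring the columns of Table 2."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = d[d != 0]                              # zero differences are discarded
    n = len(d)
    ranks = rankdata(np.abs(d))                # average ranks of |differences|
    w_plus, w_minus = ranks[d > 0].sum(), ranks[d < 0].sum()
    n_minus, n_plus = int((d < 0).sum()), int((d > 0).sum())
    w = min(w_plus, w_minus)                   # test statistic: smaller rank sum
    mu = n * (n + 1) / 4.0
    sigma = np.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mu) / sigma                       # normal approximation, no continuity correction
    p = 2 * norm.sf(abs(z))                    # two-sided p-value
    return n_minus, n_plus, z, p

# Illustrative placeholder data: influence spread of LA-DBOA vs. one baseline
# on the six networks for a single k (not the paper's measurements).
la_dboa  = [512.4, 838.1, 905.6, 1411.2, 1620.7, 2034.9]
baseline = [498.0, 820.5, 890.3, 1396.8, 1601.2, 2010.4]
print(wilcoxon_signed_rank(la_dboa, baseline))  # -> (0, 6, -2.201..., 0.0277...)
```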