1. Introduction
Nowadays, facilitating the exchange of information among people through social networks has become one of the primary services of communication media such as Twitter, Facebook, and TikTok. An increasing amount of work uses social networks in different scenarios, for example, dynamic community detection [1], evaluation of online public opinion [2], and community partition [3]. Word-of-mouth is a popular method widely used in viral marketing [4,5]. A typical example of viral marketing is product promotion, where the campaign manager provides the product for free or at a discount to some influential individuals (early adopters), expecting them to recommend the product to their friends, who in turn recommend it to their own friends by word of mouth, thereby maximizing profit. Domingos and Richardson [6] first formulated this process as the influence maximization (IM) problem from a network perspective.
IM aims to select a small group of influential nodes as the seed set to maximize the spread of influence coverage in social networks. Kempe et al. [7] proved that influence maximization is an NP-hard problem and developed a greedy method based on a hill-climbing strategy to solve it. Although greedy-based algorithms yield reliable solutions, tens of thousands of Monte Carlo simulations are required to approximate each evaluation accurately, which is highly time-consuming. Thus, many efforts have been made to enhance the efficiency of the greedy algorithm, such as the well-known CELF [8] and CELF++ [9]. However, greedy-based algorithms still require considerable computational resources, which significantly limits their scalability in large-scale networks.
In recent years, many researchers have turned to the problem of how to improve the efficiency of influence maximization algorithms, especially in large-scale social networks. Heuristic algorithms [10,11,12] based on network topology have been widely adopted to solve the influence maximization problem. However, such approaches usually obtain solutions more efficiently than greedy algorithms at the cost of accuracy. Meta-heuristic algorithms [13,14,15], which simulate the foraging behavior of biological populations or the transformation of physical phenomena, were recently utilized to tackle influence maximization. Such methods replace the time-consuming Monte Carlo mechanism with a targeted evaluation function, so they are faster than greedy algorithms; meanwhile, through discrete evolutionary mechanisms, their solution accuracy is significantly higher than that of centrality-based heuristics. However, meta-heuristics tend to pursue one metric, such as efficiency, at the expense of the other indicators, especially in large-scale networks. Furthermore, some methods [16,17] based on learning mechanisms have been proposed to guide the evolution of the algorithms and achieve enhanced performance.
The butterfly optimization algorithm (BOA) [15] is an effective swarm intelligence algorithm that has been applied in various fields. Tiwari et al. [18] combined a dynamic butterfly optimization algorithm with a mutual-information-based feature interaction maximization scheme for selecting the optimal feature subset in feature selection. Sundaravadivel et al. [19] designed a weighted butterfly optimization algorithm with an intuitionistic fuzzy Gaussian-function-based adaptive neuro-fuzzy inference system classifier to predict the number of COVID-19 cases. An adaptive sampling test and search-space reduction method was developed for the structure identification problem by Zhou et al. [20]. In addition, the experimental results reported in the literature [15] show that BOA outperforms particle swarm optimization (PSO) [21], the artificial bee colony (ABC) algorithm [22], the firefly algorithm (FA) [23], and monarch butterfly optimization (MBO) [24] when handling continuous optimization problems. To solve the influence maximization problem with a tradeoff between efficiency and effectiveness in real-world networks, a learning-automaton-driven discrete butterfly optimization algorithm (LA-DBOA) was developed in this study. To the best of our knowledge, this is the first time that BOA has been discretized to operate on a network topology to solve the IM problem. Our study's main contributions are as follows:
A novel encoding method and discrete evolution rules are proposed by modifying the original butterfly optimization algorithm to solve the influence maximization problem in real-world networks;
A modified learning automaton is adopted to guide the butterfly population toward promising areas, and a novel local search strategy is employed to enhance the search capability of LA-DBOA;
Experimental results on six real networks under the independent cascade model show that LA-DBOA outperforms other algorithms and achieves comparable results to CELF in terms of the influence spread.
The rest of this paper is organized as follows: Section 2 describes the literature related to influence maximization. The definitions of the influence maximization problem, influence spreading models, and the influence estimation function are given in Section 3. Details of the proposed algorithm are presented in Section 4. Section 5 provides the experimental results and analysis. Finally, the conclusions and future work are summarized in Section 6.
3. Preliminaries
A social network can be modeled as a graph G = (V, E), where V is the set of all nodes in G with |V| = N, and E is the edge set with |E| = M, representing the social ties among the nodes.
3.1. Influence Maximization
Definition 1. Given a social network G and a positive integer k, influence maximization selects k nodes in the network as the seed set S under a specified propagation model so as to maximize the number of nodes activated after the propagation process, denoted as S* = argmax_{S⊆V, |S|=k} σ(S), where σ(S) is the expected number of nodes activated by S in a given graph G, and S* is the seed set with the maximal influence coverage.
3.2. Influence Estimating Function
Because the Monte Carlo simulation method usually runs at least tens of thousands of times, greedy-based algorithms are inevitably time-consuming. To estimate influence diffusion more accurately, and to obtain an effective and efficient fitness function as an alternative to the computationally expensive Monte Carlo simulation, Gong et al. [37] developed a local influence estimation based on two-hop theory to evaluate the influence within the two-hop area of a given set of nodes, as defined in Equation (2).
where k is the size of seed set S; σ1(S) and σ2(S) are the expected influence spreads of the one- and two-hop areas for the seed set S, respectively; N(u) and N(v) represent the inactive out-neighbors of u and v, respectively; A(S) represents the active out-neighbors of S; and p(u,v) and p(v,w) denote the probabilities of node u activating node v and of node v activating node w, respectively. Therefore, the problem of selecting the seed set S with the maximum influence is transformed into the problem of optimizing this fitness function.
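The two-hop idea can be illustrated with a short sketch. The version below is a simplified estimator assuming a uniform activation probability p for every edge (the paper's estimator weights each edge by its individual activation probability); the function name and adjacency-list graph representation are illustrative, not from the paper:

```python
def two_hop_estimate(graph, seeds, p=0.01):
    """Hedged sketch of a two-hop influence estimate in the spirit of
    Equation (2): |S| plus the expected one-hop and two-hop activations.
    graph: dict mapping node -> list of out-neighbors (illustrative)."""
    S = set(seeds)
    one_hop = 0.0
    two_hop = 0.0
    for u in S:
        for v in graph.get(u, []):
            if v in S:
                continue
            one_hop += p                 # u activates v directly
            for w in graph.get(v, []):
                if w not in S:
                    two_hop += p * p     # u -> v -> w within two hops
    return len(S) + one_hop + two_hop
```

Because the estimate only touches the two-hop neighborhood of the seed set, it is far cheaper than a full Monte Carlo simulation of the cascade.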
3.3. Influence Diffusion Model
The IC model is a widely used influence diffusion model that simulates the diffusion of information in social networks. In this model, a node has two states: active or inactive. An inactive node can be influenced and transformed into an active node; if the node is not affected, its state remains unchanged. When a node u becomes active at time t, it has only one chance to activate each of its inactive neighbors v, with activation probability p(u,v), at time t + 1. Whether or not node v is successfully activated by node u, node u does not get another chance to activate node v at subsequent time steps. If node v is successfully activated, it in turn has one chance to activate its direct inactive neighbors at the next time step. If no new nodes are activated at some time T, the diffusion process terminates.
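The diffusion process described above can be sketched as a Monte Carlo simulation. The sketch below assumes an adjacency-list graph and a uniform activation probability p; names such as `simulate_ic` are illustrative:

```python
import random

def simulate_ic(graph, seeds, p=0.01, rng=None):
    """One Monte Carlo run of the independent cascade (IC) model.
    graph: dict mapping node -> list of out-neighbors (illustrative).
    Returns the number of active nodes when diffusion stops."""
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)            # nodes activated at the current step
    while frontier:                   # stop when no new nodes are activated
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                # each newly active u gets exactly one chance to activate v
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)

def estimate_spread(graph, seeds, p=0.01, runs=1000, seed=0):
    """Average spread over many IC runs (the expensive step the LIE
    fitness function is designed to avoid)."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs
```

Averaging over thousands of such runs is what makes greedy evaluation so costly, motivating the cheaper two-hop fitness function above.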
In this study, the IC model with propagation probabilities p = 0.01 and p = 0.05 was used to verify the performance of the proposed algorithm.
3.4. Learning Automata
Learning automata (LA), as described by Hashemi et al. [45], can be modeled as a quadruple ⟨α, T, β, P⟩ that performs continuous feedback and adjustment to find the optimal solution through the combination of these four elements, where α, T, β, and P represent the action set, reinforcement process, response set, and probability set, respectively. The mechanism for updating the probability vector P is described in Equations (3) and (4), where λ1 and λ2 represent the reward and penalty parameters in the range [0, 1], respectively.
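As a concrete illustration, the standard linear reward-penalty update is sketched below. This is an assumed form of Equations (3) and (4) (the paper's exact expressions are not reproduced here), with illustrative names:

```python
def update_probs(P, chosen, rewarded, lam1=0.6, lam2=0.6):
    """Linear reward-penalty probability update for a learning automaton.
    P: list of action probabilities; chosen: index of the action taken;
    rewarded: whether the environment response was favorable.
    lam1/lam2 are the reward/penalty parameters in [0, 1] (assumed scheme)."""
    r = len(P)
    out = []
    for j, pj in enumerate(P):
        if rewarded:
            # reward: boost the chosen action, shrink the others
            out.append(pj + lam1 * (1 - pj) if j == chosen
                       else (1 - lam1) * pj)
        else:
            # penalty: shrink the chosen action, redistribute to the others
            out.append((1 - lam2) * pj if j == chosen
                       else lam2 / (r - 1) + (1 - lam2) * pj)
    return out
```

Note that the update keeps the probabilities summing to one, so the automaton always has a valid distribution over its actions.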
3.5. Butterfly Optimization Algorithm
Butterflies have survived for millions of years thanks to their senses. They use smell, sight, taste, touch, and hearing to find food and mating partners. Inspired by this behavior, a meta-heuristic intelligence algorithm called the butterfly optimization algorithm (BOA) was proposed by Arora and Singh [15]. The pseudo-code of the original BOA is shown in Algorithm 1.
Algorithm 1 Butterfly optimization algorithm
1: Generate initial population of n butterflies
2: Initialize sensor modality c, power exponent a, and switch probability p
3: while stopping criteria not met do
4:     for each butterfly in population do
5:         Calculate fragrance f using Equation (5)
6:     Find the best butterfly as g*
7:     for each butterfly in population do
8:         Generate a random number r from [0, 1]
9:         if r < p then
10:            Update position using Equation (6)
11:        else
12:            Update position using Equation (7)
13:    Update the value of c using Equation (8)
14: Output g*
In the BOA, each butterfly has its own fragrance f, which is determined by the sensory modality c, the stimulus intensity I, and the power exponent a. The representation of fragrance and the variation in stimulus intensity are two important issues, because fragrance is relative; that is, it can be perceived by other butterflies. According to Stevens' power law, c is used to distinguish smell from other modalities. When a butterfly with less I moves toward a butterfly with more I, f increases more rapidly than I. Accordingly, f is allowed to vary with the degree of absorption, achieved through the power exponent a. In the BOA, the fragrance is expressed as a function of the physical intensity of the stimulus, as described in Equation (5), f = c I^a, where f is the magnitude of the fragrance, i.e., the intensity of fragrance perceived by other butterflies; c, I, and a denote the sensory modality, stimulus intensity, and power exponent, respectively, where c and a are drawn randomly from [0, 1].
Empirical observations show that butterflies judge the location of food or mates very accurately. In addition, butterflies can identify different fragrances and sense their intensity. Each butterfly produces a fragrance of an intensity related to its fitness; that is, when a butterfly moves from one location to another, its fitness changes accordingly. When a butterfly senses that another butterfly emits more fragrance in the area, it moves closer; this stage is treated as a global search. In this stage, the butterflies move toward the optimal butterfly g*, as expressed by Equation (6) [15], x_i^{t+1} = x_i^t + (r^2 · g* − x_i^t) · f_i, where x_i^t is the solution vector of the ith butterfly in iteration t; g* represents the global optimal solution of the current stage; and f_i and r denote the fragrance of the ith butterfly and a random number drawn from [0, 1], respectively.
In the other case, when a butterfly cannot perceive a fragrance greater than its own, it moves randomly. This stage is called local search and is represented by Equation (7) [15], x_i^{t+1} = x_i^t + (r^2 · x_j^t − x_k^t) · f_i, where x_j^t and x_k^t denote the solution vectors of the jth and kth butterflies randomly chosen from the population, respectively. The butterfly swarm finally converges to the global optimum according to Equations (6) and (7).
The value of c is updated as shown in Equation (8), c^{t+1} = c^t + 0.025 / (c^t · t_max), where t_max is the maximum number of iterations.
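Putting the fragrance, position-update, and sensory-modality rules together, a compact continuous BOA can be sketched as follows. This is a minimal illustration under assumed settings (maximization on the domain [-1, 1]^dim, with the fitness clamped to be non-negative when used as the stimulus intensity), not the authors' implementation:

```python
import random

def boa(objective, dim, n=10, t_max=50, p=0.8, a=0.1, seed=0):
    """Minimal sketch of the continuous BOA (maximization assumed)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    best = max(pop, key=objective)[:]        # g*, copied so mutation is safe
    c = 0.01
    for t in range(t_max):
        # fragrance f = c * I^a; stimulus intensity I is the clamped fitness
        frag = [c * max(objective(x), 0.0) ** a for x in pop]
        for i, x in enumerate(pop):
            r = rng.random()
            if r < p:    # global search: move toward the best butterfly g*
                x[:] = [xi + (r * r * bi - xi) * frag[i]
                        for xi, bi in zip(x, best)]
            else:        # local search: random walk between two butterflies
                j, k = rng.randrange(n), rng.randrange(n)
                x[:] = [xi + (r * r * pop[j][d] - pop[k][d]) * frag[i]
                        for d, xi in enumerate(x)]
        cand = max(pop, key=objective)
        if objective(cand) > objective(best):
            best = cand[:]
        c += 0.025 / (c * t_max)             # sensory modality update
    return best
```

The switch probability p plays the same role as in Algorithm 1, trading off exploitation (moving toward g*) against exploration (random walks within the swarm).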
4. Proposed Method
According to the previous analysis, the goal of influence maximization is translated into the selection of the top-k influential nodes by optimizing the fitness function. In this study, a learning-automaton-driven discrete butterfly optimization algorithm was developed to optimize the fitness function by redefining the encoding mechanism and evolution rules of the original butterfly optimization algorithm. As the solution space of the influence maximization problem is discrete, the original butterfly optimization algorithm, designed for continuous problems, cannot be applied directly. A degree-based initialization method and a local search strategy are used to enhance the search capability of the butterflies, and a learning mechanism is employed to guide the butterflies to promising regions. In this section, we present the discrete encoding mechanism, the discrete evolutionary rules, and the overall framework and details of the proposed LA-DBOA for influence maximization.
4.1. Mapping BOA into Network Topology
4.1.1. Encoding the Butterfly Population
The original BOA was developed to solve continuous optimization problems, whereas the targeted networks are discrete. To address the influence maximization problem in a discrete network space, the encoding mechanism of butterfly individuals was redesigned according to the characteristics of the network topology. Each individual in the butterfly population is represented by k potential candidate seed nodes, which means that each butterfly is a candidate solution to the IM problem. The position of butterfly i can be encoded as a k-dimensional integer vector X_i = (x_1, x_2, ..., x_k), where each element x_j (1 ≤ j ≤ k) is a node in the network G. For example, given a seed set of size k = 5, a butterfly individual X_i = (1, 7, 17, 3, 20) means that the 5 nodes 1, 7, 17, 3, and 20 are selected as the most influential candidate nodes from the network. The position vector of each butterfly individual is updated according to the following redefined evolutionary rules until the termination condition is reached, and the best butterfly individual is taken as the target seed set.
4.1.2. Discrete Evolutionary Rules
First, the fragrance f is redefined as in Equation (9), where c is a scaling factor in the interval [0.6, 1.2]. The stimulus intensity I is redefined, as shown in Equation (10), as the percentage that the degree sum of a butterfly's candidate node set contributes to the total degree sum over the node sets of all n butterflies.
where D(X_i) and D(X_j) represent the sums of the out-degrees of the node sets of butterflies i and j, respectively, as defined in Equation (11), and n represents the population size. In Equation (11), D(X_i) is computed by summing the out-degree of each node x_l in X_i over the k elements of the seed set.
Second, the position vectors corresponding to global exploration and local exploration are redefined as in Equations (12) and (13), respectively, where X_i^t, X_j^t, and X_k^t represent the positions of butterflies i, j, and k at the t-th iteration; g* represents the global optimal solution of the current stage; a and r represent the learning factor and a random number drawn uniformly from [0, 1], respectively; and w denotes the reciprocal of the current iteration number, i.e., w = 1/t.
The symbol ∩ denotes a logical operator similar to the set intersection, which checks whether there are common elements between two position vectors. Specifically, if an element of one vector also appears in the other, the entry at the corresponding index of the result is set to 1; otherwise, it is set to 0. For example, assume that k = 5, X_i = (1, 7, 17, 3, 20), and X_j = (18, 3, 26, 7, 9); the result of the ∩ operation on the two vectors is (0, 1, 0, 1, 0).
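The worked example above can be reproduced directly; the following sketch implements the ∩ operator as described (the function name is illustrative):

```python
def discrete_intersection(x, y):
    """The discrete ∩ operator from the text: mark with 1 each index of x
    whose element also appears anywhere in y, else 0."""
    members = set(y)
    return [1 if v in members else 0 for v in x]
```

For X_i = (1, 7, 17, 3, 20) and X_j = (18, 3, 26, 7, 9), this yields (0, 1, 0, 1, 0), matching the example in the text.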
The threshold function F(·), defined in Equation (14), is used to format the position vector, where λ is a threshold factor.
The logical operator ⊕ is used to determine whether the elements in X_i need to be retained or updated when the position is updated; the updating formula for X_i is shown in Equation (15). If the corresponding result bit is 0, the element at that position is retained. Otherwise, the function Replace(·) is executed on the element to find an alternative candidate node. Replace(·) in Equation (15) replaces the element with a random node from the node set V and ensures that there are no duplicate nodes in X_i after the replacement is finished.
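A minimal sketch of the Replace(·) behavior described above, assuming the position vector is a list of node IDs and V is the full node set; the names are illustrative:

```python
import random

def replace_node(position, idx, nodes, rng=None):
    """Sketch of Replace(.): swap position[idx] for a random node from V
    while keeping the position vector free of duplicates."""
    rng = rng or random.Random()
    current = set(position)
    # only nodes not already in the position vector are valid replacements
    candidates = [v for v in nodes if v not in current]
    if candidates:
        position[idx] = rng.choice(candidates)
    return position
```

Restricting the candidate pool to nodes outside the current vector is what guarantees the no-duplicates invariant after replacement.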
4.1.3. Integrating Learning Automata into BOA
As described previously, a learning automaton consists of four elements. The implementation process is as follows: the learning automaton chooses an action from the action set α according to the probability set P; then, the outcome is subjected to the reinforcement process T through the response set β.
In the proposed LA-DBOA, we incorporate the two search strategies of the original BOA, namely local search and global search, into the action set α of the LA. The performance of the butterflies is then evaluated after the chosen action is executed; this constitutes the reinforcement process of the LA. Next, the performance of each evolved butterfly is compared with that of the butterfly before evolution, and the responses are stored in the response set β of the LA. Finally, because the two search strategies are executed with certain probabilities, the probabilities of the two strategies are updated in the probability set P through the response set β. To exploit the asymmetry of social network connections, that is, the differences in the relationships among the butterflies in the population, the LA identifies superior solutions and leads the butterfly population toward promising areas with these four elements working together.
4.2. Framework of LA-DBOA
According to the description in Section 3.2, the influence maximization problem is transformed into the optimization of a fitness function. Therefore, based on this fitness function, the LA-DBOA algorithm was developed to find the k most influential nodes in a network. The degree-based method is applied to initialize the population, and a local search strategy is used to enhance the search ability of the proposed algorithm. The framework of LA-DBOA is shown in Algorithm 2.
The flowchart of the proposed algorithm is shown in Figure 1. In the proposed framework, each butterfly in the population is initialized by the degree-based initialization function (line 1), and the position vector g* with the highest fitness value is selected by calculating the fitness of each butterfly (line 2). Next, the probability vector P is initialized (line 3). The main while loop follows (lines 5–26). The position vector of butterfly i is stored before it is updated (line 6). The fragrance is calculated for the butterflies at the current iteration (line 7); one action is chosen according to the probability vector P, and the position vector is updated accordingly (lines 8–13). The fitness value of the position vector of butterfly i is then updated (line 14). If the updated fitness value is lower than the previous one, the previously stored position vector of butterfly i is restored, and the local search operation is performed (lines 15–17). Afterwards, the reinforcement (response) signal β is computed using Equation (16) (line 18), and the probability vector P of the learning automaton is updated (line 19). The global best position vector g* is updated with the best position vector found (lines 20–21). When the termination condition is satisfied, the algorithm ends. Finally, the best position vector g* is output as the seed set.
In this algorithm, two actions are used to find the optimal solution more efficiently, namely the global and local search operations, of which the LA performs only one at a time. The global search speeds up the convergence of the proposed algorithm by moving toward the position with the optimal fitness value at the current stage, and the local search increases the diversity of the population by randomly selecting two butterflies in the current population. Furthermore, the historical optimum of butterfly i is preserved to lead the evolution toward the optimum. The butterflies are then guided to fly toward promising regions through the learning automaton mechanism. Based on the chosen action, the reinforcement signal β is generated; the response/feedback value of the probabilistic environment is calculated by Equation (16).
Algorithm 2 Framework of LA-DBOA
Require: Graph G = (V, E), butterfly population size n, seed set size k, number of iterations t_max.
1: Initialize position vectors x ← Degree-based Initialization(G, n, k)
2: Select g* according to the fitness value of each butterfly i
3: P ← (0.5, 0.5)
4: Initialize iterator g = 0
5: while g < t_max do
6:     Store the current position vector of each butterfly i
7:     Calculate the fragrance of each butterfly i
8:     for each butterfly X_i ∈ x do
9:         Select an action using vector P
10:        if the action is global search then
11:            Update vector X_i according to Equation (12)
12:        else
13:            Update vector X_i according to Equation (13)
14:        Update the fitness value of X_i
15:        if the new fitness is lower than the previous one then
16:            Restore the stored position vector of butterfly i
17:            X_i ← LocalSearch(X_i)
18:        β ← Generate LA response according to Equation (16)
19:        Update probability vector P according to Equations (3) and (4)
20:    Choose the best position vector according to fitness value
21:    g* ← the better of the current best and g*
22:    g ← g + 1
Ensure: The best position vector g*.
4.2.1. Initialization
A degree-based heuristic method is utilized to initialize the butterfly position vector to speed up the convergence of the proposed algorithm, and the detailed procedure is given in Algorithm 3.
Initially, the k nodes with the highest degree in the graph G are selected (line 2). Then, a turbulence operation is used to randomly diversify the position: for each element in the position vector, a random number is generated from the interval (0, 1); if the generated number is greater than 0.5, the element is replaced using the function Replace(·); otherwise, it remains unchanged, and no duplicate elements exist after the replacement (lines 3–5). Finally, the initialized position vector x is output.
Algorithm 3 Degree-based initialization (G, n, k)
Require: Graph G = (V, E), butterfly population size n, seed set size k.
1: for each butterfly X_i ∈ x do
2:     X_i ← the top-k highest-degree nodes of G
3:     for each element x_j ∈ X_i do
4:         if rand(0, 1) > 0.5 then
5:             x_j ← Replace(x_j, V)
Ensure: The initial position vector x.
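Algorithm 3 can be sketched as follows, assuming an adjacency-list graph in which a node's out-degree is the length of its neighbor list; the turbulence step perturbs each entry with probability roughly 0.5 while keeping the entries of each position vector unique (names are illustrative):

```python
import random

def degree_based_init(graph, n, k, seed=0):
    """Sketch of Algorithm 3: seed every butterfly with the top-k degree
    nodes, then randomly perturb entries for population diversity."""
    rng = random.Random(seed)
    nodes = list(graph)
    top_k = sorted(nodes, key=lambda v: len(graph.get(v, [])),
                   reverse=True)[:k]
    population = []
    for _ in range(n):
        pos = top_k[:]
        for i in range(k):
            if rng.random() > 0.5:        # turbulence: diversify this copy
                chosen = set(pos)
                cands = [v for v in nodes if v not in chosen]
                if cands:
                    pos[i] = rng.choice(cands)
        population.append(pos)
    return population
```

Starting all butterflies near the high-degree nodes gives the swarm a strong initial fitness, while the turbulence keeps the individuals from being identical.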
4.2.2. Local Search Strategy
To prevent butterflies from exploring the search space blindly, a local search strategy tailored to the network topology is devised. The position vector X is evaluated according to the symmetry principle, whereby the global connectivity between nodes in the network decreases as more target nodes are removed from the social network. At each iteration of the algorithm, dominant individuals are retained and inferior individuals are eliminated in the final screening of the position vector X. The detailed procedure is described in Algorithm 4.
Algorithm 4 Local search (X_i)
Require: position vector X_i.
1: X ← X_i
2: T ← {differing elements between X_i and g*} or {differing elements between X_j and X_k}
3: for each element x_l ∈ T do
4:     X' ← X with x_l replaced by one of its neighbors
5:     if fitness(X') > fitness(X) then
6:         X ← X'
7:     else
8:         keep X unchanged
9: X ← the better of X and X_i
Ensure: position vector X.
Initially, the position vector X_i is stored in X (line 1). The differing elements between the vectors X_i and g*, or between X_j and X_k, are deposited in set T (line 2), followed by the main loop (lines 3–8). Each element in T is replaced by one of its neighbors (line 4), and the local position vector is updated whenever the replacement improves the fitness (lines 5–8). The resulting vector is assigned to X (line 9); finally, the position vector X is returned.
4.3. Computational Complexity of LA-DBOA
We analyzed the time complexity of the proposed algorithm. Let k, n, d̄, and t_max denote the size of the seed set, the population size, the average node degree, and the number of iterations, respectively. The first part is the preparatory work: the degree-based initialization takes O(n·k) time (line 1), and finding the optimal solution based on the fitness values takes O(n) time (line 2). Within the main while loop, computing the fragrance of each butterfly takes O(n·k) time (line 7), the position vectors are updated in O(n·k) time (lines 8–13), updating the fitness values via the two-hop estimation takes O(n·k·d̄²) time (line 14), the local search takes O(k·d̄) time (line 17), and the probability vector is updated in O(m) time, where m is a constant. The total time complexity of the while loop is therefore O(t_max·n·k·d̄²), and the final selection of the optimal vector takes O(n) time. Hence, the worst-case time complexity is O(t_max·n·k·d̄²).
5. Experiments and Discussion
5.1. Preparation
To verify the efficiency and effectiveness of the proposed LA-DBOA, six real-world networks were used to test it on the influence maximization problem.
Table 1 describes the statistical information of the real-world datasets. The node degree distribution of the six networks is shown in
Figure 2.
The experiments were mainly divided into three independent phases. In the first phase, the parameter settings of LA-DBOA were tested to ensure its optimal performance. Second, LA-DBOA and DPSO were compared on six networks with respect to their fitness function optimization and running time. In the third stage, the proposed LA-DBOA was compared with the other six state-of-the-art algorithms in terms of influence spread, and statistical tests were performed to illustrate the performance of the proposed LA-DBOA and the other six algorithms on influence propagation.
All the procedures were coded in C++ and executed on an Intel® Core i5-9300H CPU @ 2.40 GHz with 8 GB of memory on a Windows system.
Discrete particle swarm optimization (DPSO) [37] is a meta-heuristic algorithm that employs a discrete mechanism to solve real-world problems and a local search strategy to accelerate its convergence.
Learning-automata-based discrete particle swarm optimization (LAPSO-IM) [
17] incorporates a learning mechanism based on discrete particle swarm optimization and modifies the original evolutionary rules.
Degree-descending search evolution (DDSE) [
46] employs a degree-descending search strategy based on the differential evolutionary (DE) algorithm to seed the most influential nodes.
Degree centrality (DC) [
29] is a local centrality method based on network topology by selecting the top-
k nodes with the highest degree centrality as seed nodes.
Cost-effective lazy forward (CELF) [8] is a classical greedy-based algorithm that employs a "lazy forward" strategy and exploits the submodularity property to improve the efficiency of the traditional greedy algorithm.
Greedy randomized adaptive search procedure (GRASP) [
47] designs a quick solution procedure for the influence maximization problem based on the greedy randomized adaptive search procedure framework.
For the above algorithms, the parameters were set to the original values reported in the corresponding literature.
5.2. Parameter Settings
First, experiments on the parameters of the proposed algorithm were conducted to determine their optimal values. The size of the seed set was set to 50 while tuning the various parameters of LA-DBOA under the IC diffusion model with propagation probability p = 0.01, and the number of Monte Carlo simulations was set to 10,000 when influence diffusion was evaluated on the optimal seed set returned by LA-DBOA.
5.2.1. Size of Population and Number of Iterations
To analyze the performance of LA-DBOA under different population sizes n and iteration counts t_max, the learning factor a was set to 2, and the reward and penalty parameters were set to 0.6. Figure 3a,b show the fitness values of LA-DBOA under different values of n and t_max, respectively.
The variation in fitness values under different population sizes on the six real-world social networks is shown in Figure 3a. The population size n was set to 5, 10, 20, 50, or 100, and the maximum number of iterations t_max was 150. It can be observed that the fitness value increased with n. However, when n increased from 50 to 100, there was no significant increase in the fitness value, and at n = 20 the proposed algorithm already produced good results. Therefore, to balance the efficiency and effectiveness of the proposed algorithm, we set the parameter n to 20.
As illustrated by the bar charts in Figure 3b, the fitness value increased with the number of iterations on the six real-world networks. The maximum number of iterations was set to 50, 100, 150, 200, or 250. When the number of iterations reached 150, further increases had no significant effect on the fitness value. Therefore, considering the trade-off between efficiency and effectiveness of the proposed algorithm, the maximum number of iterations t_max was set to 150.
5.2.2. Learning Factor
To set the value of the learning factor, we set the population size n = 20, the number of iterations t_max = 150, and both the reward and penalty parameters to 0.6. As shown in Figure 4a, the fitness value increased with the learning factor a on the six social networks. When a > 2, the performance improvement was not significant. Accordingly, a was set to 2.
5.2.3. Reward and Penalty Parameters
To set the values of the reward parameter λ1 and the penalty parameter λ2, we set the population size n = 20, the number of iterations t_max = 150, and the learning factor a = 2. As illustrated by the curves in Figure 4b, the fitness values increased as λ1 and λ2 increased. However, the variation in fitness values was not significant when λ1 > 0.6 and λ2 > 0.6. Therefore, both λ1 and λ2 were set to 0.6.
5.3. Experimental Comparison
To further demonstrate the efficiency and effectiveness of the proposed algorithm, we simulated the fitness values of LA-DBOA, DBOA, and DPSO by varying the size of the seed set k and compared their running times. DBOA is a variant of LA-DBOA without the learning automaton. The proposed algorithm was then compared with six other state-of-the-art algorithms in terms of influence spread.
5.3.1. Comparison on Fitness Values and Running Time
Figure 5 and Figure 6 show the fitness values of LA-DBOA compared with those of DBOA and DPSO under the IC model with propagation probabilities of p = 0.01 and p = 0.05, respectively. The fitness values of the compared algorithms gradually increased with the size of the seed set. However, the growth of DPSO was worse than that of the proposed LA-DBOA and of DBOA on all six real-world networks, although Figure 7 shows that the running time of DPSO was the shortest. In contrast, the proposed algorithm performed the best on all networks, as shown in Figure 5 and Figure 6. In addition, the proposed LA-DBOA outperformed DBOA on all networks while their running times were almost the same, as shown in Figure 7.
5.3.2. Comparison of Influence Spread and Running Time
Figure 8 and Figure 9 show the influence spread of the proposed algorithm and the other six algorithms for spreading probabilities p = 0.01 and p = 0.05, respectively. The influence spread of LA-DBOA reached satisfactory solutions on the six social networks. Figure 8 and Figure 9 show that the proposed LA-DBOA and CELF were the two best-performing algorithms. In addition, the performance of GRASP was slightly lower than that of LA-DBOA. LAPSO-IM and DC achieved better results than DPSO and DDSE, which yielded noticeably inferior results. From Figure 8a,b, it can be observed that the influence coverage of the seven algorithms was comparable. The remaining ten figures show that DDSE had the worst performance among the considered algorithms, indicating that DDSE is unstable in terms of influence spread. The proposed LA-DBOA achieved results comparable to those of CELF and outperformed GRASP, LAPSO-IM, and DPSO. Overall, the experimental results show that the proposed LA-DBOA was close to CELF and outperformed the other five algorithms. As seen in Figure 10, the running time of the proposed LA-DBOA was less than that of GRASP and CELF; although its running time was longer than that of DDSE, the solution quality of DDSE was much worse. The proposed LA-DBOA thus achieves a superior tradeoff between efficiency and effectiveness.
The experimental results in
Section 5.3.1, as shown in
Figure 8,
Figure 9 and
Figure 10, demonstrated that the proposed evolutionary rules of LA-DBOA are feasible and can effectively identify influential seed nodes. The learning automaton mechanism is employed to guide the butterflies toward promising regions, and the convergence of the algorithm is accelerated by a local search strategy and a degree-based initialization strategy. As illustrated in the plots, the influence spread values achieved by LA-DBOA are slightly lower than those of CELF, mainly due to the greedy selection strategy of CELF.
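The modified learning automaton used in LA-DBOA is not detailed in this section; as a rough illustration of how an automaton can bias a population toward promising regions, the following sketch implements the standard linear reward-inaction (L_RI) update scheme commonly used in LA-driven metaheuristics. The function name and learning rate `a` are illustrative, not the paper's implementation.

```python
def la_update(probs, chosen, reward, a=0.1):
    """Linear reward-inaction (L_RI) update for a learning automaton.

    On a reward signal, probability mass is shifted toward the chosen
    action; on a penalty, the probability vector is left unchanged
    (the "inaction" part of the scheme).
    """
    if not reward:
        return probs[:]  # penalty: no update
    new = [(1.0 - a) * p for p in probs]          # decay all actions
    new[chosen] = probs[chosen] + a * (1.0 - probs[chosen])  # reinforce chosen
    return new

# Four actions, uniform initial probabilities; action 1 is rewarded.
p = la_update([0.25, 0.25, 0.25, 0.25], chosen=1, reward=True, a=0.1)
```

Because the reinforcement of the chosen action exactly balances the decay of the others, the updated vector remains a valid probability distribution.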
DDSE performs the worst mainly because of its weak local search ability, which makes it difficult to maintain the diversity of the population. In particular, the algorithm is easily trapped in premature convergence as the search space grows, which leads to the low-accuracy solutions of DDSE. Furthermore, DC searches for influential nodes based solely on the topology of the network; however, nodes with high degree do not necessarily have high influence, since their marginal gains can be reduced once their neighbors have already been activated by other active nodes. Therefore, the seed set obtained by the DC algorithm does not achieve a good influence spread. GRASP yields few benefits mainly due to the randomness of selecting seed nodes in its first stage. Moreover, although the evolution rules of DPSO are feasible, its local search makes DPSO easily fall into local optima.
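The influence spread values compared throughout this section are typically estimated by Monte Carlo simulation of the independent cascade (IC) model, in which each newly activated node gets a single chance to activate each inactive neighbor with probability p. The following is a minimal sketch of such an estimator; the function name and the toy graph are illustrative, not the implementation used in the experiments.

```python
import random

def ic_spread(graph, seeds, p=0.05, runs=1000):
    """Estimate the expected influence spread of `seeds` under the
    independent cascade (IC) model via Monte Carlo simulation.

    `graph` maps each node to a list of its out-neighbors.
    """
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.get(u, []):
                    # each edge fires at most once, with probability p
                    if v not in active and random.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs

# Toy directed graph as adjacency lists (illustrative only).
g = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}
spread = ic_spread(g, {0}, p=0.5, runs=2000)
# spread lies between 1 (only the seed) and 6 (the whole graph)
```

The tens of thousands of such simulations needed per fitness evaluation are exactly why greedy methods like CELF are expensive and why metaheuristics such as LA-DBOA trade a small loss in spread for large savings in running time.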
5.4. Statistical Tests
To further verify the validity of LA-DBOA, rigorous statistical tests were performed to check whether the differences between the experimental results returned by the proposed algorithm and those of the other six algorithms were statistically significant on the six real-world networks. In the tests, the seed set size
k for each network was set to 10, 20, 30, 40, and 50, separately; these tests are independent of each other. A hypothesis test was performed for each of the five
k scenarios on each network. Wilcoxon rank sum tests [
48] were conducted using SPSS, a suite of IBM software products for statistical analysis, data mining, predictive analysis, and decision support, and the results are presented in
Table 2. We set the significance level
to 0.05 for each problem. From the statistical results, one can see that LA-DBOA clearly outperformed the DPSO, GRASP, LAPSO-IM, DDSE, and DC algorithms. More importantly, it achieved performance almost comparable to that of the greedy-based algorithm CELF.
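The Wilcoxon rank sum test used above was run in SPSS; an equivalent check can be reproduced with `scipy.stats.ranksums`. The sample values below are invented purely for illustration and are not the paper's data.

```python
from scipy.stats import ranksums

# Hypothetical influence-spread samples from repeated runs of two
# algorithms on one network (illustrative numbers only).
la_dboa = [152.1, 149.8, 151.3, 150.7, 152.9, 148.6, 151.8, 150.2]
dpso    = [141.5, 143.2, 140.8, 144.1, 142.6, 139.9, 143.7, 141.2]

stat, p_value = ranksums(la_dboa, dpso)
# A p-value below the 0.05 significance level rejects the null
# hypothesis that the two samples come from the same distribution.
significant = p_value < 0.05
```

With a positive test statistic and a significant p-value, one would conclude, as the paper does for LA-DBOA versus DPSO, that the first algorithm's spreads are stochastically larger.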
6. Conclusions
In this study, a learning automaton driven discrete butterfly optimization algorithm (LA-DBOA) was designed to solve the influence maximization problem in social networks. First, a novel encoding mechanism and evolutionary rules were proposed based on the original butterfly optimization algorithm. Second, a modified learning automaton was adopted to guide the butterfly population toward promising regions and speed up the convergence of the algorithm. Third, a novel local search strategy was proposed to enhance the search performance of LA-DBOA. Extensive experiments showed that the proposed LA-DBOA outperforms the DPSO, LAPSO-IM, GRASP, DDSE, and DC algorithms, while achieving almost the same results as the greedy-based CELF. The proposed LA-DBOA thus obtains more accurate results when trading off efficiency against effectiveness.
The influence maximization problem remains a hot topic in social network analysis, and developing effective and efficient algorithms for large-scale networks continues to be challenging. Although the proposed LA-DBOA can accurately identify influential nodes, its local search strategy keeps the algorithm time-consuming on large-scale networks. Therefore, improving the local search strategy, or proposing new, highly efficient exploitation strategies within the DBOA framework, is a main focus of future work.