A Landscape-Aware Discrete Particle Swarm Optimization for the Influence Maximization Problem in Social Networks

Chai, Baoqiang; Fu, Jiaqiang; Zhang, Ruisheng; Tang, Jianxin

doi:10.3390/sym17030435


¹ School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China

² School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China

* Author to whom correspondence should be addressed.

Symmetry 2025, 17(3), 435; https://doi.org/10.3390/sym17030435

Submission received: 11 February 2025 / Revised: 2 March 2025 / Accepted: 6 March 2025 / Published: 14 March 2025

(This article belongs to the Section Computer)


Abstract

Influence maximization (IM) is a pivotal challenge in social network analysis that aims to identify a subset of key nodes that maximizes information spread across a network. Traditional methods often trade solution accuracy for efficiency, while meta-heuristic approaches struggle to escape local optima and to balance exploration and exploitation. To address these challenges, this paper introduces a landscape-aware discrete particle swarm optimization (LA-DPSO) algorithm for the IM problem. The proposed algorithm employs a population partitioning strategy based on a fitness distance correlation index to enhance population diversity. For the two resulting subpopulations, a global evolutionary mechanism and a variable neighborhood search mechanism are designed to strike a symmetric balance between exploration and exploitation. Fitness landscape entropy is introduced to detect local optima and to prevent premature convergence of the population during evolution. Experiments on six real-world social networks demonstrate that LA-DPSO achieves an average performance improvement of 16% over state-of-the-art methods while exhibiting excellent scalability across diverse network types.

Keywords:

social network; influence maximization; fitness landscape; particle swarm optimization

1. Introduction

Nowadays, social networks have emerged as central platforms for information dissemination and daily interaction, characterized by real-time updates, strong user interactions, and globalization. Influence maximization (IM) [1] is a crucial problem in social network analysis, with the objective of identifying a subset of key nodes to maximize the spread of intriguing information, innovative products, and novel ideas. This concept is widely applied in fields including information diffusion, marketing strategies, transportation optimization, resource distribution, political campaigns, and epidemic containment [2,3,4].

Influence maximization was first formulated as a discrete combinatorial optimization problem by Kempe et al. [5], who proved it to be NP-hard. To solve the problem more efficiently, several methods based on the seminal greedy algorithm have been proposed, including CELF and CELF++ [6,7]. These algorithms rely heavily on Monte Carlo simulations to find approximately optimal solutions and therefore still incur high computational costs, especially on large-scale networks. To overcome this bottleneck, researchers proposed topology-based heuristic methods, such as degree centrality, eigenvector centrality, and gravity centrality, to solve the IM problem [8,9]. Although such methods avoid Monte Carlo simulations and are efficient, they cannot guarantee solution accuracy across diverse network structures. Consequently, researchers turned to meta-heuristic approaches to obtain more precise solutions [10]. However, because their fitness functions scale poorly and their search strategies are largely borrowed from other problems, the solution quality often remains unsatisfactory. Therefore, developing effective and efficient algorithms for the influence maximization problem remains an active research topic.

Several meta-heuristics have been proposed in recent years to solve the influence maximization problem, such as the discrete crow search algorithm, the discretized Harris hawks optimization algorithm, and the phased evaluation-enhanced approach [11,12,13]. Compared to traditional methods, these approaches can improve solution quality through diverse evolutionary and local search strategies. However, such strategies focus solely on population evolution rather than on the characteristics of the IM problem itself, which makes the population prone to being trapped in local optima during evolution. This bottleneck raises a question about the problem itself: can the distribution of the solution space of the IM problem, or the characteristics of its candidate solutions, be systematically probed in advance to guide the design of effective algorithms? Fitness landscape analysis (FLA) addresses this need: it quantifies solution distribution patterns, gradient dynamics, and local optima structures within the solution space, thereby offering prior knowledge for algorithm design [14]. Existing research shows that fitness landscapes have been widely used to guide algorithm design and to help optimization strategies explore the search space more effectively, improving both performance and convergence speed [15].

Motivated by this, we make a first attempt to use the FLA technique in the context of IM to reveal the landscape characteristics of information propagation pathways (e.g., flat regions, plateaus, or sharp peaks), so that the algorithm can dynamically balance exploration and exploitation. Specifically, a landscape-aware discrete particle swarm optimization (LA-DPSO) algorithm is proposed for the influence maximization problem. The algorithm comprises a discrete particle swarm optimization and a landscape guiding operation. Fitness landscape entropy is adopted to guide the evolutionary process of the population. In addition, the fitness distance correlation coefficient is redesigned to partition the population into two subpopulations for further evolution. A modified variable neighborhood search (VNS) mechanism is applied to the elite subpopulation to enhance solution quality, while a global search mechanism is applied to the ordinary subpopulation to thoroughly explore the solution space. The main contributions of this paper are summarized as follows:

Fitness landscape entropy is introduced, for the first time, to detect whether the population is trapped in local optima, thereby improving evolutionary efficiency.
A fitness distance correlation coefficient is devised, for the first time, to divide the population into an ordinary subpopulation and an elite subpopulation, improving population diversity.
A VNS mechanism is specifically designed for the IM problem to strengthen the search capability of the elite subpopulation, thereby improving the algorithm's performance.

The rest of the paper is structured as follows: Section 2 discusses related work on the influence maximization problem. Section 3 presents the preliminaries for the proposed algorithm. Section 4 details the implementation of LA-DPSO. Experimental results and analysis are given in Section 5. Finally, the paper concludes with a summary and directions for future research.

2. Related Work

2.1. Greedy Algorithms

Greedy algorithms are widely used to solve the IM problem because they provide reliable approximate solutions. Such algorithms iteratively add the node with the highest marginal gain to the seed set until k nodes are selected. In the seminal study, Kempe et al. [5] proposed a hill-climbing greedy method with an approximation guarantee of $(1 - 1/e - \epsilon)$ of the optimal solution. However, the algorithm is inefficient because evaluating each candidate node requires extensive Monte Carlo simulations. To improve the efficiency of the classical greedy algorithm, Leskovec et al. [6] exploited the submodularity of the objective and introduced the Cost-Effective Lazy Forward selection (CELF) algorithm, which achieved a speedup of up to 700 times. To improve efficiency on large-scale networks, Lu et al. [16] proposed a probability-based recursive method that greedily selects the node with the highest current evaluation as the next seed. Subsequently, Lozano-Osorio et al. [17] developed a fast greedy randomized adaptive search algorithm by integrating a stochastic two-phase greedy construction with an intelligent neighborhood search strategy.
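The lazy-evaluation idea behind CELF can be sketched as follows. This is a minimal illustration, not the code of [6]: it assumes a generic monotone submodular objective and, for simplicity, uses a one-hop coverage count (the hypothetical `coverage` helper) in place of a Monte Carlo estimate of spread. Because marginal gains can only shrink as the seed set grows, a node whose cached gain is up to date and still on top of the heap can be selected without re-evaluating the others.

```python
import heapq

def coverage(adj, seeds):
    """Stand-in monotone submodular objective: seeds plus their direct neighbors."""
    covered = set(seeds)
    for s in seeds:
        covered.update(adj[s])
    return len(covered)

def celf(adj, k, f=coverage):
    """CELF-style lazy greedy selection of k seeds maximizing f."""
    seeds = []
    current = f(adj, seeds)
    # Max-heap via negated gains: (-gain, node, seed-set size when the gain was computed).
    heap = [(-(f(adj, [v]) - current), v, 0) for v in adj]
    heapq.heapify(heap)
    while len(seeds) < k and heap:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Gain is fresh; by submodularity no stale entry below can exceed it.
            seeds.append(v)
            current -= neg_gain
        else:
            # Stale gain: re-evaluate against the current seed set and push back.
            gain = f(adj, seeds + [v]) - current
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, current

# Toy undirected graph as an adjacency dict (edges 0-1, 0-2, 0-3, 2-4, 4-5).
adj = {0: [1, 2, 3], 1: [0], 2: [0, 4], 3: [0], 4: [2, 5], 5: [4]}
print(celf(adj, 2))  # ([0, 4], 6)
```

On this toy graph, only a handful of stale gains are recomputed before node 4 is confirmed as the second seed.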

Greedy strategy-based methods can achieve high accuracy but are inefficient. Although recent studies have reduced their running time, these algorithms still consume excessive time on large-scale social networks, and some of them sacrifice accuracy to do so.

2.2. Heuristic Algorithms

Given the time-consuming nature of greedy strategies, some researchers turned to topology-based heuristic methods to solve the IM problem. Such methods evaluate node importance using topological features of the network, such as degree centrality, eigenvector centrality, and gravity centrality [1,18]. However, such methods often rely on a single network attribute and neglect other topological structures. To address this limitation, Yang et al. [19] proposed a local similarity metric that adjusts dynamically to the topological features of the network, achieving a good balance between time complexity and solution accuracy. To alleviate the clustering of selected seed nodes, Li et al. [20] introduced a dynamic algorithm based on cohesive entropy, which uses node topological similarity and relative entropy, instead of Euclidean distance, to identify seed nodes. For large-scale networks, Kianian and Rostamnia [21] developed an efficient heuristic independent path algorithm that leverages node characteristics and pruning techniques to approximate the influence spread. Recently, Zhu et al. [22] proposed a semi-local centrality method that integrates the local average shortest path and the extended neighborhood to improve efficiency on large networks.

Heuristic methods are highly efficient on large-scale networks, but their accuracy is generally inferior to that of greedy strategies. Furthermore, such methods are less stable because of their dependence on network topology.

2.3. Community-Based Algorithms

Social networks often exhibit community structures, in which nodes within a community are densely connected while connections between communities are relatively sparse, so information spreads more easily within communities. To identify influential nodes, Rao and Chowdary [23] proposed an efficient community-based influence maximization model that identifies the seed nodes with maximum influence through a two-stage process that filters high-quality communities and candidate nodes. To handle overlapping communities, Bouyer et al. [24] proposed a fast overlapping community-based influence maximization algorithm, which uses the global diffusion probability to generate candidate nodes and reduces time costs by shrinking the search space. Complementary to these approaches, Liu et al. [25] proposed a community-based backward generation network that combines community detection with backward generation networks and employs a graph traversal method to select nodes within each community. For the fair influence maximization problem, Ma et al. [26] developed a community-based evolutionary algorithm that identifies potential nodes using a community node selection strategy that considers community size and node attributes.

Community-based methods perform well on large-scale networks but sometimes fail to partition communities accurately, leading to misallocated key nodes and unsatisfactory influence propagation. Moreover, such methods prioritize influence propagation within communities while neglecting cross-community influence, potentially resulting in locally optimal solutions.

2.4. Machine Learning-Based Algorithms

In recent years, with the rapid development of machine learning, methods based on graph representation learning have emerged to solve the IM problem. Traditional methods often struggle to maintain efficiency while improving solution quality. To address this, Li et al. [27] developed a framework called PIANO, which combines graph embedding with reinforcement learning. This framework trains a Q-function to predict the marginal influence gain of nodes from their representations and selects the top k nodes in descending order of Q-value. Building on these developments, Kumar et al. [28] employed Struc2Vec graph embeddings with graph neural network (GNN) regression to enable scalable influence prediction. Their subsequent work [29] introduced Graph-LSTM (GLSTM), which integrates transfer learning with centrality-based feature engineering for improved seed identification. However, the solution quality of this approach depends heavily on the accuracy of the training labels. To further improve accuracy, Tang et al. [30] proposed a GNN-based framework that integrates graph convolutional networks with graph transformers to capture network information for seed selection. Recently, Li et al. [31] introduced an improved K-shell algorithm with heterogeneous cross-comparison to solve the IM problem. The model uses an encoder and graph convolutional networks to learn latent user representations from historical content preferences and topological structure, and it defines heterogeneous similarity and heterogeneous information entropy to measure users' influence.

Machine learning-based methods can often deliver satisfactory solution quality without compromising efficiency. However, when training data are insufficient or unbalanced, the models tend to overfit, resulting in suboptimal performance and limited robustness.

2.5. Meta-Heuristic Algorithms

Meta-heuristic algorithms simulate natural phenomena to find approximately optimal solutions in the search space using general evolutionary strategies. Gong et al. [10] proposed a local influence evaluation function to approximate the expected influence spread within the two-hop neighborhood of a candidate seed set, and they designed a discrete particle swarm optimization by redefining the particle velocity and position updating rules, solving the IM problem with this approach for the first time. However, this algorithm easily falls into local optima. To solve this problem, Tang et al. [32] developed a discrete bat algorithm with probabilistic greedy local search and a candidate pool to enhance accuracy.

In dealing with the IM problem in large-scale networks, Biswas et al. [33] designed a two-stage differential evolution algorithm, which narrows the search space by selecting candidate nodes based on node scores and then uses differential evolution to select the optimal seed set. To improve seed set accuracy, Zhu et al. [13] introduced a phased evaluation enhancement method that combines an evolutionary algorithm with random range partitioning and a simulated annealing strategy to search for the optimal solution. To address the dynamic and large-scale nature of social networks, Li et al. [34] proposed an adaptive agent-based evolutionary method that dynamically adjusts candidate solutions based on a genetic algorithm. Considering the potential impact of unreliable transmission over social network connections, Wang et al. [35] proposed a discrete moth–flame optimization algorithm, which optimizes the seed set through local crossover and mutation schemes. Subsequently, Khatri et al. [12] introduced a discretized Harris hawks optimization that incorporates a new neighborhood detection strategy and an adaptive random population initialization method, demonstrating enhanced performance on networks with community structures.

The meta-heuristic approaches can achieve remarkable accuracy while maintaining acceptable efficiency. However, as real-world network scales increase rapidly, such algorithms face the challenge of balancing efficiency and effectiveness in dealing with the IM problem. Meanwhile, most researchers design discrete evolutionary mechanisms in an intuitive way, which often neglects the problem characteristics and the evolutionary process of the population. Such oversights can lead to local optima or premature convergence easily. Therefore, in this paper, we make an attempt to employ the fitness landscape techniques to redesign evolutionary mechanisms from the perspective of probing into the potential solution distribution to further balance the time efficiency and performance.

3. Preliminaries

3.1. Influence Maximization Problem

The influence maximization problem seeks to identify a subset of k key users within a social network that maximizes information spread [1]. By selecting an initial set of seed nodes, the goal is to maximize the number of nodes activated through the propagation model. The problem can be modeled mathematically as Equation (1):

S^{*} = \underset{|S| = k}{\arg\max}\, \sigma(S),

(1)

where S is a candidate seed set, $\sigma(S)$ is the estimated influence spread of S, and $S^{*}$ is the optimal seed set.
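Read literally, Equation (1) asks for the best of all $\binom{n}{k}$ size-k subsets. The sketch below enumerates them on a toy graph; the stand-in spread function (a one-hop coverage count) and all names are illustrative assumptions rather than part of the method, and the last line shows why exhaustive search is infeasible at realistic scales.

```python
from itertools import combinations
from math import comb

def one_hop_coverage(adj, seeds):
    """Illustrative stand-in for sigma(S): seeds plus their direct neighbors."""
    covered = set(seeds)
    for s in seeds:
        covered.update(adj[s])
    return len(covered)

def brute_force_im(adj, k, sigma=one_hop_coverage):
    """Exhaustive argmax over all size-k seed sets, as written in Eq. (1)."""
    best = max(combinations(sorted(adj), k), key=lambda s: sigma(adj, s))
    return set(best), sigma(adj, best)

adj = {0: [1, 2, 3], 1: [0], 2: [0, 4], 3: [0], 4: [2, 5], 5: [4]}
print(brute_force_im(adj, 2))  # ({0, 4}, 6)
# Candidate sets for n = 1000 nodes and k = 10 seeds: on the order of 10**23.
print(comb(1000, 10))
```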

3.2. Diffusion Models

The independent cascade (IC) model is a widely used probabilistic model of information propagation in social networks [5]. In this model, when a node becomes active, it has a single chance to activate each of its inactive neighbors, succeeding with a certain probability. These attempts are independent of one another. Each successfully activated neighbor then tries to activate its own inactive neighbors in the next round. The process continues until no new nodes are activated.
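The spread under this model is usually estimated by Monte Carlo simulation. Below is a minimal sketch, assuming an undirected graph stored as an adjacency dict, a uniform activation probability p, and hypothetical function names; averaging many cascades approximates $\sigma(S)$ in Equation (1).

```python
import random

def simulate_ic(adj, seeds, p, rng):
    """Run one IC cascade from `seeds`; return the number of activated nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                # u gets exactly one chance to activate each still-inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier  # newly activated nodes act in the next round
    return len(active)

def estimate_spread(adj, seeds, p=0.01, runs=10_000, seed=0):
    """Monte Carlo estimate of sigma(S): mean cascade size over `runs` simulations."""
    rng = random.Random(seed)
    return sum(simulate_ic(adj, seeds, p, rng) for _ in range(runs)) / runs

adj = {0: [1, 2, 3], 1: [0], 2: [0, 4], 3: [0], 4: [2, 5], 5: [4]}
print(estimate_spread(adj, [0], p=0.5))
```

The cost of repeating this for every candidate seed set is what motivates the cheaper proxy introduced in the next subsection.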

3.3. Influence Estimating Function

The expected diffusion value (EDV) [36] is a metric for estimating the influence propagation of a given node set under the IC model. It approximates the influence spread by the seeds themselves plus the expected number of one-hop neighbors activated by the seed set S. The influence estimating function is defined as Equation (2):

EDV(S) = k + \sum_{i \in N_{S}^{(1)} \setminus S} \left(1 - (1 - p)^{\tau(i)}\right),

(2)

where k is the number of nodes in the candidate seed set S, $N_{S}^{(1)}$ denotes the direct (one-hop) neighbors of the seed set, p is the activation probability, and $\tau(i)$ is the number of connections between node i and the seed nodes in S.
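Equation (2) needs only the seeds' one-hop neighborhoods, which is what makes EDV much cheaper than simulation. A minimal sketch, assuming an undirected adjacency dict and a hypothetical function name `edv`:

```python
def edv(adj, seeds, p):
    """Expected diffusion value, Eq. (2): k + sum_i (1 - (1 - p) ** tau(i))."""
    seeds = set(seeds)
    tau = {}  # tau[i]: number of edges between non-seed neighbor i and the seed set
    for s in seeds:
        for i in adj[s]:
            if i not in seeds:
                tau[i] = tau.get(i, 0) + 1
    return len(seeds) + sum(1 - (1 - p) ** t for t in tau.values())

adj = {0: [1, 2, 3], 1: [0], 2: [0, 4], 3: [0], 4: [2, 5], 5: [4]}
# Node 2 touches both seeds (tau = 2); nodes 1, 3, and 5 touch one seed each.
print(edv(adj, {0, 4}, p=0.5))  # 2 + 0.5 + 0.75 + 0.5 + 0.5 = 4.25
```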

3.4. Fitness Landscape Metrics

The fitness distance correlation (FDC) [37] assesses problem difficulty by examining the correlation between the fitness of individuals and their distance in the search space. It reflects certain characteristics of the optimization problem and has been incorporated into various optimization algorithms. The FDC value $r_{FD}$ is calculated by Equations (3) and (4):

r_{FD} = \frac{c_{FD}}{S_{D} S_{F}},

(3)

c_{FD} = \frac{1}{n} \sum_{i=1}^{n} (f^{(i)} - \bar{f})(d^{(i)} - \bar{d}),

(4)

where n is the total number of individuals, and $f^{(i)}$ and $d^{(i)}$ are the fitness value of the ith individual and its distance from the best individual, respectively. $S_{D}$ and $S_{F}$ are the standard deviations of the distances and fitness values, respectively, $\bar{f}$ is the average fitness, and $\bar{d}$ is the average distance.
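As a small numerical check of Equations (3) and (4), the sketch below computes $r_{FD}$ from parallel lists of fitness values and distances, using population (1/n) statistics to match Equation (4). The function name is hypothetical, and the inputs are assumed to be non-constant so that neither standard deviation is zero. For a maximization problem such as IM, a value near -1 means that fitness rises as individuals approach the best one.

```python
from statistics import fmean, pstdev

def fdc(fitness, distance):
    """r_FD = c_FD / (S_D * S_F) per Eqs. (3)-(4), with 1/n (population) statistics."""
    n = len(fitness)
    f_bar, d_bar = fmean(fitness), fmean(distance)
    c_fd = sum((f - f_bar) * (d - d_bar) for f, d in zip(fitness, distance)) / n
    return c_fd / (pstdev(distance) * pstdev(fitness))

# Fitness drops linearly with distance from the best individual.
print(fdc([10, 8, 6, 4], [0, 1, 2, 3]))  # approximately -1.0
```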

The entropy metric [38] generates sample points through random walk sampling and estimates the landscape roughness by using the entropy of the probability distribution of rugged and non-rugged elements in the sequence. By calculating the entropy of the fitness value distribution in the solution space, it aids in designing optimization algorithms and guiding population evolution. The calculation proceeds through three steps:

(1): Perform a random walk on the landscape to generate a time series of fitness values, $\{f_{t}\}_{t=0}^{n}$.
(2): Convert the time series into a string $S(\epsilon) = S_{1} S_{2} \dots S_{n}$ of symbols from $\{-1, 0, 1\}$ according to a threshold $\epsilon$, as given by Equation (5). The parameter $\epsilon$ is a real number that determines how sensitive the string is to fitness changes.

$S_{i} = \Psi_{f_{t}}(i, \epsilon) = \begin{cases} -1, & \text{if } f_{i} - f_{i-1} < -\epsilon \\ 0, & \text{if } |f_{i} - f_{i-1}| \leq \epsilon \\ 1, & \text{if } f_{i} - f_{i-1} > \epsilon \end{cases}$

(5)

(3): Calculate the entropy metric from $S(\epsilon)$ according to Equation (6):

$H(\epsilon) = -\sum_{p \neq q} P_{[pq]} \log_{6} P_{[pq]}, \qquad P_{[pq]} = \frac{n_{[pq]}}{n},$

(6)

where p and q are elements of the set $\{-1, 0, 1\}$ and $n_{[pq]}$ is the number of sub-blocks $pq$ with $p \neq q$ in the string $S(\epsilon)$.
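The three steps can be sketched in a few lines. This is an illustration under assumptions: the fitness series is given (e.g., recorded along a random walk), n in Equation (6) is taken as the length of the string $S(\epsilon)$, and the function names are hypothetical. A flat series yields H = 0, while alternating rises and falls yield a positive value; in LA-DPSO, an H value that does not change between generations is what triggers the landscape-aware evolution (Operations 6 and 7).

```python
from math import log

def to_symbols(series, eps):
    """Eq. (5): encode each consecutive fitness change as -1, 0, or 1."""
    symbols = []
    for prev, cur in zip(series, series[1:]):
        diff = cur - prev
        symbols.append(-1 if diff < -eps else 1 if diff > eps else 0)
    return symbols

def landscape_entropy(series, eps):
    """Eq. (6): entropy (base 6) of the sub-blocks pq with p != q."""
    s = to_symbols(series, eps)
    n = len(s)
    counts = {}
    for p, q in zip(s, s[1:]):
        if p != q:
            counts[(p, q)] = counts.get((p, q), 0) + 1
    return -sum((c / n) * log(c / n, 6) for c in counts.values())

print(landscape_entropy([5, 5, 5, 5], eps=0.1))     # 0 (flat: no sub-blocks with p != q)
print(landscape_entropy([1, 3, 1, 3, 1], eps=0.5))  # log(2, 6), about 0.387
```

Base 6 normalizes H to [0, 1], since there are exactly six ordered pairs pq with p ≠ q.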

3.5. Particle Swarm Optimization

Particle swarm optimization (PSO) is a population-based optimization algorithm [39]. It emulates the social behavior of bird flocks to find optimal solutions through cooperation and competition among individuals, known as particles. In PSO, each particle represents a potential solution, with its position and velocity continuously updated in the search space. Particles adjust their trajectories based on individual and swarm experiences to seek the global optimum. The updating formulas are defined in Equation (7):

\begin{cases} v_{i}^{t+1} = w \cdot v_{i}^{t} + c_{1} \cdot r_{1} \cdot (p_{i}^{t} - x_{i}^{t}) + c_{2} \cdot r_{2} \cdot (g^{t} - x_{i}^{t}) \\ x_{i}^{t+1} = x_{i}^{t} + v_{i}^{t+1}, \end{cases}

(7)

where $v_{i}^{t}$ is the velocity of particle i at time t, w is the inertia weight, $c_{1}$ and $c_{2}$ are the learning rates, and $r_{1}$ and $r_{2}$ are random numbers. In addition, $p_{i}^{t}$ is the historical best position of particle i up to the t-th generation, $g^{t}$ is the global best position, and $x_{i}^{t}$ is the position of particle i at time t.
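For reference, one application of Equation (7) to a single particle in a continuous space looks as follows. The parameter values (w = 0.7, c1 = c2 = 1.5), the sphere objective, and the function names are illustrative assumptions; r1 and r2 are drawn once per update as scalars, matching the notation of Equation (7), although many implementations draw them per dimension instead.

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply Eq. (7) once to one particle: update velocity, then position."""
    r1, r2 = rng.random(), rng.random()
    new_v = [w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
             for xi, vi, pi, gi in zip(x, v, pbest, gbest)]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v

def sphere(point):
    """Toy objective to minimize: sum of squares."""
    return sum(c * c for c in point)

# Illustrative usage: a 10-particle swarm minimizing the sphere function in 2-D.
rng = random.Random(0)
swarm = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(10)]
vel = [[0.0, 0.0] for _ in swarm]
pbest = [list(p) for p in swarm]
gbest = min(pbest, key=sphere)
for _ in range(100):
    for i in range(len(swarm)):
        swarm[i], vel[i] = pso_step(swarm[i], vel[i], pbest[i], gbest, rng=rng)
        if sphere(swarm[i]) < sphere(pbest[i]):
            pbest[i] = list(swarm[i])
    gbest = min(pbest, key=sphere)
print(gbest, sphere(gbest))
```

The subtraction and addition in Equation (7) have no direct meaning for node sets, which is why the discrete operators of Section 4.2 replace them.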

4. Proposed Algorithm

In this section, the landscape-aware discrete particle swarm optimization proposed for the influence maximization problem is described in detail. Figure 1 shows the LA-DPSO flowchart, which includes the following key operations:

Operation 1: Initialize the velocity vector V and the position vector X of each individual in the population, as well as the local best position vector $Pbest$ and the global best position vector $Gbest$, according to Algorithm 1.

Operation 2: Update the velocity vector V and the position vector X of each individual according to Equations (8) and (10), respectively.

Operation 3: Compare each particle's current fitness with the fitness of its $Pbest$. If the current fitness is better, update $Pbest$ with the current position. Then, identify the particle with the best fitness among all $Pbest$ values; if its fitness exceeds that of the current $Gbest$, update $Gbest$ with that particle's position.

Operation 4: Enhance the global best position vector $Gbest$ with the refining search strategy (Algorithm 2).

Operation 5: Calculate the entropy metric H of the local best position vectors according to Equation (6) to determine whether the population is trapped in a local optimum.

Operation 6: If the H value of the current generation differs from that of the previous generation, continue the iteration.

Operation 7: If the H value of the current generation is unchanged from the previous generation, perform the fitness landscape-aware evolution: calculate the correlation coefficient r of the population by Equation (11) and, based on it, divide the population into an ordinary subpopulation and an elite subpopulation.

Operation 8: Generate a new population by applying the global search mechanism (Algorithm 3) to the ordinary subpopulation and the variable neighborhood search (Algorithm 4) to the elite subpopulation.

Operations 2 to 8 are performed iteratively until the stopping criterion, i.e., the maximum number of iterations, is reached.

Algorithm 1 Initialization(G, pop, k)

Input: Graph G = (V, E), the population size pop, and the seed set size k.
Output: The initialized velocity vector V, position vector X, local best Pbest, and global best Gbest.

for i = 1 to pop do
    X[i] ← the k nodes with the highest degree
    for j = 1 to k do
        if rand(0, 1) < 0.5 then
            X[i][j] ← a random node from V ∖ X[i]    ▷ here V is the node set of G
        Pbest[i][j] ← X[i][j]
        V[i][j] ← 0    ▷ here V is the velocity vector
    Evaluate the fitness of Pbest[i] by EDV(Pbest[i])
Gbest ← the Pbest[i] with the best fitness value
return V, X, Pbest, Gbest

Algorithm 2 Refining_Search_Mechanism(G, S)

Input: Graph G = (V, E) and the candidate seed set S.
Output: Enhanced seed set S.

for each u in the initial S do
    w ← u    ▷ w is the seed currently occupying u's position
    for each v ∈ G.neighbor(u) ∖ S do
        S' ← (S ∖ {w}) ∪ {v}
        if EDV(S') > EDV(S) then
            S ← S'
            w ← v
return S

Algorithm 3 Global_Search_Mechanism(G, k)

Input: Graph G = (V, E) and the seed set size k.
Output: Node set S.

S ← ∅
nodelist ← the nodes of V sorted by eigenvector centrality in descending order
for i = 1 to k do
    up_bound ← i + k
    node ← a random node from nodelist[1 : up_bound] ∖ S
    S ← S ∪ {node}
return S

Algorithm 4 Variable_Neighbourhood_Search(G, S)

Input: Graph G = (V, E) and the candidate seed set S.
Output: Improved seed set S*.

S* ← S
for each u ∈ S do
    S1 ← S_1(G, u) ∖ S    ▷ first-order neighborhood: other nodes in u's clique
    S2 ← S_2(G, u) ∖ (S1 ∪ S)    ▷ second-order neighborhood: nodes in u's k-core layer
    for each node ∈ S1 do
        S' ← (S* ∖ {u}) ∪ {node}
        if EDV(S') > EDV(S*) then
            S* ← S'
            return S*
    for each node ∈ S2 do
        S' ← (S* ∖ {u}) ∪ {node}
        if EDV(S') > EDV(S*) then
            S* ← S'
            return S*
return S*

4.1. Initialization

In the initialization, the velocity vectors of all particles in the population are set to zero. The initial position vector of each individual consists of the k nodes with the highest degree. To enhance diversity, each element $X[i][j]$ of the position vector is then perturbed with probability 0.5: a random number is drawn uniformly from [0, 1], and if it is below 0.5, $X[i][j]$ is replaced with a random node from V that is not already in $X[i]$, which ensures that the modified particle contains no duplicate nodes. The local best position of each individual is initialized to its initial position, and the global best position is the local best position with the highest fitness value, as calculated by the EDV function. The pseudocode is shown in Algorithm 1.
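A minimal sketch of this initialization follows, assuming an undirected graph stored as an adjacency dict with more than k nodes. The helper names, the activation probability p = 0.01 passed to the hypothetical `edv` helper, and the seeded random generator are illustrative assumptions.

```python
import random

def edv(adj, seeds, p):
    """Expected diffusion value, Eq. (2)."""
    seeds = set(seeds)
    tau = {}
    for s in seeds:
        for i in adj[s]:
            if i not in seeds:
                tau[i] = tau.get(i, 0) + 1
    return len(seeds) + sum(1 - (1 - p) ** t for t in tau.values())

def initialize(adj, pop, k, p=0.01, rng=None):
    """Algorithm 1 sketch: top-degree positions, each element perturbed with prob. 0.5."""
    rng = rng or random.Random()
    nodes = list(adj)
    top_k = sorted(nodes, key=lambda u: len(adj[u]), reverse=True)[:k]
    V, X, pbest = [], [], []
    for _ in range(pop):
        x = list(top_k)
        for j in range(k):
            if rng.random() < 0.5:
                # Replacement drawn from nodes not already in x (no duplicates).
                x[j] = rng.choice([u for u in nodes if u not in x])
        X.append(x)
        V.append([0] * k)      # zero initial velocity
        pbest.append(list(x))  # local best starts at the initial position
    gbest = max(pbest, key=lambda s: edv(adj, s, p))  # best local best by EDV
    return V, X, pbest, list(gbest)

adj = {0: [1, 2, 3], 1: [0], 2: [0, 4], 3: [0], 4: [2, 5], 5: [4]}
V, X, pbest, gbest = initialize(adj, pop=4, k=2, rng=random.Random(1))
print(X, gbest)
```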

4.2. Evolutionary Rules

4.2.1. Updating Mechanism for Velocity

In the particle swarm optimization, the velocity vector is crucial for guiding particles toward promising regions. Inspired by the updating mechanism of DPSO [10], the velocity updating rule is defined, as shown in Equation (8).

V_{i} \leftarrow F (ω V_{i} + c_{1} r_{1} (X_{i} \cap P b e s t_{i}) + c_{2} r_{2} (X_{i} \cap G b e s t)),

(8)

where ω is the inertia weight, which controls the magnitude of the velocity update, and $r_{1}$ and $r_{2}$ are random numbers between 0 and 1 that introduce randomness into the update. The learning factors $c_{1}$ and $c_{2}$ determine the influence of the personal best and global best positions, respectively. Adjusting these parameters allows the algorithm to balance local exploitation and global exploration.

The operator “∩” is defined analogously to set intersection: for each node of the first seed set, it outputs 0 if the node also appears in the second set and 1 otherwise. It thus identifies the nodes shared by two candidate seed sets, which are likely to be more influential.

For example, let $A = \{1, 3, 5, 7\}$ be $X_{i}$ and $B = \{3, 4, 5, 6\}$ be $Pbest_{i}$. Then $A \cap B$ yields the 0-1 vector $(1, 0, 0, 1)$, where 0 indicates that the corresponding node of A is a potentially influential node (it is shared with B), while 1 indicates a less influential node. The argument of $F(\cdot)$ is a velocity vector. For an argument $X_{i} = (x_{i1}, \dots, x_{ik})$, the function is $F(X_{i}) = (h_{1}(x_{i1}), h_{2}(x_{i2}), \dots, h_{k}(x_{ik}))$, where each $h_{j}(x_{ij})$, $j = 1, 2, \dots, k$, is the threshold function in Equation (9):

h_{j}(x_{ij}) = \begin{cases} 0, & x_{ij} < 2 \\ 1, & \text{otherwise}. \end{cases}

(9)
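Equations (8) and (9) can be sketched directly on list-encoded seed sets. The names `cap`, `threshold`, and `update_velocity`, the default parameter values (ω = 0.7, c1 = c2 = 2.0), and drawing r1 and r2 once per particle are illustrative assumptions. The first print reproduces the worked example in the text.

```python
import random

def cap(a, b):
    """The '∩' operator: 0 where a node of a also appears in b, else 1."""
    members = set(b)
    return [0 if node in members else 1 for node in a]

def threshold(vec):
    """F(.) built from h_j in Eq. (9): 0 below 2, otherwise 1."""
    return [0 if val < 2 else 1 for val in vec]

def update_velocity(v, x, pbest, gbest, omega=0.7, c1=2.0, c2=2.0, rng=random):
    """Eq. (8): V_i <- F(omega*V_i + c1*r1*(X_i ∩ Pbest_i) + c2*r2*(X_i ∩ Gbest))."""
    r1, r2 = rng.random(), rng.random()
    raw = [omega * vj + c1 * r1 * pj + c2 * r2 * gj
           for vj, pj, gj in zip(v, cap(x, pbest), cap(x, gbest))]
    return threshold(raw)

print(cap([1, 3, 5, 7], [3, 4, 5, 6]))  # [1, 0, 0, 1], the example in the text
print(update_velocity([0, 0, 0, 0], [1, 3, 5, 7], [3, 4, 5, 6], [1, 3, 5, 6],
                      rng=random.Random(0)))
```

A node shared with both $Pbest_i$ and $Gbest$ contributes nothing from the two learning terms, so only its inertia term can push it past the threshold of 2.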

4.2.2. Updating Rule for Position

The position vector determines a specific solution in the search space and directly affects the quality of the optimization results. To apply PSO to the discrete influence maximization problem, the position update is redefined as shown in Equation (10):

X_{i} \leftarrow X_{i} \oplus V_{i},

(10)

where “⊕” means that if the element $v_{ij}$ of $V_{i}$ is 0, the corresponding position $x_{ij}$ of $X_{i}$ remains unchanged; otherwise, $x_{ij}$ is replaced with a node randomly selected from the candidates not present in $X_{i}$.
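A minimal sketch of the ⊕ operator in Equation (10), assuming the candidate pool is the full node list and that each replacement must avoid every node currently in the updated vector, so the particle never holds duplicates. The function name and the seeded generator are illustrative.

```python
import random

def update_position(x, v, nodes, rng=random):
    """Eq. (10): keep x_ij where v_ij == 0; otherwise swap in a random node not in X_i."""
    new_x = list(x)
    for j, vj in enumerate(v):
        if vj != 0:
            candidates = [u for u in nodes if u not in new_x]
            new_x[j] = rng.choice(candidates)
    return new_x

print(update_position([1, 3, 5, 7], [1, 0, 0, 1], range(10), rng=random.Random(0)))
```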

4.3. Refining Search Mechanism

To further improve the current global best solution, the candidate seed set is refined by a local search that generates an improved seed set; the pseudocode is shown in Algorithm 2. Each node u in S is tentatively replaced, in turn, by each of its neighbors v. If the newly generated set has a higher fitness value, it is kept for the next attempt. This strategy increases the expected influence diffusion by gradually increasing the influence of the seed set.
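The refining step can be sketched as below (hypothetical names; undirected adjacency dict; p = 0.01 assumed by default). The sketch makes two choices that the text leaves open: neighbors already in S are skipped so that |S| stays k, and after an accepted swap, the next neighbor of u replaces the node now occupying u's position.

```python
def edv(adj, seeds, p):
    """Expected diffusion value, Eq. (2)."""
    seeds = set(seeds)
    tau = {}
    for s in seeds:
        for i in adj[s]:
            if i not in seeds:
                tau[i] = tau.get(i, 0) + 1
    return len(seeds) + sum(1 - (1 - p) ** t for t in tau.values())

def refine(adj, seeds, p=0.01):
    """Try replacing each seed position with each of its neighbors; keep improvements."""
    S = list(seeds)
    best = edv(adj, S, p)
    for j, u in enumerate(list(S)):   # u: original occupant of position j
        for v in adj[u]:
            if v in S:
                continue              # keep exactly k distinct seeds
            candidate = S[:j] + [v] + S[j + 1:]
            value = edv(adj, candidate, p)
            if value > best:          # accept only strict improvements
                S, best = candidate, value
    return S, best

adj = {0: [1, 2, 3], 1: [0], 2: [0, 4], 3: [0], 4: [2, 5], 5: [4]}
print(refine(adj, [1, 3], p=0.5))  # ([0, 3], 3.0)
```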

4.4. Fitness Landscape-Aware Evolution

4.4.1. Population Partition

Inspired by the fitness distance correlation coefficient among the fitness landscape metrics, we propose a novel discrete metric for the influence maximization problem, termed the fitness distance correlation coefficient $r_{FD}$. It is calculated as shown in Equation (11):

\begin{cases} r_{FD} = d_{i} (f_{i} - f_{Gbest})^{2} \\ d_{i} = \sum_{j=1}^{k} \mathbb{I}(x[j] \neq Gbest[j]), \end{cases}

(11)

where x is an individual in the population, $Gbest$ is the global best position vector, $f_{i}$ is the fitness value of individual x, and $f_{Gbest}$ is the global best fitness value. The distance $d_{i}$ between the individual and the global best is the number of positions j at which x and $Gbest$ hold different nodes, where $\mathbb{I}(\cdot)$ equals 1 if its condition holds and 0 otherwise.

The correlation coefficient r is used to divide the population, with smaller values indicating closer proximity to the global optimum. This symmetric partitioning strategy ensures that both elite and ordinary subpopulations maintain balanced contributions to the exploration and exploitation, mimicking the symmetry principles observed in natural swarm behaviors. Individuals near the global optimum are classified into the elite subpopulation, while those farther away are assigned to the ordinary subpopulation. This division allows the elite subpopulation to utilize the VNS mechanism for further optimization, whereas the ordinary subpopulation performs the global search strategy, thus enhancing the overall algorithm’s search efficiency and quality.
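The partition can be sketched as below. Equation (11) fixes the score, but the text does not specify the split point, so the sketch assumes, purely for illustration, that the half of the population with the smallest scores forms the elite subpopulation. Function names are hypothetical, and individuals are equal-length node lists compared position by position, as $d_i$ requires.

```python
def distance(x, gbest):
    """d_i in Eq. (11): number of positions where x and Gbest hold different nodes."""
    return sum(a != b for a, b in zip(x, gbest))

def r_fd(x, f_x, gbest, f_gbest):
    """Score from Eq. (11): d_i * (f_i - f_Gbest)^2; smaller means closer to Gbest."""
    return distance(x, gbest) * (f_x - f_gbest) ** 2

def partition(population, fitness, gbest, f_gbest):
    """Sort by ascending score; the lower half (assumed split) is the elite."""
    order = sorted(range(len(population)),
                   key=lambda i: r_fd(population[i], fitness[i], gbest, f_gbest))
    half = len(order) // 2
    elite = [population[i] for i in order[:half]]
    ordinary = [population[i] for i in order[half:]]
    return elite, ordinary

pop = [[1, 2, 3], [1, 2, 4], [7, 8, 9], [1, 8, 9]]
fit = [10.0, 9.0, 5.0, 8.0]
print(partition(pop, fit, gbest=[1, 2, 3], f_gbest=10.0))
# ([[1, 2, 3], [1, 2, 4]], [[1, 8, 9], [7, 8, 9]])
```

Because both factors are non-negative, an individual identical to $Gbest$, or one with the same fitness, always scores 0 and lands in the elite subpopulation.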

4.4.2. Global Search Mechanism

To explore the solution space thoroughly, a global exploration mechanism is designed, as detailed in Algorithm 3. Eigenvector centrality assesses a node's importance by taking into account the centrality of its neighbors. The eigenvector centrality of every node is computed, and nodes are ranked in descending order of this value to form a list. To preserve diversity, a node is then randomly selected from the node set and added to the set S, and this process is repeated k times. Finally, the current ordinary individual is replaced with the newly generated candidate set.
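The following Python sketch illustrates the two ingredients of the global search: eigenvector centrality computed by power iteration, and the random construction of a new candidate set of k nodes. The text does not specify how the ranked list constrains the random choice. Drawing uniformly from the top `pool_size` nodes of the ranking is therefore an assumption, as are all function and parameter names.

```python
import math
import random
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def eigenvector_centrality(adj: Graph, iters: int = 500, tol: float = 1e-10) -> Dict[int, float]:
    """Power iteration on (A + I) for an undirected graph; the shift keeps the
    same eigenvectors but avoids oscillation on bipartite graphs."""
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values())) or 1.0
        nxt = {v: val / norm for v, val in nxt.items()}
        if sum(abs(nxt[v] - x[v]) for v in adj) < tol:
            return nxt
        x = nxt
    return x


def global_search(adj: Graph, k: int, pool_size: int, rng: random.Random) -> List[int]:
    """Build a new candidate seed set of k distinct nodes (hypothetical helper)."""
    ec = eigenvector_centrality(adj)
    ranked = sorted(adj, key=lambda v: ec[v], reverse=True)  # descending centrality
    pool = ranked[:max(pool_size, k)]                          # assumption: top-ranked pool
    return rng.sample(pool, k)                                 # random picks for diversity


star: Graph = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {6}, 6: {5}}
ec = eigenvector_centrality(star)
new_individual = global_search(star, k=3, pool_size=5, rng=random.Random(42))
print(max(ec, key=ec.get))  # 0 (the hub of the star)
```

Computing the centrality once and reusing it for every ordinary individual would be cheaper. Recomputing it inside `global_search` only keeps the sketch self-contained.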

4.4.3. Variable Neighbourhood Search

The VNS method [40] is robust and stable because it dynamically changes the neighborhood structure used during the search, which strengthens the search capability and helps avoid local optima. To improve the accuracy of the elite subpopulation, a modified VNS algorithm is proposed, as detailed in Algorithm 4. For each node in an elite individual, the first-order neighborhood consists of the other nodes that belong to the same clique as that node. It is used to help the population escape local optima quickly. The second-order neighborhood consists of the nodes in the same k-core layer as that node, excluding the first-order neighborhood, which broadens the search scope. If a better solution is found in the first-order neighborhood, the search stops; otherwise, it continues in the second-order neighborhood.
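A minimal Python sketch of this two-neighborhood VNS is given below. It rests on several assumptions. Cliques are passed in precomputed as a list of node sets (for example, from a separate clique-finding step). The search is applied position by position with first-improvement acceptance. The fitness is a hypothetical coverage stand-in. Core numbers come from the standard minimum-degree peeling procedure.

```python
from typing import Callable, Dict, List, Set

Graph = Dict[int, Set[int]]


def core_numbers(adj: Graph) -> Dict[int, int]:
    """k-core index of every node via repeated removal of a minimum-degree node."""
    deg = {v: len(adj[v]) for v in adj}
    remaining, core, k = set(adj), {}, 0
    while remaining:
        v = min(remaining, key=lambda n: deg[n])
        k = max(k, deg[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return core


def vns(seeds: List[int], adj: Graph, cliques: List[Set[int]],
        fitness: Callable[[List[int]], float]) -> List[int]:
    """Two-neighborhood VNS over the positions of one elite individual."""
    core = core_numbers(adj)
    best, best_fit = list(seeds), fitness(seeds)
    for pos in range(len(best)):
        u = best[pos]
        n1 = set().union(*[c for c in cliques if u in c]) - {u}   # clique mates
        n2 = {v for v in adj if core[v] == core[u]} - n1 - {u}   # same k-core layer
        for hood in (n1, n2):
            improved = False
            for v in sorted(hood - set(best)):
                cand = best.copy()
                cand[pos] = v
                cand_fit = fitness(cand)
                if cand_fit > best_fit:          # first improvement: accept it
                    best, best_fit, improved = cand, cand_fit, True
                    break
            if improved:                          # stop once a neighborhood helps
                break
    return best


def toy_coverage(adj: Graph) -> Callable[[List[int]], float]:
    """Hypothetical stand-in fitness: size of the union of closed neighborhoods."""
    def f(seeds: List[int]) -> float:
        covered: Set[int] = set()
        for s in seeds:
            covered |= adj[s] | {s}
        return float(len(covered))
    return f


tri_adj: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
tri_cliques = [{0, 1, 2}, {2, 3}, {3, 4}]
improved_set = vns([0], tri_adj, tri_cliques, toy_coverage(tri_adj))
print(improved_set)  # [2]
```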

4.5. Complexity Analysis

In this section, we analyze the time complexity of LA-DPSO. Here, $T_{\max}$, $pop$, $k$, and $\hat{D}$ represent the maximum number of iterations, the population size, the seed set size, and the maximum degree of the graph, respectively. The time complexity of the initialization phase is $O(pop \cdot k \cdot \hat{D})$. In the update phase, the velocity update takes $O(pop \cdot k \cdot \log k)$ and the position update takes $O(pop \cdot k)$. Updating the local best takes $O(pop \cdot k \cdot \hat{D})$, and updating the global best also takes $O(pop \cdot k \cdot \hat{D})$. Thus, the total time complexity of the update phase is $O(pop \cdot k \cdot \hat{D})$. The time complexity of the refinement search for the global best is $O(pop \cdot |V|)$. In the population division stage, the worst-case time complexity is $O(\frac{pop}{2} \cdot |V| \cdot \log |V|)$ for the ordinary subpopulation and $O(\frac{pop}{2} \cdot k \cdot \hat{D}^{2})$ for the elite subpopulation. Applying the simplification rules of big-O notation, the final time complexity can be expressed as $O(T_{\max} \cdot pop \cdot \max(k \cdot \hat{D}, |V|) \cdot \hat{D})$.
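The following sketch shows one way the per-phase costs combine; the original text does not spell out this derivation. Per iteration, the update, refinement, and division phases cost

$$
O(pop\cdot k\cdot\hat{D}) + O(pop\cdot|V|) + O\!\left(\tfrac{pop}{2}\cdot|V|\log|V|\right) + O\!\left(\tfrac{pop}{2}\cdot k\cdot\hat{D}^{2}\right) = O\!\left(pop\cdot\max\left(k\cdot\hat{D}^{2},\;|V|\log|V|\right)\right),
$$

since $k\hat{D} \le k\hat{D}^{2}$ for $\hat{D} \ge 1$ and $|V| \le |V|\log|V|$ for $|V| \ge 3$. Multiplying by $T_{\max}$ iterations gives the stated bound. This step also uses $|V|\log|V| \le |V|\cdot\hat{D}$, which holds whenever $\hat{D} \ge \log|V|$:

$$
O\!\left(T_{\max}\cdot pop\cdot\max\left(k\cdot\hat{D},\,|V|\right)\cdot\hat{D}\right).
$$

The one-off initialization cost $O(pop\cdot k\cdot\hat{D})$ is dominated by the iterative phases.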

5. Experimental Results and Analysis

5.1. Datasets and Baselines

In this study, six real-world networks were selected based on network scale, density, and type. Table 1 summarizes these datasets, where $|V|$ is the number of nodes, $|E|$ the number of edges, $d_{ave}$ the average degree, and $c_{ave}$ the average clustering coefficient. These datasets can be downloaded from SNAP (http://snap.stanford.edu/data/, accessed on 1 January 2025).

To evaluate the performance of the proposed LA-DPSO, five state-of-the-art algorithms were selected for comparison.

CELF [6]: proposed in 2007, this greedy algorithm selects, in each round, the node with the highest marginal gain, estimated by tens of thousands of Monte Carlo simulations. It exploits submodularity through lazy forward evaluation, so most marginal gains need not be recomputed in every round.
DPSO [10]: proposed in 2016, this is a swarm intelligence-based meta-heuristic algorithm that simulates the foraging behavior of bird flocks to identify the global best solution.
TS-VA-MODE [33]: proposed in 2022, this solves the influence maximization problem by reducing the number of candidate nodes through a multi-criteria decision-making approach. It integrates an improved differential evolution algorithm with multiple search operators to enhance performance.
DCGM++ [9]: proposed in 2023, this algorithm measures the importance of a node by its degree and the average degree of its neighboring nodes, and selects the most influential nodes in the whole network.
ENIMNR [41]: proposed in 2024, this algorithm reduces the search space by combining shell decomposition with node representation, while employing a deep learning-based node embedding technique to detect key nodes.
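The lazy-forward idea behind CELF can be made concrete with the compact Python sketch below. Here `spread` can be any monotone submodular spread estimator, such as an average over Monte Carlo simulations. The deterministic `toy_coverage` function in the demo is a hypothetical stand-in that keeps the result reproducible.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Set


def celf(nodes: Iterable[int], k: int, spread: Callable[[List[int]], float]) -> List[int]:
    """Lazy-forward greedy selection of k seeds (CELF-style sketch)."""
    # Heap entries: (negative marginal gain, node, seed-set size when the gain was computed).
    heap = [(-spread([v]), v, 0) for v in nodes]
    heapq.heapify(heap)
    seeds: List[int] = []
    current = 0.0                              # spread of the current seed set (spread([]) == 0)
    while len(seeds) < k and heap:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Gain is up to date; by submodularity no other node can beat it.
            seeds.append(v)
            current -= neg_gain
        else:
            # Stale gain: re-evaluate against the current seed set and reinsert.
            gain = spread(seeds + [v]) - current
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds


def toy_coverage(adj: Dict[int, Set[int]]) -> Callable[[List[int]], float]:
    """Hypothetical deterministic spread: size of the union of closed neighborhoods."""
    def f(seeds: List[int]) -> float:
        covered: Set[int] = set()
        for s in seeds:
            covered |= adj[s] | {s}
        return float(len(covered))
    return f


demo_adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}, 4: {5}, 5: {4, 6}, 6: {5}}
chosen = celf(demo_adj, 2, toy_coverage(demo_adj))
print(chosen)  # [0, 5]
```

In the demo, only node 5 has its gain re-evaluated in the second round before it is selected. The remaining stale entries are never recomputed, which is where CELF saves work over plain greedy.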

In the experiments, the parameters of the above algorithms follow the configurations in their original papers. All procedures were implemented in Python and executed on an Intel® Xeon® 5218R CPU @ 2.10 GHz with 64 GB of memory running Windows.

5.2. Parameter Settings

Parameter settings play an important role in helping the algorithm perform well on the IM problem. However, tuning one parameter at a time while fixing the others at predefined values is empirical and imprecise. To balance solution quality against runtime, the orthogonal experiment method was employed to determine the parameter configuration. This approach can effectively analyze the impact of each factor with a reduced number of experimental combinations. The proposed algorithm mainly involves five parameters, each with five levels, which makes the $L_{25}$ orthogonal array design suitable for the experiment. The seed set size k and activation probability p are set to 30 and 0.05, respectively. Table 2 and Table 3 present the orthogonal experiment results. They report the mean and standard deviation of the influence spread of LA-DPSO on the six networks over thirty independent runs for each parameter combination. In the tables, Number denotes the sequence number of each parameter configuration, $T_{\max}$ the number of iterations, $pop$ the population size, $w$ the inertia weight, and $c_{1}$ and $c_{2}$ the learning factors.
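A common way to build an $L_{25}(5^5)$ array uses arithmetic modulo 5: each factor's level index is a linear combination of the row coordinates $(a, b)$. The text does not state how its array was constructed. The column coefficients below are an assumption, chosen because they reproduce the level pattern listed in Table 2. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List

LEVELS: Dict[str, List[float]] = {
    "T_max": [5, 50, 100, 150, 200],
    "pop": [10, 20, 30, 40, 50],
    "w": [0.2, 0.4, 0.6, 0.8, 1.0],
    "c1": [1.2, 1.4, 1.6, 1.8, 2.0],
    "c2": [1.2, 1.4, 1.6, 1.8, 2.0],
}
# Level index of each factor = (alpha * a + beta * b) mod 5 for row (a, b).
# These coefficients were chosen to reproduce the level pattern listed in Table 2.
COEFFS = {"T_max": (1, 0), "pop": (0, 1), "w": (4, 2), "c1": (3, 3), "c2": (2, 4)}


def l25_design() -> List[Dict[str, float]]:
    rows = []
    for a in range(5):
        for b in range(5):
            rows.append({name: LEVELS[name][(alpha * a + beta * b) % 5]
                         for name, (alpha, beta) in COEFFS.items()})
    return rows


def is_orthogonal(rows: List[Dict[str, float]]) -> bool:
    """Every pair of factors must show all 25 level combinations exactly once."""
    for p, q in combinations(COEFFS, 2):
        if len({(r[p], r[q]) for r in rows}) != 25:
            return False
    return True


design = l25_design()
print(design[18])             # configuration number 19
print(is_orthogonal(design))  # True
```

The array is orthogonal because every pair of coefficient vectors is linearly independent modulo 5. As a result, each pair of columns covers all 25 level combinations exactly once.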

The orthogonal experimental results in Table 2 show that parameter configuration number 19 yields the best mean on the NetScience, Email, and CA-GrQc networks. Number 20 is best only on the NetHEHT network, and number 18 is best on the Blog and CA-HepTh networks. In the original PSO, the learning factors $c_{1}$ and $c_{2}$ are typically both set to 2.0 to balance individual and social learning. However, the original PSO was designed for continuous problems; for combinatorial optimization problems, these parameters must be adjusted to the specific problem. The experimental results indicate that with $c_{1} = 1.8$ and $c_{2} = 1.8$, the proposed LA-DPSO obtains the best value on half of the networks, more than any other configuration. This suggests that moderately high learning factors effectively balance solution quality and computational efficiency. Larger values of the number of iterations ($T_{\max}$), population size ($pop$), and inertia weight ($w$) typically enhance performance. However, they inevitably increase the computational cost. Table 3 shows that number 19 has the smallest standard deviation on the Email, Blog, and CA-HepTh networks, indicating low volatility and relatively stable behavior. To balance efficiency and effectiveness, configuration number 19 was selected as the parameter setting for the proposed LA-DPSO.

5.3. Comparison Experiments

In this section, we present and discuss in detail the results of the comparison experiments on typical evaluation metrics. The seed set size was varied from 10 to 50 in steps of 10, under the activation probability $p = 0.05$ and with the number of Monte Carlo simulations set to 1000. The other key parameters were set as follows: number of iterations $T_{\max} = 150$, population size $pop = 40$, inertia weight $w = 0.8$, and learning factors $c_{1} = 1.8$ and $c_{2} = 1.8$. To ensure the reliability of the experimental results, each algorithm was independently executed 30 times and the average value was recorded.

5.3.1. Ablation Study

Ablation experiments were conducted to evaluate the contribution of model components to the overall performance. To isolate the impact of the population partition strategy, we implemented a variant, No_LA-DPSO, that omits this strategy. The results are presented in Figure 2.

The experimental results show that LA-DPSO clearly outperforms No_LA-DPSO in influence spread on all networks, confirming the effectiveness of the population partition strategy based on the fitness landscape. This suggests that the strategy balances global exploration and local exploitation. It dynamically partitions the population according to the $r_{FD}$ value and guides the subpopulations to explore different regions of the search space. This avoids the premature convergence caused by population homogenization in traditional algorithms. As the network size and the seed set size increase, LA-DPSO identifies key nodes that are more effective collectively and further enhances the influence spread. LA-DPSO consistently shows advantages on networks such as CA-GrQc, CA-HepTh, and NetHEHT, indicating strong robustness to various network topologies.

5.3.2. Comparison on the Convergence Speed

Since both LA-DPSO and DPSO are extensions of the PSO algorithm, we analyze their convergence speed on the six real networks at $k = 30$. Figure 3 compares the convergence speed of LA-DPSO and DPSO on these networks.

The experimental results indicate that LA-DPSO outperforms DPSO in both exploration and exploitation during the optimization process, enabling it to approach the global optimum more rapidly. During evolution, LA-DPSO enhances population diversity by dividing the population into elite and ordinary subpopulations based on the fitness distance correlation coefficient. The figure shows that the fitness of LA-DPSO rises steadily over the iterations. This is attributed to the guidance of the fitness landscape entropy, which prevents the algorithm from getting trapped in local optima. In contrast, DPSO improves only intermittently and is prone to premature convergence. Its simplistic discretization of the PSO mechanism lets the population fall easily into local optima. Therefore, LA-DPSO achieves a better trade-off between solution quality and convergence speed.

5.3.3. Comparison on Influence Spread

To verify the accuracy of LA-DPSO on different types of networks, we compare the influence spread of each algorithm under the independent cascade (IC) model. The influence spread is defined as the number of nodes activated by the seed nodes. It is estimated by averaging 1000 independent Monte Carlo simulations. A larger spread indicates more effective information dissemination. Figure 4 presents the influence spread of each algorithm on the six networks.
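The following minimal sketch shows the Monte Carlo estimate of IC spread described above, with $p = 0.05$ and 1000 runs as the defaults. Several choices are assumptions for illustration: the adjacency-dict representation (an undirected graph stored symmetrically, so each edge can fire in both directions), the fixed RNG seed, and counting the seeds themselves as activated nodes.

```python
import random
from typing import Dict, Iterable, List, Set

Graph = Dict[int, Set[int]]


def ic_spread_once(adj: Graph, seeds: Iterable[int], p: float, rng: random.Random) -> int:
    """One Independent Cascade run: each newly active node gets a single chance
    to activate each inactive neighbor with probability p."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active: List[int] = []
        for u in frontier:
            for v in adj[u]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def ic_spread(adj: Graph, seeds: Iterable[int], p: float = 0.05,
              runs: int = 1000, seed: int = 0) -> float:
    """Average number of activated nodes (seeds included) over `runs` simulations."""
    rng = random.Random(seed)
    seed_list = list(seeds)
    return sum(ic_spread_once(adj, seed_list, p, rng) for _ in range(runs)) / runs


path_adj: Graph = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
print(ic_spread(path_adj, [0], p=1.0, runs=10))  # 3.0
print(ic_spread(path_adj, [0], p=0.0, runs=10))  # 1.0
```

With $p = 1$ the cascade reaches the whole connected component of the seed; with $p = 0$ only the seed is active. These limiting cases are a quick sanity check for any implementation.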

The experimental results show that CELF achieves the largest influence spread in nearly all cases, primarily owing to the $(1 - 1/e)$ approximation guarantee of its greedy hill-climbing strategy. LA-DPSO exhibits superior stability and generally ranks second behind CELF, particularly in complex solution spaces. This performance stems from its dynamic population division mechanism based on the fitness landscape, which effectively maintains population diversity. As Figure 4c,d show, LA-DPSO has significant advantages over the other algorithms (excluding CELF) on high-density, low-average-degree networks. On these networks, the entropy-based mechanism prevents the algorithm from being trapped in local optima.

In contrast, DPSO relies solely on a discretized mapping of the standard PSO and lacks a dynamic parameter adjustment mechanism, which makes it prone to local optima and unstable performance. The centrality-based DCGM++ algorithm is less effective on complex real-world networks because of its sensitivity to network structural characteristics. Furthermore, TS-VA-MODE and ENIMNR perform clearly worse on most networks, primarily because of their heavy dependence on precise candidate node selection.

5.3.4. Comparison on Running Time

To verify the efficiency of the proposed LA-DPSO on different network types, we compare the running time of the six algorithms under identical conditions. Figure 5 presents the running time of each algorithm on the six real networks at $k = 30$ and $k = 50$.

The experimental results show that LA-DPSO is computationally efficient while maintaining high accuracy, particularly when compared with CELF. However, because of the high time complexity of its local search mechanism, LA-DPSO can be slower than TS-VA-MODE at $k = 30$. Nonetheless, at $k = 50$, LA-DPSO is faster than TS-VA-MODE, as the latter requires more iterations to find satisfactory solutions for larger seed sets. DCGM++ has the shortest running time on all networks but is less accurate, as it focuses solely on structural characteristics and overlooks other key factors.

5.4. Statistical Tests

To verify the statistical significance of the influence spread results, statistical tests were conducted using SPSS software. In the experiments, the five values of $k$ ($k = 10, \dots, 50$) were treated as independent optimization problems. A Wilcoxon signed-rank test was performed separately for each value of $k$ across the six networks. Each algorithm was independently executed 30 times, and the best value, mean, and standard deviation of each algorithm on the six datasets were recorded in Table 4 and Table 5. The mean values were used for the statistical analysis, with the significance level $\alpha$ set to 0.05. Table 6 presents the results. $Z$ is the standardized test statistic, whose magnitude reflects the size of the performance difference. The p-value is the probability of observing a difference at least this large if the two algorithms actually performed equally. A p-value below 0.05 indicates a statistically significant difference between the samples. In the table, $N+$ and $N-$ denote the number of instances in which the reference algorithm (e.g., LA-DPSO) performs better or worse, respectively, than the compared algorithm (e.g., DPSO).
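The Z and p values reported by SPSS for this test are based on a normal approximation. The self-contained sketch below reproduces that computation for paired mean spreads, such as one value per network for a fixed $k$. It omits the tie correction to the variance and exact small-sample p-values, so results can differ slightly from SPSS when ties occur. The input values in the demo are made up and are not taken from Tables 4 and 5.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(a: Sequence[float], b: Sequence[float]) -> Tuple[int, int, float, float]:
    """Paired Wilcoxon signed-rank test, normal approximation, two-sided.

    Returns (N+, N-, Z, p): N+ counts pairs where a > b, N- where a < b.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]   # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0, 0, 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                                        # average ranks over ties
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1          # 1-based average rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    n_plus = sum(1 for d in diffs if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)     # no tie correction
    z = (min(w_plus, w_minus) - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))                # 2 * (1 - Phi(|z|))
    return n_plus, n - n_plus, z, p


# Made-up mean spreads on six networks for one k (not values from the tables).
ours = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
other = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
n_plus, n_minus, z, p = wilcoxon_signed_rank(ours, other)
print(n_plus, n_minus, round(z, 3), round(p, 4))  # 6 0 -2.201 0.0277
```

With six networks, even a clean sweep ($N+ = 6$, $N- = 0$) gives $p \approx 0.028$ under this approximation. This is why a few mixed outcomes at a given $k$ can already make a difference non-significant.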

Based on the data in Table 6, the proposed LA-DPSO algorithm shows a clear advantage ($p < 0.05$) over DCGM++ and ENIMNR in most test cases. The differences between CELF and LA-DPSO are not statistically significant for some seed set sizes. There, LA-DPSO achieves competitive overall performance at a much lower running time. Compared with DPSO, LA-DPSO achieves significantly better solutions. Additionally, LA-DPSO outperforms TS-VA-MODE for most seed set sizes, except at $k = 10$ and $k = 20$, where the differences are not significant. Overall, LA-DPSO exhibits high stability and robustness, with statistically significant improvements over most of the compared algorithms.

6. Conclusions

This paper proposes a landscape-aware discrete particle swarm optimization algorithm to solve the influence maximization problem in social networks. The fitness distance correlation and the fitness landscape entropy metric are introduced for the first time to characterize the solution space, and a novel population partition strategy is built on them. Three distinct search strategies are designed to balance exploitation and exploration, avoid local optima, and improve solution quality. Experimental results demonstrate that LA-DPSO outperforms most of the compared state-of-the-art algorithms across various networks. However, the population partition strategy relies on relatively simple measures and does not consider a combination of multiple indicators. Future research will focus on developing more comprehensive metrics and efficient, low-complexity search strategies to further enhance the algorithm's performance.

Author Contributions

Conceptualization, J.F. and J.T.; methodology, J.F.; software, J.F.; validation, B.C.; formal analysis, R.Z.; investigation, B.C.; resources, J.T.; data curation, J.F.; writing—original draft preparation, J.F.; writing—review and editing, J.T.; visualization, J.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the National Natural Science Foundation of China under grant number 62162040, the Gansu Provincial University Teachers Innovation Foundation under grant number 2024A-024, the Gansu Provincial Science Fund for Distinguished Young Scholars under grant number 23JRRA766, and the Gansu Provincial Science Fund for Technological Innovation Guidance Plan under grant number 24CXGA046.

Data Availability Statement

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Jaouadi, M.; Romdhane, L.B. A survey on influence maximization models. Expert Syst. Appl. 2024, 248, 123429.
2. Tavasoli, A.; Shakeri, H.; Ardjmand, E.; Young, W.A., II. Incentive rate determination in viral marketing. Eur. J. Oper. Res. 2021, 289, 1169–1187.
3. Peng, Y.; Zhao, Y.; Hu, J. On the role of community structure in evolution of opinion formation: A new bounded confidence opinion dynamics. Inf. Sci. 2023, 621, 672–690.
4. Zhong, X.; Yang, Y.; Deng, F.; Liu, G. Rumor propagation control with anti-rumor mechanism and intermittent control strategies. IEEE Trans. Comput. Soc. Syst. 2023, 11, 2397–2409.
5. Kempe, D.; Kleinberg, J.; Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003; pp. 137–146.
6. Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.; Glance, N. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; pp. 420–429.
7. Goyal, A.; Lu, W.; Lakshmanan, L.V. CELF++: Optimizing the greedy algorithm for influence maximization in social networks. In Proceedings of the 20th International Conference Companion on World Wide Web, Hyderabad, India, 28 March–1 April 2011; pp. 47–48.
8. Zhang, K.; Zhou, Y.; Long, H.; Wang, C.; Hong, H.; Armaghan, S.M. Towards identifying influential nodes in complex networks using semi-local centrality metrics. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 101798.
9. Chen, D.; Su, H. Identification of influential nodes in complex networks with degree and average neighbor degree. IEEE J. Emerg. Sel. Top. Circuits Syst. 2023, 13, 734–742.
10. Gong, M.; Yan, J.; Shen, B.; Ma, L.; Cai, Q. Influence maximization in social networks based on discrete particle swarm optimization. Inf. Sci. 2016, 367, 600–614.
11. Li, H.; Zhang, R.; Zhao, Z.; Liu, X.; Yuan, Y. Identification of top-k influential nodes based on discrete crow search algorithm optimization for influence maximization. Appl. Intell. 2021, 51, 7749–7765.
12. Khatri, I.; Choudhry, A.; Rao, A.; Tyagi, A.; Vishwakarma, D.K.; Prasad, M. Influence Maximization in social networks using discretized Harris’ Hawks Optimization algorithm. Appl. Soft Comput. 2023, 149, 111037.
13. Zhu, E.; Wang, H.; Zhang, Y.; Zhang, K.; Liu, C. PHEE: Identifying influential nodes in social networks with a phased evaluation-enhanced search. Neurocomputing 2024, 572, 127195.
14. Wright, S. The Roles of Mutation, Inbreeding, Crossbreeding, and Selection in Evolution. 1932. Available online: http://www.esp.org/books/6th-congress/facsimile/contents/6th-cong-p356-wright.pdf (accessed on 5 March 2025).
15. Zou, F.; Chen, D.; Liu, H.; Cao, S.; Ji, X.; Zhang, Y. A survey of fitness landscape analysis for optimization. Neurocomputing 2022, 503, 129–139.
16. Lu, W.X.; Zhou, C.; Wu, J. Big social network influence maximization via recursively estimating influence spread. Knowl. Based Syst. 2016, 113, 143–154.
17. Lozano-Osorio, I.; Sánchez-Oro, J.; Duarte, A.; Cordón, Ó. A quick GRASP-based method for influence maximization in social networks. J. Ambient Intell. Humaniz. Comput. 2023, 14, 3767–3779.
18. Singh, S.S.; Srivastva, D.; Verma, M.; Singh, J. Influence maximization frameworks, performance, challenges and directions on social network: A theoretical study. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 7570–7603.
19. Yang, P.L.; Xu, G.Q.; Yu, Q.; Guo, J.W. An adaptive heuristic clustering algorithm for influence maximization in complex networks. Chaos Interdiscip. J. Nonlinear Sci. 2020, 30, 093106.
20. Li, W.; Zhong, K.; Wang, J.; Chen, D. A dynamic algorithm based on cohesive entropy for influence maximization in social networks. Expert Syst. Appl. 2021, 169, 114207.
21. Kianian, S.; Rostamnia, M. An efficient path-based approach for influence maximization in social networks. Expert Syst. Appl. 2021, 167, 114168.
22. Xiao, Y.; Chen, Y.; Zhang, H.; Zhu, X.; Yang, Y.; Zhu, X. A new semi-local centrality for identifying influential nodes based on local average shortest path with extended neighborhood. Artif. Intell. Rev. 2024, 57, 115.
23. Rao, K.V.; Chowdary, C.R. CBIM: Community-based influence maximization in multilayer networks. Inf. Sci. 2022, 609, 578–594.
24. Bouyer, A.; Beni, H.A.; Arasteh, B.; Aghaee, Z.; Ghanbarzadeh, R. FIP: A fast overlapping community-based Influence Maximization Algorithm using probability coefficient of global diffusion in social networks. Expert Syst. Appl. 2023, 213, 118869.
25. Liu, X.; Ye, S.; Fiumara, G.; De Meo, P. Influence nodes identifying method via community-based backward generating network framework. IEEE Trans. Netw. Sci. Eng. 2023, 11, 236–253.
26. Ma, K.; Xu, X.; Yang, H.; Cao, R.; Zhang, L. Fair Influence Maximization in Social Networks: A Community-Based Evolutionary Algorithm. IEEE Trans. Emerg. Top. Comput. 2024, 13, 262–275.
27. Li, H.; Xu, M.; Bhowmick, S.S.; Rayhan, J.S.; Sun, C.; Cui, J. PIANO: Influence maximization meets deep reinforcement learning. IEEE Trans. Comput. Soc. Syst. 2022, 10, 1288–1300.
28. Kumar, S.; Mallik, A.; Khetarpal, A.; Panda, B.S. Influence maximization in social networks using graph embedding and graph neural network. Inf. Sci. 2022, 607, 1617–1636.
29. Kumar, S.; Mallik, A.; Panda, B. Influence maximization in social networks using transfer learning via graph-based LSTM. Expert Syst. Appl. 2023, 212, 118770.
30. Tang, J.; Qu, J.; Song, S.; Zhao, Z.; Du, Q. GCNT: Identify influential seed set effectively in social networks by integrating graph convolutional networks with graph transformers. J. King Saud Univ. Comput. Inf. Sci. 2024, 36, 102183.
31. Li, Y.; Lu, T.; Li, W.; Zhang, P. HCCKshell: A heterogeneous cross-comparison improved Kshell algorithm for Influence Maximization. Inf. Process. Manag. 2024, 61, 103681.
32. Tang, J.; Zhang, R.; Yao, Y.; Zhao, Z.; Wang, P.; Li, H.; Yuan, J. Maximizing the spread of influence via the collective intelligence of discrete bat algorithm. Knowl. Based Syst. 2018, 160, 88–103.
33. Biswas, T.K.; Abbasi, A.; Chakrabortty, R.K. A two-stage VIKOR assisted multi-operator differential evolution approach for Influence Maximization in social networks. Expert Syst. Appl. 2022, 192, 116342.
34. Li, W.; Hu, Y.; Jiang, C.; Wu, S.; Bai, Q.; Lai, E. ABEM: An adaptive agent-based evolutionary approach for influence maximization in dynamic social networks. Appl. Soft Comput. 2023, 136, 110062.
35. Wang, L.; Ma, L.; Wang, C.; Xie, N.G.; Koh, J.M.; Cheong, K.H. Identifying influential spreaders in social networks through discrete moth-flame optimization. IEEE Trans. Evol. Comput. 2021, 25, 1091–1102.
36. Cui, L.; Hu, H.; Yu, S.; Yan, Q.; Ming, Z.; Wen, Z.; Lu, N. DDSE: A novel evolutionary algorithm based on degree-descending search strategy for influence maximization in social networks. J. Netw. Comput. Appl. 2018, 103, 119–130.
37. Fang, J.; Liu, H.L.; Gu, F. A constrained multi-objective evolutionary algorithm based on fitness landscape indicator. Appl. Soft Comput. 2024, 166, 112128.
38. Malan, K.M.; Engelbrecht, A.P. Quantifying ruggedness of continuous landscapes using entropy. In Proceedings of the 2009 IEEE Congress on Evolutionary Computation, Trondheim, Norway, 18–21 May 2009; pp. 1440–1447.
39. Zhao, F.; Ji, F.; Xu, T.; Zhu, N. Hierarchical parallel search with automatic parameter configuration for particle swarm optimization. Appl. Soft Comput. 2024, 151, 111126.
40. Koyuncuoğlu, M.U.; Demir, L. An adaptive hybrid variable-large neighborhood search algorithm for profit maximization problem in designing production lines. Comput. Ind. Eng. 2023, 175, 108871.
41. Wei, P.; Zhou, J.; Yan, B.; Zeng, Y. ENIMNR: Enhanced node influence maximization through node representation in social networks. Chaos Solitons Fractals 2024, 186, 115192.

Figure 1. Flowchart for the proposed LA-DPSO.

Figure 2. Comparisons of influence spread between LA-DPSO and No_LA-DPSO on the six networks under $p = 0.05$.


Figure 3. Comparison on the convergence speed between LA-DPSO and DPSO on the six networks under $p = 0.05$.


Figure 4. Comparison on the influence propagation of the six algorithms on six networks at different k.

Figure 5. The running time of the different algorithms in the six networks under different seed set sizes.

Table 1. Statistical characteristics of the networks.

ID	Networks	$|V|$	$|E|$	$d_{ave}$	$c_{ave}$
1	NetScience	379	914	4.82	0.74
2	Email	1133	5451	9.62	0.22
3	Blog	3982	6803	3.42	0.28
4	CA-GrQc	5242	14,496	5.53	0.53
5	CA-HepTh	9877	25,998	5.26	0.47
6	NetHEHT	15,229	31,376	4.12	0.50

Table 2. The mean value of the proposed algorithm under different parameter settings based on the orthogonal experiment on the six datasets.

Number	$T_{\max}$	$pop$	w	$c_{1}$	$c_{2}$	NetScience	Email	Blog	CA-GrQc	CA-HepTh	NetHEHT
1	5	10	0.2	1.2	1.2	45.065	145.116	96.096	161.043	181.960	195.630
2	5	20	0.6	1.8	2.0	44.518	146.696	97.600	162.499	184.042	197.906
3	5	30	1.0	1.4	1.8	45.251	147.405	97.528	163.687	185.216	198.949
4	5	40	0.4	2.0	1.6	45.126	147.494	98.029	164.818	185.671	200.093
5	5	50	0.8	1.6	1.4	45.458	147.500	98.075	166.025	185.913	201.024
6	50	10	1.0	1.8	1.6	45.851	147.655	98.089	165.613	186.352	200.568
7	50	20	0.4	1.4	1.4	45.407	149.046	99.229	165.319	186.619	200.189
8	50	30	0.8	2.0	1.2	46.475	148.351	98.341	165.334	186.535	200.263
9	50	40	0.2	1.6	2.0	46.617	148.663	98.352	165.593	186.388	200.383
10	50	50	0.6	1.2	1.8	46.714	148.932	98.459	165.513	186.835	200.589
11	100	10	0.8	1.4	2.0	46.896	148.787	100.344	165.476	185.443	199.950
12	100	20	0.2	2.0	1.8	47.150	148.996	100.615	166.650	186.852	200.380
13	100	30	0.6	1.6	1.6	47.218	149.069	100.399	166.734	186.723	200.527
14	100	40	1.0	1.2	1.4	48.593	149.191	101.028	166.975	186.843	201.167
15	100	50	0.4	1.8	1.2	48.533	149.289	101.048	166.165	187.183	201.493
16	150	10	0.6	2.0	1.4	48.597	148.192	100.093	166.107	187.352	202.126
17	150	20	1.0	1.6	1.2	48.949	148.327	102.046	167.382	187.711	201.660
18	150	30	0.4	1.2	2.0	49.549	150.198	103.035	167.562	187.813	202.983
19	150	40	0.8	1.8	1.8	50.078	150.276	103.030	167.997	186.837	203.011
20	150	50	0.2	1.4	1.6	49.744	150.198	102.999	166.959	186.801	203.035
21	200	10	0.4	1.6	1.8	47.951	149.060	99.966	165.111	186.929	200.064
22	200	20	0.8	1.2	1.6	47.921	149.426	100.354	166.080	186.952	201.698
23	200	30	0.2	1.8	1.4	48.445	149.513	100.410	166.184	187.045	201.833
24	200	40	0.6	1.4	1.2	47.460	148.643	100.119	165.541	185.925	201.221
25	200	50	1.0	2.0	2.0	47.191	149.113	100.063	165.101	185.026	201.033

Table 3. The standard deviation of the proposed algorithm under different parameter settings based on the orthogonal experiment on the six datasets.

Number	$T_{\max}$	$pop$	w	$c_{1}$	$c_{2}$	NetScience	Email	Blog	CA-GrQc	CA-HepTh	NetHEHT
1	5	10	0.2	1.2	1.2	1.037	2.315	2.999	2.817	1.537	1.146
2	5	20	0.6	1.8	2.0	1.401	2.446	2.211	3.024	1.475	2.606
3	5	30	1.0	1.4	1.8	1.528	3.386	2.510	2.785	4.036	2.706
4	5	40	0.4	2.0	1.6	1.103	1.692	2.782	3.181	3.109	1.294
5	5	50	0.8	1.6	1.4	1.536	2.754	2.631	3.730	3.042	2.242
6	50	10	1.0	1.8	1.6	1.421	2.635	2.927	2.654	2.365	1.436
7	50	20	0.4	1.4	1.4	1.123	1.574	2.851	2.183	2.821	4.090
8	50	30	0.8	2.0	1.2	0.978	1.049	1.077	1.702	2.012	3.283
9	50	40	0.2	1.6	2.0	1.502	2.386	2.253	1.724	3.038	2.863
10	50	50	0.6	1.2	1.8	1.366	2.832	1.587	2.085	2.707	3.755
11	100	10	0.8	1.4	2.0	1.197	1.232	2.204	1.895	2.633	2.288
12	100	20	0.2	2.0	1.8	1.395	2.242	1.862	1.445	3.310	2.120
13	100	30	0.6	1.6	1.6	0.320	0.741	1.466	0.719	2.381	0.448
14	100	40	1.0	1.2	1.4	0.213	0.547	1.675	0.697	2.091	1.942
15	100	50	0.4	1.8	1.2	0.293	1.240	0.373	0.582	0.624	1.167
16	150	10	0.6	2.0	1.4	0.214	0.577	0.646	0.438	0.493	0.432
17	150	20	1.0	1.6	1.2	0.244	0.888	0.781	0.536	0.668	0.937
18	150	30	0.4	1.2	2.0	0.208	0.494	0.641	0.262	0.457	0.944
19	150	40	0.8	1.8	1.8	0.210	0.389	0.335	0.337	0.251	1.216
20	150	50	0.2	1.4	1.6	0.234	0.398	0.453	0.452	0.257	1.277
21	200	10	0.4	1.6	1.8	0.258	1.177	0.396	0.280	0.488	0.912
22	200	20	0.8	1.2	1.6	0.292	0.600	0.531	0.412	0.439	0.441
23	200	30	0.2	1.8	1.4	0.232	0.742	0.699	0.576	0.368	0.616
24	200	40	0.6	1.4	1.2	0.330	0.417	1.151	0.492	0.810	0.659
25	200	50	1.0	2.0	2.0	0.349	1.323	1.122	0.473	0.504	0.707

Table 4. The best value, mean, and standard deviation of CELF, DPSO, and TS-VA-MODE on the six datasets.

		CELF			DPSO			TS-VA-MODE
Network	$k$	Mean	SD	Max	Mean	SD	Max	Mean	SD	Max
	10	21.773	0.060	21.857	20.160	1.428	22.872	19.060	2.273	22.462
	20	37.875	0.053	37.950	34.222	1.687	36.595	33.032	2.530	36.738
NetScience	30	51.462	0.051	51.550	46.553	1.491	48.960	45.066	2.136	48.353
	40	64.038	0.057	64.130	60.960	1.471	63.344	59.703	2.490	63.011
	50	75.936	0.058	76.011	73.064	1.618	75.431	72.057	2.255	75.754
	10	89.630	0.062	89.735	83.335	1.540	85.547	87.715	2.145	91.664
	20	123.504	0.058	123.596	110.755	1.486	112.943	121.989	2.412	125.529
Email	30	149.429	0.062	149.518	133.205	1.450	135.503	147.766	2.259	151.031
	40	168.807	0.065	168.898	156.555	1.393	159.105	166.944	2.400	170.420
	50	187.997	0.159	188.414	174.407	1.449	177.059	183.002	1.950	186.636
	10	55.191	0.051	55.286	48.644	1.571	51.215	48.158	2.169	51.633
	20	88.598	0.053	88.685	73.516	1.348	75.746	69.874	1.765	73.343
Blog	30	116.510	0.060	116.599	94.703	1.397	97.062	88.202	2.027	92.009
	40	141.814	0.062	141.908	115.753	1.250	118.016	101.634	2.205	105.925
	50	164.787	0.161	164.991	125.691	1.332	128.012	118.806	2.227	122.176
	10	153.595	0.003	153.783	123.627	1.838	126.593	75.453	2.525	80.068
	20	190.492	0.102	190.647	134.428	2.434	138.034	86.669	3.043	91.329
CA-GrQc	30	220.964	0.106	221.161	153.496	1.921	156.546	116.426	2.888	121.246
	40	242.715	0.106	242.904	167.015	2.176	170.532	136.232	3.064	140.690
	50	271.175	0.106	271.367	203.875	1.892	207.210	151.902	3.141	157.217
	10	103.093	0.025	103.275	82.444	2.160	86.001	37.061	3.241	42.211
	20	160.491	0.111	160.673	140.166	1.982	143.333	64.676	3.115	68.696
CA-HepTh	30	204.406	0.114	204.546	173.207	2.042	176.208	101.849	2.781	106.482
	40	242.115	0.101	242.288	196.222	1.884	199.212	122.978	3.006	126.936
	50	274.074	0.199	274.561	234.062	1.856	237.918	163.398	2.876	168.003
	10	107.151	0.109	107.358	91.591	2.045	95.359	90.153	2.673	94.540
	20	160.247	0.116	160.432	129.159	2.114	133.020	139.327	2.982	144.211
NetHEHT	30	205.017	0.134	205.218	165.761	1.999	169.923	195.416	2.872	199.742
	40	237.770	0.112	237.936	199.913	2.336	203.515	227.393	2.873	231.906
	50	267.914	0.114	268.104	220.307	1.994	222.977	255.084	2.510	259.009

Table 5. Mean, standard deviation (SD), and best value (Max) of DCGM++, ENIMNR, and LA-DPSO on the six datasets.

		DCGM++			ENIMNR			LA-DPSO
Network	$k$	Mean	SD	Max	Mean	SD	Max	Mean	SD	Max
	10	21.097	0.097	21.297	14.972	2.104	18.307	20.980	0.241	21.378
	20	32.679	0.096	32.809	27.735	1.817	30.947	35.158	0.277	35.485
Netscience	30	46.068	0.118	46.296	37.555	2.081	40.294	47.861	0.192	48.199
	40	58.449	0.126	58.638	47.814	1.812	51.688	61.698	0.203	62.137
	50	69.227	0.114	69.426	59.141	2.290	62.600	74.960	0.190	75.394
	10	86.724	0.119	86.943	85.375	1.997	88.384	85.897	0.226	86.265
	20	116.540	0.092	116.701	112.918	1.868	116.821	121.916	0.238	122.283
Email	30	139.937	0.109	140.141	138.641	2.006	141.708	148.932	0.240	149.339
	40	158.126	0.113	158.326	157.798	1.926	161.802	168.550	0.222	168.972
	50	173.786	0.125	173.985	171.678	2.041	174.866	188.135	0.259	188.514
	10	47.180	0.112	47.366	43.124	1.784	46.508	50.006	0.232	50.359
	20	64.575	0.130	64.746	59.385	2.207	63.012	74.052	0.222	74.507
Blog	30	80.978	0.130	81.148	70.176	2.231	73.934	99.836	0.220	100.279
	40	94.632	0.116	94.793	83.857	2.025	87.005	126.186	0.222	126.583
	50	112.765	0.116	112.955	95.143	1.964	98.536	136.675	0.197	137.057
	10	75.404	0.325	75.927	75.027	2.560	79.879	127.327	0.432	127.970
	20	87.490	0.342	88.079	88.205	2.825	92.389	139.428	0.340	140.096
CA-GrQc	30	99.269	0.380	99.856	123.039	2.287	127.338	165.897	0.394	166.617
	40	115.695	0.288	116.248	135.490	3.057	140.131	186.401	0.439	187.076
	50	128.132	0.363	128.704	144.075	2.812	147.969	223.293	0.403	223.779
	10	91.570	0.353	92.198	28.465	2.797	32.612	93.235	0.207	93.701
	20	129.354	0.380	129.910	44.825	2.504	49.003	135.327	0.356	136.050
CA-HepTh	30	181.726	0.385	182.228	79.812	2.385	84.394	186.424	0.515	187.113
	40	209.922	0.310	210.545	96.093	2.536	101.100	211.692	0.365	212.532
	50	235.687	0.314	236.177	152.718	2.627	156.991	244.504	0.414	245.100
	10	92.091	0.309	92.638	80.690	2.655	84.140	99.140	0.264	99.735
	20	135.330	0.378	135.889	92.171	2.943	96.149	148.537	0.365	149.129
NetHEHT	30	160.538	0.320	161.173	108.008	2.673	112.294	200.148	0.418	200.736
	40	192.992	0.403	193.590	152.413	2.577	156.275	230.193	0.380	230.767
	50	227.806	0.320	228.370	189.976	2.250	193.943	265.507	0.372	266.146
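Each Mean/SD/Max triple in Tables 4 and 5 condenses a set of repeated spread measurements for one algorithm, network, and seed-set size $k$. The Python sketch below illustrates that summary step only. The helper name `summarize_runs` and the sample values are hypothetical, and it assumes SD is the sample standard deviation (n - 1 denominator), which these tables do not specify.

```python
from statistics import mean, stdev


def summarize_runs(spreads: list[float]) -> tuple[float, float, float]:
    """Return (Mean, SD, Max) for the repeated spread values behind one table cell.

    Assumption: SD is the sample standard deviation (n - 1 denominator);
    Tables 4 and 5 do not state which variant was used.
    """
    if len(spreads) < 2:
        raise ValueError("a sample SD needs at least two values")
    return mean(spreads), stdev(spreads), max(spreads)


# Hypothetical spread values from five repeated runs (not taken from the paper).
runs = [20.7, 21.0, 21.2, 20.9, 21.1]
m, sd, best = summarize_runs(runs)
print(f"Mean={m:.3f}  SD={sd:.3f}  Max={best:.3f}")
```

Reporting Max alongside Mean and SD separates the best observed result from typical behavior; a small SD, as for CELF and LA-DPSO in these tables, indicates that repeated runs yield nearly identical spreads.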

Table 6. Statistical results of the Wilcoxon test comparing LA-DPSO with the other five algorithms at α = 0.05 on the six networks.


LA-DPSO vs.	$k$	$N^{-}$	$N^{+}$	Z	p-Value
	10	6	0	−2.201	0.028
	20	6	0	−2.201	0.028
CELF	30	6	0	−2.201	0.028
	40	6	0	−2.201	0.028
	50	5	1	−1.992	0.046
	10	0	6	−2.201	0.028
	20	0	6	−2.201	0.028
DPSO	30	0	6	−2.201	0.028
	40	0	6	−2.201	0.028
	50	0	6	−2.201	0.028
	10	2	4	−1.572	0.116
	20	2	4	−1.572	0.116
TS-VA-MODE	30	0	6	−2.201	0.028
	40	0	6	−2.201	0.028
	50	0	6	−2.201	0.028
	10	1	5	−1.992	0.046
	20	1	5	−1.992	0.046
DCGM++	30	0	6	−2.201	0.028
	40	0	6	−2.201	0.028
	50	0	6	−2.201	0.028
	10	0	6	−2.201	0.028
	20	0	6	−2.201	0.028
ENIMNR	30	0	6	−2.201	0.028
	40	0	6	−2.201	0.028
	50	0	6	−2.201	0.028
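The Z and p-values in Table 6 appear consistent with the large-sample normal approximation of the Wilcoxon signed-rank statistic over the n = 6 paired networks, without continuity correction: a smallest rank sum of $W = 0$ gives $Z \approx -2.201$, $p \approx 0.028$; $W = 1$ gives $Z \approx -1.992$, $p \approx 0.046$; and $W = 3$ gives $Z \approx -1.572$, $p \approx 0.116$. $N^{-}$ and $N^{+}$ appear to count the networks on which LA-DPSO scores below and above the baseline, respectively. The Python sketch below reproduces that calculation under these assumptions; the function name `wilcoxon_signed_rank` and the example scores are hypothetical, and it is not presented as the authors' exact procedure. For real analyses, a library routine such as `scipy.stats.wilcoxon` is the usual choice.

```python
import math


def wilcoxon_signed_rank(x: list[float], y: list[float]) -> dict:
    """Paired Wilcoxon signed-rank test using the large-sample normal approximation.

    x, y: paired scores, e.g. LA-DPSO and a baseline on each network.
    Zero differences are dropped and tied |d| values share an average rank.
    Assumption: no tie correction and no continuity correction are applied.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |d| in ascending order, giving tied magnitudes their average rank.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    w = min(w_plus, w_minus)

    # Z = (W - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return {"N-": sum(di < 0 for di in d), "N+": sum(di > 0 for di in d), "Z": z, "p": p}


# Hypothetical scores on six networks where the first method wins every pair.
res = wilcoxon_signed_rank([2, 4, 6, 8, 10, 12], [1, 2, 3, 4, 5, 6])
print(res["N-"], res["N+"], round(res["Z"], 3), round(res["p"], 3))  # prints: 0 6 -2.201 0.028
```

When all six differences favor one side, the smaller rank sum is 0, which produces the Z = −2.201, p = 0.028 pattern that dominates Table 6; with only six networks, this is also the strongest result the test can report.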

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Chai, B.; Fu, J.; Zhang, R.; Tang, J. A Landscape-Aware Discrete Particle Swarm Optimization for the Influence Maximization Problem in Social Networks. Symmetry 2025, 17, 435. https://doi.org/10.3390/sym17030435

