HEDV-Greedy: An Advanced Algorithm for Influence Maximization in Hypergraphs

Wang, Haosen; Pan, Qingtao; Tang, Jun

doi:10.3390/math12071041

by Haosen Wang ^†, Qingtao Pan ^† and Jun Tang ^*

College of Systems Engineering, National University of Defense Technology, Changsha 410073, China

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Mathematics 2024, 12(7), 1041; https://doi.org/10.3390/math12071041

Submission received: 8 January 2024 / Revised: 22 March 2024 / Accepted: 28 March 2024 / Published: 30 March 2024

(This article belongs to the Section Network Science)

Abstract:

Influence maximization (IM) has shown wide applicability in various fields over the past few decades, e.g., viral marketing, rumor control, and prevention of infectious diseases. Nevertheless, existing research on IM primarily focuses on ordinary networks with pairwise connections between nodes, which fall short in the representation of higher-order relations. Influence maximization on hypergraphs (HIM) has received limited research attention. In this work, we propose a novel evaluation function, the expected diffusion value on hypergraph (HEDV), which evaluates the spreading influence of selected nodes on hypergraphs. We then propose an advanced greedy-based algorithm, termed HEDV-greedy, to select seed nodes with maximum spreading influence on the hypergraph. We conduct extensive experiments on eight real-world hypergraph datasets, benchmarking HEDV-greedy against eight state-of-the-art methods for the HIM problem. The results highlight the effectiveness and efficiency of the proposed method. The HEDV-greedy algorithm demonstrates a marked reduction in time complexity by two orders of magnitude compared to the conventional greedy method. Moreover, HEDV-greedy outperforms other state-of-the-art algorithms across all datasets. Specifically, under conditions of lower propagation probability, HEDV-greedy exhibits an average improvement in solution accuracy of 25.76%.

Keywords:

influence maximization; hypergraph; spreading dynamic; complex network

MSC:

05C90

1. Introduction

In recent times, social media platforms like Facebook, TikTok, Twitter, and Instagram have gained widespread popularity on a global scale, catering to the increasing need for communication and social interaction. Since 2023, the impact of social media on our daily lives has continued to surge, with individuals now dedicating more than two and a half hours per day to these platforms, exceeding the time spent on traditional broadcast and cable television by 40 min [1]. This escalating engagement underscores the pivotal role these platforms play in modern communication.

Moreover, given their expansive user base and rapid information dissemination, social media channels have evolved into indispensable marketing avenues for businesses and advertisers. In a bid to maximize marketing effectiveness while minimizing costs, the strategy of viral marketing has emerged [2]. This approach involves targeting a select group of initial users within a social network and leveraging their recommendations and sharing activities to promote a product effectively. Domingos and Richardson drew inspiration from viral marketing, utilizing a Markov random field to simulate the dynamics of information propagation [3]. They conceptualized the challenge as an optimization task, introducing the influence maximization (IM) problem. The IM problem seeks to select a predefined number of nodes within a given network, forming a seed set, so as to maximize influence diffusion under a specified information propagation model. Beyond its applications in viral marketing, IM holds substantial relevance in areas like rumor control [4], the prevention of infectious diseases [5], and others [6,7,8]. In the context of rumor control, the identification of crucial nodes and the prompt implementation of interventions can effectively curtail the spread and impact of rumors. In the realm of infectious disease prevention, the identification of individuals with high transmission capabilities and the implementation of targeted treatment and control measures can effectively contain the spread of diseases.

Existing research on the IM problem primarily focuses on ordinary networks with pairwise connections between nodes. However, in scenarios involving relationships among three or more entities, ordinary networks fall short in representation. In fact, besides ordinary networks, self-similar graphs and hypergraphs can also be used to construct social networks [9]. Hypergraphs introduced by Berge utilize hyperedges to represent interactions among multiple nodes. A hyperedge can encompass multiple nodes, where the nodes within the same hyperedge are interconnected in a fully connected manner. Hypergraphs find broad applications in modeling multidimensional relationships, such as co-authorship in scientific research papers, user group relationships in social networking platforms, and the combinations of various proteins in organic compounds [10]. Despite their robust representation capabilities, research on hypergraph influence maximization (HIM) remains limited, with few studies in this area. Challenges in HIM research involve proposing novel diffusion models for hypergraphs to more accurately depict real-world spreading processes and developing solution methods that balance computational complexity and accuracy.

The greedy method, initially developed by Kempe et al. for the IM problem on ordinary networks [11], is crucial for addressing the HIM problem but suffers from high computational complexity. The primary concern is reducing the time expenditure in selecting nodes with maximum marginal benefit. Building upon the expected diffusion value (EDV) framework [12], we introduce the expected diffusion value on hypergraph (HEDV) framework, accounting for the unique topology and spreading processes of hypergraphs. HEDV approximates the global diffusion capacity of seed nodes by considering the spread within their neighbor range. The HEDV-greedy method utilizes HEDV to select seed nodes with maximum spreading influence on the hypergraph. Because HEDV approximates the marginal benefit of candidate nodes directly, the algorithm avoids extensive Monte Carlo simulations, enhancing efficiency without compromising accuracy. In brief, our key contributions are summarized as follows:

We propose a novel function, HEDV, designed to evaluate the spreading influence of seed nodes on hypergraphs. To the best of our knowledge, HEDV is the first quantitative function utilized for evaluating node influence on hypergraphs within this domain. HEDV considers both the topological structure and spreading processes, leading to highly precise evaluations.
Based on the HEDV function, we propose an efficient solution to the HIM problem, termed HEDV-greedy. This method represents the first attempt to combine an alternative evaluation method with greedy selection strategies for seed node selection. While avoiding the significant time overhead associated with Monte Carlo simulations, it is capable of obtaining highly accurate solutions.
We further conduct extensive experiments on real-world hypergraph datasets, benchmarking HEDV-greedy against other state-of-the-art methods for the HIM problem. The results demonstrate that HEDV-greedy significantly reduces time complexity by two orders of magnitude compared to the traditional greedy method. Moreover, HEDV-greedy outperforms other baselines across all datasets, achieving an average improvement of 25.76%.
We analyze experimental results under various parameter settings, utilizing visualization techniques and non-parametric tests. The results indicate that HEDV-greedy demonstrates high stability and consistently provides superior solutions across a variety of scenarios. Furthermore, we undertake a comprehensive analysis to achieve a more profound comprehension of the factors that influence the effectiveness of the proposed method.

Organization of the Article

The remaining sections of this paper are organized as follows: Section 2 provides an overview of the hypergraphs and the existing related research, especially the HIM problem. Section 3 introduces the information diffusion model on hypergraphs used in this study, as well as a detailed description of the proposed HEDV and HEDV-greedy. Subsequently, Section 4 presents the experimental setup, including the datasets and algorithms for comparison, and analyzes the experimental results using visualization, non-parametric tests, and other methods. Finally, Section 5 provides a concise conclusion.

2. Related Works

Kempe et al. demonstrated that the IM problem in regular networks is NP-hard, implying that it cannot be solved exactly in polynomial time unless P = NP [11]. Over the past decade, researchers across diverse domains have undertaken extensive investigations, introducing various algorithms to address the IM problem from distinct angles. The methodologies for tackling IM issues typically fall into five main categories: greedy algorithms, heuristic solutions, approaches based on reverse influence sampling (RIS), community-based strategies, and meta-heuristic optimization algorithms.

A greedy approach was initially introduced by Kempe et al. for addressing the IM problem, whose time expenditure is deemed unacceptable. To enhance efficiency, subsequent greedy-based methods such as CELF [13], CELF++ [14], NewGreedy, MixGreedy [15], and others have been successively proposed. Despite notable reductions in running time achieved by these algorithms, they still grapple with excessive time costs when handling extensive social networks. Furthermore, the prevailing belief is that a node’s location within a network significantly influences the extent of influence propagation. Nodes closer to the network center are deemed more critical, possessing greater diffusion potential. Building on this premise, seed node selection methods inspired by centrality, such as betweenness centrality [16], degree centrality, and closeness centrality [17], have been proposed. Examples of RIS-based approaches include RIS [18], TIM/TIM+ [19], IMM [20], and SSA/D-SSA [21], which overcome the shortcomings of heuristic solutions, such as the absence of theoretical guarantees and unstable solution quality. Nonetheless, the precise control of the required number of reverse reachable sets remains an avenue for further research. Community-based approaches effectively minimize the overlap of influences among seed nodes. Examples of such approaches include CGA [22], INCIM [23], and ComPath [24]. Optimization algorithms inspired by natural phenomena or biological behaviors, known as meta-heuristics, have proven effective in tackling the IM problem. Jiang et al. introduced the EDV evaluation function, utilizing the simulated annealing (SA) algorithm to identify the most influential nodes [12]. Gong et al. proposed the local influence estimation (LIE) function, accounting for influence within the two-hop neighborhood of seed nodes. They utilized the discrete particle swarm optimization (DPSO) algorithm to optimize it [25]. 
While these methods enhance runtime efficiency compared to greedy strategies, challenges persist in designing precise evaluation functions and efficient optimization mechanisms.

In some instances, the conventional approach of representing intricate networks through simple or directed graphs may fall short in capturing the full complexity of real-world systems under investigation [10]. For example, consider a research collaboration network depicted as a simple graph, where nodes denote authors and an edge connects two nodes if those authors have collaborated on a paper. Within this simplified graph-based representation, we can only discern binary author collaboration, unable to identify scenarios where three or more authors have collaborated due to the inherent limitations of simple and directed graphs, which only allow for pairwise relationships. Hypergraphs offer a more natural means of representing such systems [10]. In a hypergraph, hyperedges can encompass three or more nodes, symbolizing combinations of nodes sharing specific relationships. Consequently, employing a hypergraph to represent a research collaboration network proves highly effective. While authors are still represented as nodes, hyperedges now represent combinations of authors who have collaborated on papers. By examining these hyperedges, we can conclusively detect collaboration involving multiple authors. Given the distinctive topology of hypergraphs, researchers have delved into further exploration of their network structure and topological indicators. For instance, Estrada et al. examined subgraph centrality and clustering in hypergraphs, offering illustrative examples for analysis [26]. Ma and Liu proposed four topological indicators tailored for hypergraphs, including node superdegree, superedge degree, superedge–superedge distance, and superedge overlap [27]. The current landscape of hypergraph research encompasses evolving models, information diffusion models, the HIM problem, community detection [28,29], hyperlink prediction [30], and more. This article places primary emphasis on the HIM problem.

Kempe et al. established that the IM problem is NP-hard in conventional networks [11]. The HIM problem, which can be viewed as an extension of the IM problem in conventional networks, is also NP-hard [31]. In essence, solving the HIM problem exactly is at least as computationally challenging as solving the IM problem. Presently, research on the IM problem in the context of hypergraphs remains limited and confronts two principal challenges: formulating effective propagation models and developing robust algorithms for selecting seed nodes.

In the realm of propagation models, Du introduced the information dissemination model based on probabilistic hypergraphs [32]. Bodó, acknowledging both community structure and the nonlinear dependence of infection pressure on the number of infected neighbors, extended the susceptible–infected–susceptible (SIS) epidemic model to hypergraphs [33]. Suo et al. utilized hypergraphs to represent clustering relationships among multiple entities and proposed two SIS-based information propagation models, incorporating reaction process strategy and contact process strategy [34]. Recognizing the dynamic nature of social networks, Jiang et al. established an SIS-based information propagation model with three dynamic processes [35]. Antelmi et al. innovatively introduced a propagation model where a hyperedge becomes infected when the number of infected nodes in the hyperedge reaches a certain threshold [36], and a node is considered infected when the number of infected hyperedges it is involved in reaches a specific threshold. Wang et al. defined various states of nodes to characterize diverse attitudes of individuals in the social information spreading process, corresponding to different spreading behaviors. Notably, the state of a node can evolve over time. To capture the dynamics of individual behaviors and information transmission, they devised a two-layer coupling mechanism termed PN-UHTR [37]. Owing to the distinct topological structure and information propagation processes of hypergraphs in comparison to ordinary networks, there exists a compelling necessity for the development of new propagation models better suited for real-world systems.

Moreover, achieving equilibrium between the time expenditure and search precision of search algorithms in the HIM problem stands as a noteworthy focus of research. Xiao et al. [38] and Kapoor et al. [39] introduced innovative multi-criteria measurement approaches. These approaches consider factors such as node degree, star degree, and betweenness to discern pivotal nodes within hypergraphs. Zhu et al. [31] employed directed hypergraphs for modeling social networks and devised a sandwich computing framework to tackle the HIM problem. Nonetheless, the time complexity associated with their methodologies is notably high. In a separate investigation, Zheng et al. [40] delved into the subset influence maximization problem within hypergraphs, presenting an enhanced greedy algorithm with an approximate ratio of $1 - 1 / e^{- 1 / (Δ + 1)}$. Additionally, Antelmi [41] proposed a series of greedy-based heuristic solutions to identify the minimum influence set in hypergraphs. In recent research, Xie et al. [42] put forward an adaptive degree discount heuristic algorithm called hyper adaptive degree pruning (HADP) to address the HIM problem. This method iteratively selects nodes with low influence overlap as seeds. Additionally, they extended various methods used to solve IM problems on ordinary networks to hypergraphs as baseline methods for comparison. Experimental results demonstrate that HADP outperforms other baseline methods in terms of effectiveness and efficiency. Despite these endeavors, a common reliance on heuristics rooted in node centrality or greedy strategies prevails in most of these methods, potentially falling short of meeting the accuracy and efficiency criteria for the HIM problem.

3. The Proposed Algorithm

3.1. SIS-Based Information Diffusion Model in Hypergraphs

Suo et al. explored the interplay of information dissemination processes and network structures on hypergraphs, proposing the potential transmission of information through reactive process (RP) or contact process (CP) strategies [34]. Their study introduced two diffusion models, aligned with these strategies, within the SIS epidemic framework. In the RP strategy, infected nodes have the potential to infect all neighboring nodes with a specified probability at each time step. In the CP strategy, activated nodes randomly choose a hyperedge they belong to and subsequently disseminate to all neighbors within the selected hyperedge, guided by a designated probability. Our contention is that the CP strategy more accurately mirrors the actual spread dynamics within social networks compared to the RP strategy. Notably, the CP strategy permits selective sharing of information within distinct groups, resembling real-world scenarios observed in social media platforms. Furthermore, the RP strategy restricts the spread dynamics, as even if multiple common hyperedges exist between an infected node and a susceptible node, the latter only has one chance to be infected by the former at a given time step. Conversely, the susceptible node should logically have a higher likelihood of propagation in such situations.

Hence, this paper adopts the susceptible–infected spreading model with contact process dynamics (SICP) to assess the efficacy of the proposed algorithm, given these considerations. The details are given in Algorithm 1 and Figure 1.

Assuming the initial time step is $t = 0$, and a node $v_{3}$ is randomly designated to be in the infected state, with all other nodes remaining susceptible, by $t = 1$, $v_{3}$ randomly chooses one hyperedge ($e_{1}$) from its affiliations and transmits the infection to its neighboring nodes within $e_{1}$, governed by probability $β$. This leads to the infection of node $v_{2}$, prompting its transition to the infected state. Moving to $t = 2$, $v_{3}$ selects hyperedge $e_{2}$ instead for propagation, successfully infecting nodes $v_{7}$ and $v_{8}$. Meanwhile, $v_{2}$, confined to hyperedge $e_{1}$, fails to infect any additional nodes. Advancing to $t = 3$, nodes $v_{2}$, $v_{3}$, $v_{7}$, and $v_{8}$ select hyperedges $e_{1}$, $e_{2}$, $e_{3}$, and $e_{4}$, respectively, for further propagation. Consequently, nodes $v_{4}$ and $v_{6}$ become infected as a result of this multi-hyperedge spreading dynamic.

Algorithm 1: The susceptible–infected spreading model with contact process dynamics (SICP)
Input: Seed node set $S$; max time step $T$; hypergraph $H(V, E)$; spreading probability $β$
Output: Infected nodes set $I$
Initialization: $t \leftarrow 1$, $I \leftarrow S$
while $t \leq T$ do:
    for $v_{i} \in I$ do:
        $E(v_{i}) \leftarrow$ hyperedges of node $v_{i}$
        $e_{i} \leftarrow$ randomly select from $E(v_{i})$
        $N(e_{i}) \leftarrow$ nodes of hyperedge $e_{i}$
        for $v_{j} \in N(e_{i}) \setminus I$ do:
            $R \leftarrow$ generate a number in $[0, 1]$ randomly
            if $R \leq β$:
                $I \leftarrow I \cup \{v_{j}\}$
            end
        end
    end
    $t \leftarrow t + 1$
end
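The SICP procedure can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation. The function name `simulate_sicp` and the representation of the hypergraph as a list of node collections are assumptions made here. The sketch also assumes a synchronous update, in which nodes infected during a time step only start spreading at the next step; Algorithm 1 does not state this explicitly.

```python
import random


def simulate_sicp(hyperedges, seeds, beta, max_steps, rng=None):
    """Run one SICP cascade and return the set of infected nodes.

    hyperedges: list of iterables of sortable node labels (assumed layout).
    """
    rng = rng or random.Random()
    # Fix the member order of each hyperedge so runs are reproducible.
    hyperedges = [tuple(sorted(edge)) for edge in hyperedges]

    # Map every node to the indices of the hyperedges it belongs to.
    membership = {}
    for idx, edge in enumerate(hyperedges):
        for node in edge:
            membership.setdefault(node, []).append(idx)

    infected = set(seeds)
    for _ in range(max_steps):
        newly_infected = set()
        # Iterate over a snapshot: nodes infected now spread from the next step on.
        for node in sorted(infected):
            edges = membership.get(node)
            if not edges:
                continue  # node belongs to no hyperedge
            chosen = hyperedges[rng.choice(edges)]  # contact process: one hyperedge
            for neighbor in chosen:
                # random() lies in [0, 1), so "<" gives success probability beta.
                if neighbor not in infected and rng.random() < beta:
                    newly_infected.add(neighbor)
        infected |= newly_infected
    return infected
```

With `beta = 1.0` every susceptible member of the chosen hyperedge is infected, and with `beta = 0.0` the infected set never grows beyond the seeds, which gives two simple sanity checks for the sketch.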

3.2. HEDV: A Function for Evaluating the Diffusion Capacity of Nodes on Hypergraphs

Evaluating the spreading influence of selected seed nodes on an ordinary network is a #P-hard problem. The conventional Monte Carlo method necessitates thousands or even millions of simulations to achieve a reasonably accurate estimation of the selected nodes’ diffusion capacity, leading to substantial resource and time consumption. To address this issue, Jiang et al. proposed the EDV function to approximate the diffusion capacity of seed nodes in the independent cascade (IC) spreading model [12]. The EDV function considers the local spreading process and calculates the expected number of infected neighboring nodes. It provides a good approximation of the diffusion capacity, especially when the spreading probability $β$ is small. The EDV function is defined as follows:

$$\mathrm{EDV}(S) = K + \sum_{v_{i} \in N_{s}} \left(1 - (1 - p)^{τ(v_{i})}\right) \qquad (1)$$

where $S$ represents the seed node set, $K = |S|$ is the size of $S$, and $N_{s}$ represents the direct neighbors of $S$ (exclusive of nodes in $S$). $p$ is the spreading probability in the IC model. $τ(v_{i}) = |S \cap N_{v_{i}}|$ represents the number of seed nodes that are the direct neighbors of $v_{i}$.
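As a rough illustration of Equation (1), the following Python sketch computes EDV on an ordinary network. The function name `edv` and the adjacency-dictionary representation are assumptions made for this example and are not taken from [12].

```python
def edv(adjacency, seeds, p):
    """Expected diffusion value of a seed set on an ordinary network.

    adjacency: dict mapping each node to the set of its direct neighbors
               (an assumed representation).
    p:         spreading probability of the IC model.
    """
    seeds = set(seeds)
    # tau[v] = number of seed nodes that are direct neighbors of non-seed v.
    tau = {}
    for s in seeds:
        for v in adjacency.get(s, ()):
            if v not in seeds:
                tau[v] = tau.get(v, 0) + 1
    # K plus, for each neighbor, the probability that at least one seed infects it.
    return len(seeds) + sum(1 - (1 - p) ** t for t in tau.values())


# Small hand-made network: nodes 1 and 2 are seeds and share neighbor 3.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(edv(toy, {1, 2}, p=0.5))  # 2 + (1 - 0.5**2) = 2.75
```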

Within the SICP model applied to hypergraphs, activated nodes undergo a distinctive propagation process: they randomly choose one hyperedge they participate in and disseminate to all uninfected nodes within that hyperedge with a specific probability. This mechanism markedly differs from the conventional propagation observed in regular networks. To accommodate this unique behavior, we introduce a novel evaluation function termed HEDV, building upon the foundation of the existing EDV function.

$$\mathrm{HEDV}(S) = K + \sum_{v_{i} \in N_{s}} \left(1 - \prod_{v_{j} \in S \cap N_{v_{i}}} \left(1 - P_{j, i} \cdot \frac{n_{(v_{i}, v_{j})}}{n_{(v_{j})}}\right)\right) \qquad (2)$$

where $v_{j}$ represents the seed nodes that are also the direct neighbors of $v_{i}$. $P_{j, i}$ represents the probability that $v_{i}$ is infected by $v_{j}$, which in SICP is the constant spreading probability $β$. $n_{(v_{j})}$ represents the number of hyperedges that involve $v_{j}$, and $n_{(v_{i}, v_{j})}$ represents the number of hyperedges that simultaneously involve both $v_{j}$ and $v_{i}$. $n_{(v_{i}, v_{j})} / n_{(v_{j})}$ represents the probability that $v_{j}$ selects a hyperedge that involves $v_{i}$ to spread in. HEDV is designed around the unique topology of hypergraphs and the SICP model. The probability of $v_{i}$ being infected by seed node $v_{j}$ in the proposed HEDV is the product of the spreading probability $β$ and $n_{(v_{i}, v_{j})} / n_{(v_{j})}$, rather than the fixed constant $β$ in traditional EDV, which enables HEDV to make more accurate assessments.
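A minimal Python sketch of Equation (2) is given below, assuming $P_{j, i} = β$ for every pair as in the SICP model. The function name `hedv` and the list-of-sets hypergraph representation are illustrative choices, not the authors' code.

```python
from collections import defaultdict


def hedv(hyperedges, seeds, beta):
    """Expected diffusion value of a seed set on a hypergraph (Equation (2)).

    hyperedges: list of iterables of node labels (assumed representation).
    beta:       constant spreading probability, used as P_{j,i} for all pairs.
    """
    seeds = set(seeds)
    n = defaultdict(int)                            # n[v_j]: hyperedges containing seed v_j
    shared = defaultdict(lambda: defaultdict(int))  # shared[v_i][v_j]: hyperedges with both

    for edge in hyperedges:
        members = set(edge)
        for s in members & seeds:
            n[s] += 1
            for v in members - seeds:
                shared[v][s] += 1

    total = float(len(seeds))  # the K term
    for v, by_seed in shared.items():
        escape = 1.0  # probability that no neighboring seed infects v
        for s, n_vs in by_seed.items():
            escape *= 1.0 - beta * n_vs / n[s]
        total += 1.0 - escape
    return total


# Hand-made example: seed 1 lies in two hyperedges, so each neighbor is
# reached with probability beta * 1/2 = 0.25.
print(hedv([{1, 2, 3}, {1, 4}], {1}, beta=0.5))  # 1 + 3 * 0.25 = 1.75
```

Only non-seed nodes that share at least one hyperedge with a seed contribute to the sum, which mirrors the definition of $N_{s}$.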

To illustrate the computation process of HEDV, let us consider nodes $v_{5}$ and $v_{8}$ as seed nodes and refer to the hypergraph depicted in Figure 2b. Additionally, we establish a corresponding ordinary network for comparative analysis in Figure 2a, wherein the nodes within each hyperedge are interconnected. For ease of computation, we fix the maximum time step $T = 1$ and set the spreading probability $β$ to 0.5. Subsequently, we compute the projected extent of influence spreading utilizing both the EDV and HEDV functions.

When using EDV, if a node is a neighbor of any seed node, it has a probability $β$ of being infected by the latter. Therefore, calculating EDV on the hypergraph in Figure 2b or on the corresponding ordinary network in Figure 2a gives the same result. The value of $τ(v_{i})$ for nodes $v_{3}$, $v_{4}$, $v_{6}$, and $v_{7}$, which are neighbors of both seed nodes $v_{5}$ and $v_{8}$, is equal to 2. However, for node $v_{9}$, whose only seed neighbor is $v_{8}$, the value of $τ(v_{i})$ is equal to 1. Nodes $v_{1}$ and $v_{2}$ have no chance of being infected. Therefore, $\mathrm{EDV}(v_{5}, v_{8}) = 2 + 4 \cdot (1 - (1 - β)^{2}) + 1 \cdot β = 5.5$.

For HEDV, the probability of a node $v_{i}$ being infected by seed node $v_{j}$ is the product of the spreading probability $β$ and $n_{(v_{i}, v_{j})} / n_{(v_{j})}$, which is shown more clearly in the hypergraph in Figure 2b. Seed nodes $v_{5}$ and $v_{8}$ can select hyperedges $\{e_{2}, e_{4}\}$ and $\{e_{2}, e_{3}\}$, respectively, for propagation. The probability of seed nodes $v_{5}$ and $v_{8}$ selecting hyperedges containing nodes $v_{3}$, $v_{4}$, $v_{6}$, and $v_{7}$ for propagation is equal to 0.5. For node $v_{9}$, there is a possibility of being infected only when seed node $v_{8}$ selects hyperedge $e_{3}$ for propagation. Therefore, $\mathrm{HEDV}(v_{5}, v_{8}) = 2 + 4 \cdot (1 - (1 - β \cdot 0.5)^{2}) + 1 \cdot β \cdot 0.5 = 4$.
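For readers who wish to check the arithmetic, substituting the spreading probability of 0.5 into the two expressions of this worked example gives the following intermediate values:

```latex
\begin{align*}
\mathrm{EDV}(v_{5}, v_{8})
  &= 2 + 4\left(1 - (1 - 0.5)^{2}\right) + 1 \cdot 0.5
   = 2 + 4 \cdot 0.75 + 0.5 = 5.5, \\
\mathrm{HEDV}(v_{5}, v_{8})
  &= 2 + 4\left(1 - (1 - 0.5 \cdot 0.5)^{2}\right) + 1 \cdot 0.5 \cdot 0.5
   = 2 + 4 \cdot 0.4375 + 0.25 = 4.
\end{align*}
```

The gap between the two values comes entirely from the hyperedge-selection factor of 0.5, which halves each per-seed infection probability.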

We conduct 2000 Monte Carlo simulations on the hypergraph in Figure 2b, with a maximum time step of 1 and a spreading probability $β$ of 0.5. The average scale of influence spreading is found to be 4.0013, which agrees closely with the HEDV value of 4 and supports the accuracy of the HEDV function. Additionally, Monte Carlo takes 4.074 s, while HEDV takes 0.043 s, demonstrating the efficiency of the HEDV function.

3.3. HEDV-Greedy: Algorithm for Selecting the Seed Nodes with Maximum Influence

Kempe et al. presented the greedy method as a solution to the IM problem on regular networks [11]. This method incrementally includes nodes with the maximum marginal benefit into the seed set. Despite its notable accuracy and stability, the greedy method necessitates extensive Monte Carlo simulations for evaluating marginal benefits at each iteration, leading to considerable time overhead. In response to this, we introduce a novel approach named HEDV-greedy. This method leverages the HEDV function, as previously proposed, to offer an approximate assessment of marginal benefits, which is shown in Algorithm 2. It systematically adds nodes with the highest HEDV values to the seed set, aiming to mitigate the computational demands associated with the greedy method.

Algorithm 2: HEDV-Greedy
Input: Size of seed nodes $K$; hypergraph $H(V, E)$
Output: Seed node set $S$
Initialization: $S_{0} \leftarrow \emptyset$, $k \leftarrow 1$
while $|S_{k - 1}| < K$ do:
    $v_{k} \leftarrow \arg\max_{v \in V \setminus S_{k - 1}} \{\mathrm{HEDV}(S_{k - 1} \cup \{v\}) - \mathrm{HEDV}(S_{k - 1})\}$
    $S_{k} \leftarrow S_{k - 1} \cup \{v_{k}\}$
    $k \leftarrow k + 1$
end

HEDV-greedy: We denote $S_{k - 1}$ as the set of seed nodes selected up to round $k - 1$. The expected influence spread by $S_{k - 1}$ is given by $\mathrm{HEDV}(S_{k - 1})$. The marginal benefit of candidate node $v$ is given by $\mathrm{HEDV}(S_{k - 1} \cup \{v\}) - \mathrm{HEDV}(S_{k - 1})$. At the beginning of the algorithm, $S$ is set to be empty. At round $k$, we calculate the marginal benefit $\mathrm{HEDV}(S_{k - 1} \cup \{v\}) - \mathrm{HEDV}(S_{k - 1})$ for each candidate node $v$, where $v \in V \setminus S_{k - 1}$. Node $v_{k}$ with the maximum marginal benefit is inserted into the seed set, i.e., $S_{k} \leftarrow S_{k - 1} \cup \{v_{k}\}$. The algorithm terminates once the seed set contains $K$ nodes. The HEDV-greedy algorithm efficiently selects nodes with the maximum marginal benefit by utilizing the HEDV function. Such a strategy avoids extensive Monte Carlo simulations, resulting in improved search efficiency while ensuring the preservation of optimization quality.
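The selection loop described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `hedv` and `hedv_greedy` are hypothetical, the spreading probability `beta` is passed explicitly because HEDV needs it, ties are broken by node label, and HEDV is recomputed from scratch for every candidate for clarity rather than speed.

```python
from collections import defaultdict


def hedv(hyperedges, seeds, beta):
    """HEDV of a seed set, assuming P_{j,i} = beta for every pair."""
    seeds = set(seeds)
    n = defaultdict(int)                            # hyperedges per seed
    shared = defaultdict(lambda: defaultdict(int))  # co-occurrence counts
    for edge in hyperedges:
        members = set(edge)
        for s in members & seeds:
            n[s] += 1
            for v in members - seeds:
                shared[v][s] += 1
    total = float(len(seeds))
    for by_seed in shared.values():
        escape = 1.0
        for s, n_vs in by_seed.items():
            escape *= 1.0 - beta * n_vs / n[s]
        total += 1.0 - escape
    return total


def hedv_greedy(hyperedges, k, beta):
    """Greedily pick up to k seeds by maximum HEDV marginal benefit."""
    nodes = sorted(set().union(*hyperedges))
    seeds = []
    current = 0.0  # HEDV of the empty seed set
    while len(seeds) < min(k, len(nodes)):
        best_node, best_gain = None, float("-inf")
        for v in nodes:
            if v in seeds:
                continue
            gain = hedv(hyperedges, seeds + [v], beta) - current
            if gain > best_gain:  # strict ">" keeps the first node on ties
                best_node, best_gain = v, gain
        seeds.append(best_node)
        current += best_gain
    return seeds


# Hand-made hypergraph, not one of the datasets used in the experiments.
print(hedv_greedy([{1, 2, 3, 4}, {4, 5}, {6, 7}], k=2, beta=0.5))
```

On this toy input the second seed comes from the otherwise untouched hyperedge `{6, 7}`, which illustrates how the marginal-benefit criterion discourages overlapping influence.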

4. Experiment

The efficacy and efficiency of the proposed algorithm were assessed through comprehensive experiments involving eight hypergraphs generated from real-world data. All algorithms were implemented in Python and executed on a Windows 11 PC platform, featuring an Intel Core i9-13900HX CPU (2.20 GHz) and 16 GB of memory.

4.1. Datasets

We utilized eight real-world hypergraph datasets sourced from diverse domains such as mathematics, social networks, business, medicine, biochemistry, and politics [30,43,44]. These datasets exhibit variations in size, with the number of nodes spanning from 143 to 1668, and the number of hyperedges ranging from 315 to 2351. A summary of the topological characteristics of these datasets is presented in Table 1. Further details are elaborated below, and the abbreviated names in parentheses are employed to reference the respective datasets in the subsequent sections.

Cat-edge-algebra-questions (algebra) and cat-edge-geometry-questions (geometry): The two datasets provide a description of the interaction patterns among different users on the mathematical website named MathOverflow.

Cat-edge-madison-restaurant-reviews (restaurant): The dataset is derived from Yelp, an online platform for business reviews and recommendations.

Email-Enron (email): The dataset contains email communication data from 150 senior executives at the Enron Corporation. It was publicly released online during the investigation conducted by the Federal Energy Regulatory Commission.

Senate-committees (committees): This dataset depicts the relationships among members of the United States Senate, derived from congressional data compiled by Charles Stewart and Jonathan Woon.

NDC-classes (NDC): Under the Drug Listing Act of 1972, the U.S. Food and Drug Administration disseminates data concerning all commercially available drugs regulated by the agency, resulting in the creation of the National Drug Code (NDC) Directory.

Diseasome (diseasome): This dataset comprises a collection of diseases and their corresponding associated genes.

iAF1260b (iAF1260b): This dataset illustrates the interrelationships between reactions and metabolites.

4.2. Algorithms for Comparison

At present, there are limited methods proposed for addressing the HIM problem. Existing algorithms often directly adapt from approaches designed for ordinary networks, neglecting the distinctive topological features of hypergraphs. In order to thoroughly assess the effectiveness of the proposed algorithms, we chose eight additional state-of-the-art methods from categories including greedy algorithms, heuristic solutions, and methods based on reverse influence sampling [42]. These selected methods underwent experimentation on the same set of datasets for comprehensive comparative analysis. Details are as follows:

General-greedy (greedy): An extension of the greedy algorithm, which was first proposed to solve the IM problem on an ordinary network. This method iteratively adds nodes that have the maximum marginal benefit for the current seeds, which are evaluated by Monte Carlo simulations. The number of Monte Carlo simulations was set to 500.

Hyper reverse influence sampling (H-RIS): An extension of the reverse influence sampling (RIS) algorithm, which solves the IM problem on ordinary networks. The algorithm samples reverse reachable sets (RRS) and selects the node with the highest occurrence frequency as the seed node. Each RRS is generated by randomly selecting a node, performing reverse propagation from it, and collecting the encountered nodes.

Hyper single degree pruning (HSDP): A simplified degree discount heuristic algorithm that iteratively adds nodes with the highest degree to the seed node set. To minimize influence overlap, the degree of neighbors of the seed nodes is uniformly reduced as a punishment.

Hyper adaptive degree pruning (HADP): An adaptive degree discount heuristic algorithm, which is an improved version of HSDP. The algorithm applies different levels of punishment based on the number of neighbors a punished node has in the seeds. Nodes with more neighbors in the seeds receive a stronger punishment.

Hyper collective influence (H-CI): An extension of the collective influence (CI) method, which also solves the IM problem on ordinary networks. The algorithm operates based on the degree of the node itself and of the nodes at distance l from it, where l is an adjustable parameter. It is set to 1 and 2 in the subsequent experiments, referred to as H-CI (l = 1) and H-CI (l = 2), respectively. H-CI calculates the CI of all nodes and selects the nodes with the highest CI as seeds.

Hyper degree (H-degree): A hyperdegree-based heuristic algorithm, which calculates the hyperdegree of all nodes in the hypergraph and arranges them in descending order. Then, the K nodes with the highest rankings are chosen as the seed nodes.

Degree (degree): A degree-based heuristic algorithm, which is similar to H-degree. The algorithm calculates the degree of all nodes in the hypergraph and arranges them in descending order. Then, the K nodes with the highest rankings are chosen as the seed nodes.
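The degree-based baselines above can be illustrated with a minimal Python sketch. The toy hypergraph, the function names, and the uniform penalty of 1 in the HSDP-style routine are assumptions made for illustration; they are not taken from the original implementations.

```python
# Toy hypergraph (hypothetical data): each hyperedge is a set of node ids.
HYPEREDGES = [{0, 1, 2}, {0, 3}, {0, 4}, {1, 2, 5, 6}, {5, 6, 7}]


def neighbors(hyperedges):
    """Map each node to the set of nodes that share at least one hyperedge with it."""
    adj = {}
    for edge in hyperedges:
        for u in edge:
            adj.setdefault(u, set()).update(edge - {u})
    return adj


def degree(hyperedges):
    """Degree: number of distinct neighbors of each node."""
    return {u: len(nbrs) for u, nbrs in neighbors(hyperedges).items()}


def hyperdegree(hyperedges):
    """Hyperdegree: number of hyperedges that contain each node."""
    hdeg = {}
    for edge in hyperedges:
        for u in edge:
            hdeg[u] = hdeg.get(u, 0) + 1
    return hdeg


def top_k(score, k):
    """Return the k highest-scoring nodes (ties broken by smaller node id)."""
    return sorted(score, key=lambda u: (-score[u], u))[:k]


def single_discount(hyperedges, k, penalty=1):
    """HSDP-style selection: after each pick, lower every neighbor's score by `penalty`."""
    adj = neighbors(hyperedges)
    score = {u: len(nbrs) for u, nbrs in adj.items()}
    seeds = []
    for _ in range(k):
        candidates = [u for u in score if u not in seeds]
        best = max(candidates, key=lambda u: (score[u], -u))
        seeds.append(best)
        for v in adj[best]:
            score[v] -= penalty  # assumed: uniform penalty per selected neighbor
    return seeds


print(top_k(degree(HYPEREDGES), 2))
print(top_k(hyperdegree(HYPEREDGES), 2))
print(single_discount(HYPEREDGES, 2))
```

On this toy input, the plain degree ranking picks nodes 0 and 1, which share a hyperedge, whereas the discounted variant picks nodes 0 and 5, whose neighborhoods do not overlap.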

4.3. Effectiveness on Real-World Datasets

To assess the performance of our proposed algorithm, we conducted comparisons with other baselines. Initially, we sequentially applied each algorithm to identify the seed node sets and recorded the corresponding time expenses. Subsequently, we utilized the Monte Carlo method to evaluate the propagation capability of each seed set. Finally, based on the propagation capability of different seed sets, we can infer the performance of the respective algorithms. The scale of influence spreading for each seed set is recorded as the average of 2000 Monte Carlo simulations. The stability of the proposed algorithm is evaluated by varying the spreading probability β and maximum time step T, with the seed set size K ranging from 1 to 30.
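The Monte Carlo evaluation and the general-greedy baseline can be sketched as follows. This is a simplified illustration, not the code used in the experiments: it assumes that, at each time step, every infected node picks one of its hyperedges uniformly at random and infects each susceptible node in it with probability β. All function names are hypothetical.

```python
import random


def build_index(hyperedges):
    """Return (node -> list of hyperedge ids, hyperedge id -> sorted node list)."""
    edge_nodes = [sorted(edge) for edge in hyperedges]
    node_edges = {}
    for eid, nodes in enumerate(edge_nodes):
        for u in nodes:
            node_edges.setdefault(u, []).append(eid)
    return node_edges, edge_nodes


def simulate_spread(node_edges, edge_nodes, seeds, beta, T, rng):
    """One SI run with contact-process dynamics; returns the number of infected nodes."""
    infected = set(seeds)
    for _ in range(T):
        newly_infected = set()
        for u in sorted(infected):
            eid = rng.choice(node_edges[u])  # assumed: one incident hyperedge, chosen uniformly
            for v in edge_nodes[eid]:
                if v not in infected and rng.random() < beta:
                    newly_infected.add(v)
        infected |= newly_infected
    return len(infected)


def expected_spread(node_edges, edge_nodes, seeds, beta, T, runs, rng):
    """Average spreading scale of a seed set over `runs` independent simulations."""
    total = sum(
        simulate_spread(node_edges, edge_nodes, seeds, beta, T, rng) for _ in range(runs)
    )
    return total / runs


def greedy_select(hyperedges, k, beta, T, runs, seed=0):
    """Greedy baseline: repeatedly add the node that yields the largest estimated spread."""
    rng = random.Random(seed)
    node_edges, edge_nodes = build_index(hyperedges)
    seeds = []
    for _ in range(k):
        best_node, best_value = None, -1.0
        for u in sorted(node_edges):
            if u in seeds:
                continue
            value = expected_spread(node_edges, edge_nodes, seeds + [u], beta, T, runs, rng)
            if value > best_value:
                best_node, best_value = u, value
        seeds.append(best_node)
    return seeds
```

Each greedy round calls `expected_spread` once per candidate node, so the cost grows with the number of nodes, the number of simulations, and K. This is why the greedy baseline is slow on larger hypergraphs.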

When β = 0.01 and T = 25, the propagation capability of the seed sets selected by different algorithms is illustrated in Figure 3, with the x-axis representing the seed set size K and the y-axis representing the number of infected nodes. Table 2 reports the normalized area under the curve (AUC) obtained by each algorithm shown in Figure 3. The greedy method, which employs 500 Monte Carlo simulations to evaluate the marginal benefit of adding nodes and thereby considers the global network topology and propagation process, achieves the best solution. The HEDV-greedy method improves efficiency by evaluating the marginal benefit of nodes using HEDV, which takes into account the local network topology and propagation process within the nodes’ neighborhood, and it outperforms the other state-of-the-art algorithms across all datasets. The HEDV-greedy method achieves a maximum improvement of 56.8% on the diseasome dataset and obtains improvements of 20.2%, 23.6%, 23.2%, and 20.9% on the email, iAF1260b, NDC, and restaurant datasets, respectively. On average, across the eight datasets, it enhances the performance by 21.68% compared to the best-performing algorithm among the baselines.
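As a worked illustration of how the improvement figures relate to the AUC values, the sketch below computes the relative gain of HEDV-greedy over the best baseline for two rows of Table 2. The formula (difference divided by the best baseline AUC) is our reading of the reported numbers, and the function name `boost` is hypothetical.

```python
def boost(auc_ours, auc_baselines):
    """Return (best baseline name, relative AUC gain of our method over it)."""
    best_name = max(auc_baselines, key=auc_baselines.get)
    best_auc = auc_baselines[best_name]
    return best_name, (auc_ours - best_auc) / best_auc


# AUC values copied from the algebra and diseasome rows of Table 2.
algebra = {"HADP": 0.1374, "HSDP": 0.1128, "H-RIS": 0.1084, "H-CI (l = 1)": 0.0791,
           "H-CI (l = 2)": 0.0767, "H-Degree": 0.0779, "Degree": 0.0846}
diseasome = {"HADP": 0.1030, "HSDP": 0.1026, "H-RIS": 0.1110, "H-CI (l = 1)": 0.0820,
             "H-CI (l = 2)": 0.0811, "H-Degree": 0.0774, "Degree": 0.0969}

for name, ours, baselines in [("algebra", 0.1600, algebra), ("diseasome", 0.1740, diseasome)]:
    best_name, gain = boost(ours, baselines)
    print(f"{name}: best baseline {best_name}, boost {gain:.1%}")
```

For these two rows the sketch reproduces the 16.4% and 56.8% entries in the Boost column.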

Notably, greedy and HEDV-greedy exhibit particularly significant advantages compared to other baselines when the seed set size K is small. For instance, on the algebra, geometry, iAF1260b, and restaurant datasets, greedy and HEDV-greedy select highly influential seed nodes when K is between 1 and 15. This is attributed to their consideration of both network topology and the actual propagation process, unlike heuristic methods relying solely on node degree/hyperdegree or other centrality measures. This emphasizes that centrality can only offer an approximate estimate and does not fully capture a node’s true propagation capability.

Moreover, across all datasets, HADP and HSDP outperform degree, suggesting that adaptively reducing the degree of neighboring nodes near seed nodes helps diminish influence overlap and enhance the performance of solving the HIM problem. Additionally, HADP consistently outperforms HSDP, indicating that applying different degrees of punishment for penalized nodes is more effective. Furthermore, we observe that degree consistently outperforms H-degree, indicating that the number of connected nodes remains the primary factor in determining node spreading capacity. For example, a node involved in numerous hyperedges but with a low number of nodes per hyperedge may not exhibit strong propagation capability.

To better illustrate the discrepancy between greedy and HEDV-greedy, we also present the results of 2000 simulations of the two algorithms on all datasets, and include HADP as a point of reference, as shown in Figure 4. HEDV-greedy, greedy, and HADP are represented by solid lines in red, blue, and green, respectively, to indicate their average spreading scale in multiple propagation simulations. The lighter colored lines represent the simulations of each algorithm, forming three backgrounds with different colors. Figure 4 indicates that the red and blue backgrounds largely overlap, suggesting that HEDV-greedy performs very close to greedy, and even outperforms greedy when T is small. Both algorithms surpass HADP significantly.

Figure 5 depicts the distribution of the spreading scale at different time steps (T = 1 to 25) for various algorithms when β = 0.01 and K = 30. The spreading scale for each algorithm initially increases linearly with the number of propagation iterations and then levels off. This pattern arises because, in the early stages of propagation, most nodes are not yet infected, allowing the seed nodes to propagate more effectively. However, as the propagation progresses, a majority of the network nodes become infected, leading to a decrease in the number of newly infected nodes and a leveling off of the spreading scale. Additionally, the different algorithms exhibit varying slopes of linear increase in the early stages, with some algorithms demonstrating rapid growth and others proceeding at a slower pace, resulting in a lower overall spreading scale at T = 25. This underscores the importance of selecting seed nodes that propagate quickly and widely in the early stages, since this influences the final spreading scale.

4.4. Efficiency

Table 3 reports the time expenditure for selecting the seed node set (β = 0.01, T = 25, K = 30). While the greedy method achieves the highest spreading scale, it incurs significant time overhead, taking several days to select nodes. H-RIS and H-CI (l = 2) also exhibit high computational complexity. In contrast, the H-degree and degree methods have the lowest time expenditure, followed by HADP and HSDP. However, their AUCs are relatively low. In comparison, HEDV-greedy reduces running time by at least two orders of magnitude compared to the traditional greedy method, while surpassing all other baselines in AUC.
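The two-orders-of-magnitude claim can be checked directly against the running times in Table 3. The short sketch below copies the HEDV-greedy and greedy columns and computes the speedup per dataset; the variable names are our own.

```python
import math

# Running times in seconds, copied from Table 3 (K = 30, T = 25, beta = 0.01).
hedv_greedy = {"Algebra": 1.07e3, "Diseasome": 3.02e1, "Email": 1.82e2, "Geometry": 3.75e3,
               "iAF1260b": 3.63e3, "NDC": 3.59e3, "Restaurant": 1.05e3, "Committees": 7.82e2}
greedy = {"Algebra": 6.72e5, "Diseasome": 1.88e5, "Email": 2.51e4, "Geometry": 2.30e6,
          "iAF1260b": 1.46e6, "NDC": 9.03e5, "Restaurant": 9.92e5, "Committees": 4.03e5}

speedup = {name: greedy[name] / hedv_greedy[name] for name in hedv_greedy}
for name, ratio in sorted(speedup.items(), key=lambda item: item[1]):
    print(f"{name:<11} speedup {ratio:8.1f}x  ({math.log10(ratio):.2f} orders of magnitude)")
```

The smallest ratio occurs on the email dataset (roughly 138 times faster), so every dataset shows a speedup above a factor of 100.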

4.5. Parameters Sensitivities

In this subsection, we investigate the sensitivity of the proposed method to the propagation probability β and the number of time steps T in order to examine the robustness of the method. Table 4, Table 5 and Table 6, respectively, present the AUCs under the conditions T = 35, β = 0.005; T = 20, β = 0.02; and T = 30, β = 0.015. The results illustrate that the HEDV-greedy method consistently outperforms all other baselines across the tested parameter configurations. In comparison with the best-performing baseline, the performance of HEDV-greedy shows improvements of 25.8%, 16.8%, and 13.6% under the three parameter settings, respectively. Our experiment also showed that HEDV-greedy outperformed the traditional greedy method under the parameter setting of T = 35 and β = 0.005. HEDV exhibits greater accuracy in assessing node propagation capacity, particularly when β is small. These results indicate that the HEDV-greedy method is robust to variations in propagation parameters.

4.6. Non-Parametric Test

To further scrutinize the simulation results, we conducted two non-parametric tests: the Wilcoxon signed-rank test and the Friedman test. The Wilcoxon signed-rank test compares the medians of two paired samples, while the Friedman test compares the medians of three or more related samples [45]. These tests allow us to ascertain whether there are significant differences between the proposed algorithm and the benchmark algorithms. The results of these tests are presented in Table 7 and Table 8.

In the Wilcoxon signed-rank test, for each dataset, the simulation results of each algorithm can be paired with those of HEDV-greedy, forming two paired samples of size K = 30. The absolute differences between the two samples are ranked, and each rank is assigned the sign of the corresponding difference to obtain positive ranks (R+) and negative ranks (R−). The signed ranks are then used as the test statistic to determine if there is a significant difference in medians between the two samples. The results across all datasets are shown in Table 7 (β = 0.01, T = 25). The p-value is the probability of observing a difference at least as extreme as the measured one if the null hypothesis (no significant difference between the two algorithms) were true, so smaller p-values indicate stronger evidence of a difference between the algorithms. Table 7 demonstrates that across all datasets, HEDV-greedy consistently exhibits significant superiority over all baselines with a p-value of 0.
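The rank sums R+ and R− used in the Wilcoxon signed-rank test can be computed with a few lines of Python. The sketch below is a generic textbook version (zero differences dropped, tied absolute differences given average ranks), not the code used to produce Table 7, and the sample values are hypothetical.

```python
def signed_rank_sums(x, y):
    """Return (R+, R-) for paired samples x and y.

    Zero differences are dropped and tied absolute differences share their average rank.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return r_plus, r_minus


# Hypothetical spreading scales of two algorithms for four seed set sizes.
ours = [10, 12, 15, 20]
baseline = [8, 13, 11, 14]
print(signed_rank_sums(ours, baseline))
```

When one algorithm is better for every one of the 30 seed set sizes, all ranks are positive, giving R+ = 30 × 31 / 2 = 465 and R− = 0.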

In the Friedman test, for each dataset, the simulation results of all nine algorithms are ranked under the same seed size K (K = 1 to 30), forming a total of 30 rankings. Subsequently, the average rank is calculated for each algorithm, representing its Friedman rank. The Friedman ranks are used as the test statistic to determine if there is a significant difference in medians among the algorithms. The results of the Friedman test on the datasets are shown in Table 8. HEDV-greedy consistently demonstrates a significant advantage over other baselines, aligning with the results of the Wilcoxon test. Meanwhile, the performances of HADP and HSDP consistently rank second and third. Degree and H-RIS follow, while the H-CI and H-degree methods demonstrate the lowest effectiveness.
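The Friedman ranks described above amount to ranking the algorithms within each seed size and averaging those ranks. A minimal sketch follows, assuming that a larger spreading scale is better and that ties share their average rank; the algorithm labels and scores are hypothetical.

```python
def friedman_ranks(results):
    """Average within-block rank per algorithm (rank 1 = largest score in a block).

    `results` maps an algorithm name to its list of scores, one per block (seed size K).
    """
    names = list(results)
    n_blocks = len(next(iter(results.values())))
    totals = {name: 0.0 for name in names}
    for b in range(n_blocks):
        ordered = sorted(names, key=lambda name: -results[name][b])
        i = 0
        while i < len(ordered):
            j = i
            # Extend j over algorithms tied with the one at position i.
            while j + 1 < len(ordered) and results[ordered[j + 1]][b] == results[ordered[i]][b]:
                j += 1
            average_rank = (i + j) / 2 + 1
            for name in ordered[i:j + 1]:
                totals[name] += average_rank
            i = j + 1
    return {name: totals[name] / n_blocks for name in names}


# Hypothetical spreading scales of three algorithms for three seed sizes.
scores = {"A": [10, 9, 8], "B": [5, 9, 7], "C": [1, 2, 3]}
print(friedman_ranks(scores))
```

A lower average rank indicates a better overall position across the seed sizes.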

4.7. Comprehensive Analysis

Table 9 provides a detailed description of the influence evaluation methods and seed selection strategies employed by algorithms of different categories, including greedy-based algorithms, heuristic solutions, and the proposed HEDV-greedy. A further analysis of the advantages and disadvantages of these strategies was conducted. Specifically, greedy-based algorithms utilize Monte Carlo simulations, offering high accuracy but requiring significant time overhead. Heuristic solutions assess node centrality for faster speed yet struggle to meet accuracy requirements. In contrast, the HEDV function can deliver relatively accurate results within a shorter timeframe. Regarding seed node selection, heuristic solutions directly choose the top-K nodes as seeds, leading to significant overlap in influence. On the other hand, greedy strategies iteratively select nodes with maximum marginal value (MV), greatly enhancing the accuracy of the analysis. The innovative HEDV-greedy approach pioneers a fusion methodology by integrating the heuristic information from the HEDV function with the greedy selection strategy, combining the strengths of both techniques to yield excellent results within a brief period.

Within this framework, the effectiveness of algorithms in addressing the HIM problem primarily depends on two aspects: influence evaluation methods and seed selection strategies. Fast and accurate node influence evaluation methods, along with seed selection strategies that minimize influence overlap to the greatest extent, are crucial to an algorithm’s performance.

5. Conclusions

Until now, scholars have presented a range of methodologies and perspectives to address the IM problem in conventional networks. These include algorithms based on greedy strategies, heuristic solutions, community-based approaches, meta-heuristic optimization algorithms, and methods utilizing reverse influence sampling. Despite these efforts, the HIM problem remains a formidable challenge with limited research focus. This study introduces an evaluation function, termed HEDV, to estimate the influence propagation scale of seed nodes on hypergraphs. The integration of HEDV into a greedy strategy yields a method named HEDV-greedy. In HEDV-greedy, HEDV serves as a metric for the marginal benefit of candidate nodes during seed node selection. To the best of our knowledge, HEDV is the pioneering quantitative function utilized for evaluating node influence on hypergraphs within this domain. Furthermore, HEDV-greedy represents the first integrated approach that combines a novel influence evaluation method with a greedy selection strategy. To validate the efficacy of the proposed algorithm, we conducted experiments comparing it with various state-of-the-art algorithms, including those from previous researchers and extensions of ordinary network algorithms. The experiments involved hypergraphs generated from real data in diverse domains, with results visualized and analyzed using the AUCs and non-parametric tests. The HEDV-greedy algorithm showcases a significant reduction in time complexity by two orders of magnitude in comparison to the traditional greedy method. Furthermore, HEDV-greedy surpasses other cutting-edge algorithms in performance across all datasets. Particularly, in scenarios with lower propagation probability, HEDV-greedy demonstrates an average enhancement in solution accuracy of 25.76%. Simultaneously, two non-parametric testing methods also confirmed that HEDV-greedy significantly outperforms all other baselines by a clear margin.

The noteworthy aspect of HEDV-greedy is the incorporation of heuristic information into a greedy strategy. Experimental results underscore that such approaches can markedly reduce time complexity while maintaining a specified level of accuracy, offering fresh perspectives on addressing the IM problem in large-scale networks. Additionally, the scarcity of methods utilizing meta-heuristic optimization algorithms for solving the HIM problem is acknowledged, primarily due to the absence of quick indicators for evaluating influence size in a given seed set on hypergraphs. The proposed approximate function, HEDV, not only addresses this gap but also opens avenues for designing various meta-heuristic optimization algorithms for HIM. As research progresses, exploration in this avenue aims to develop methods that are increasingly accurate, less time-consuming, and more robust. In the future, we will propose more efficient algorithms and apply them to real-world engineering problems to assess their practical utility.

Author Contributions

Conceptualization, H.W.; Methodology, H.W. and Q.P.; Validation, H.W. and Q.P.; Writing—original draft, H.W.; Writing—review & editing, Q.P. and J.T.; Visualization, H.W.; Supervision, Q.P. and J.T.; Project administration, J.T.; Funding acquisition, J.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China, grant number 62073330 and grant number 72101265.

Data Availability Statement

All data generated or analyzed during this study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Dong, C.; Xu, G.; Yang, P.; Meng, L. TSIFIM: A Three-Stage Iterative Framework for Influence Maximization in Complex Networks. Expert Syst. Appl. 2023, 212, 118702.
2. Song, G.; Zhou, X.; Wang, Y.; Xie, K. Influence Maximization on Large-Scale Mobile Social Network: A Divide-and-Conquer Method. IEEE Trans. Parallel Distrib. Syst. 2015, 26, 1379–1392.
3. Domingos, P.; Richardson, M. Mining the Network Value of Customers. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 26–29 August 2001; pp. 57–66.
4. Pei, S.; Makse, H.A. Spreading Dynamics in Complex Networks. J. Stat. Mech. 2013, 2013, P12002.
5. Li, Y.; Li, L.; Liu, Y.; Li, Q. MAHE-IM: Multiple Aggregation of Heterogeneous Relation Embedding for Influence Maximization on Heterogeneous Information Network. Expert Syst. Appl. 2022, 202, 117289.
6. Poulik, S.; Ghorai, G.; Xin, Q. Explication of Crossroads Order Based on Randic Index of Graph with Fuzzy Information. Soft Comput. 2024, 28, 1851–1864.
7. Mathew, S.; Mordeson, J.N. Connectivity Concepts in Fuzzy Incidence Graphs. Inf. Sci. 2017, 382–383, 326–333.
8. Das, S.; Poulik, S.; Ghorai, G. Picture Fuzzy ϕ-Tolerance Competition Graphs with Its Application. J. Ambient. Intell. Humaniz. Comput. 2024, 15, 547–559.
9. Kak, S. Power Series Models of Self-Similarity in Social Networks. Inf. Sci. 2017, 376, 31–38.
10. Bretto, A. Hypergraph Theory: An Introduction; Mathematical Engineering; Springer: Cham, Switzerland; Heidelberg, Germany; New York, NY, USA; Dordrecht, The Netherlands; London, UK, 2013; ISBN 978-3-319-00079-4.
11. Kempe, D.; Kleinberg, J.; Tardos, É. Maximizing the Spread of Influence through a Social Network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003; pp. 137–146.
12. Jiang, Q.; Song, G.; Gao, C.; Wang, Y.; Si, W.; Xie, K. Simulated Annealing Based Influence Maximization in Social Networks. Proc. AAAI Conf. Artif. Intell. 2011, 25, 127–132.
13. Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.; Glance, N. Cost-Effective Outbreak Detection in Networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; pp. 420–429.
14. Goyal, A.; Lu, W.; Lakshmanan, L.V.S. CELF++: Optimizing the Greedy Algorithm for Influence Maximization in Social Networks. In Proceedings of the 20th International Conference Companion on World Wide Web, Hyderabad, India, 28 March–1 April 2011; pp. 47–48.
15. Chen, W.; Wang, Y.; Yang, S. Efficient Influence Maximization in Social Networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 199–208.
16. Estrada, E.; Rodríguez-Velázquez, J.A. Subgraph Centrality in Complex Networks. Phys. Rev. E 2005, 71, 056103.
17. Freeman, L.C. Centrality in Social Networks Conceptual Clarification. Soc. Netw. 1978, 1, 215–239.
18. Borgs, C.; Brautbar, M.; Chayes, J.; Lucier, B. Maximizing Social Influence in Nearly Optimal Time. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, Portland, OR, USA, 5–7 January 2014; pp. 946–957.
19. Tang, Y.; Xiao, X.; Shi, Y. Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 22–27 June 2014; pp. 75–86.
20. Tang, Y.; Shi, Y.; Xiao, X. Influence Maximization in Near-Linear Time: A Martingale Approach. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, VIC, Australia, 31 May–4 June 2015; pp. 1539–1554.
21. Nguyen, H.T.; Thai, M.T.; Dinh, T.N. Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-Scale Networks. In Proceedings of the 2016 International Conference on Management of Data, San Francisco, CA, USA, 26 June–1 July 2016; pp. 695–710.
22. Wang, Y.; Cong, G.; Song, G.; Xie, K. Community-Based Greedy Algorithm for Mining Top-K Influential Nodes in Mobile Social Networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 25–28 July 2010; pp. 1039–1048.
23. Bozorgi, A.; Haghighi, H.; Sadegh Zahedi, M.; Rezvani, M. INCIM: A Community-Based Algorithm for Influence Maximization Problem under the Linear Threshold Model. Inf. Process. Manag. 2016, 52, 1188–1199.
24. Rahimkhani, K.; Aleahmad, A.; Rahgozar, M.; Moeini, A. A Fast Algorithm for Finding Most Influential People Based on the Linear Threshold Model. Expert Syst. Appl. 2015, 42, 1353–1361.
25. Gong, M.; Yan, J.; Shen, B.; Ma, L.; Cai, Q. Influence Maximization in Social Networks Based on Discrete Particle Swarm Optimization. Inf. Sci. 2016, 367–368, 600–614.
26. Estrada, E.; Rodríguez-Velázquez, J.A. Subgraph Centrality and Clustering in Complex Hyper-Networks. Phys. A Stat. Mech. Its Appl. 2006, 364, 581–594.
27. Ma, N.; Liu, Y. SuperedgeRank Algorithm and Its Application in Identifying Opinion Leader of Online Public Opinion Supernetwork. Expert Syst. Appl. 2014, 41, 1357–1368.
28. Contisciani, M.; Battiston, F.; De Bacco, C. Inference of Hyperedges and Overlapping Communities in Hypergraphs. Nat. Commun. 2022, 13, 7229.
29. Ruggeri, N.; Contisciani, M.; Battiston, F.; De Bacco, C. Community Detection in Large Hypergraphs. Sci. Adv. 2023, 9, eadg9159.
30. Benson, A.R.; Abebe, R.; Schaub, M.T.; Jadbabaie, A.; Kleinberg, J. Simplicial Closure and Higher-Order Link Prediction. Proc. Natl. Acad. Sci. USA 2018, 115, E11221–E11230.
31. Zhu, J.; Zhu, J.; Ghosh, S.; Wu, W.; Yuan, J. Social Influence Maximization in Hypergraph in Social Networks. IEEE Trans. Netw. Sci. Eng. 2019, 6, 801–811.
32. Du, M. Research on Information Dissemination Model of Social Network Services Based on Probabilistic Hyper-Graph. Int. J. Signal Process. Image Process. Pattern Recognit. 2015, 8, 267–274.
33. Bodó, Á.; Katona, G.Y.; Simon, P.L. SIS Epidemic Propagation on Hypergraphs. Bull. Math. Biol. 2016, 78, 713–735.
34. Suo, Q.; Guo, J.-L.; Shen, A.-Z. Information Spreading Dynamics in Hypernetworks. Phys. A Stat. Mech. Its Appl. 2018, 495, 475–487.
35. Jiang, X.; Wang, Z.; Liu, W. Information Dissemination in Dynamic Hypernetwork. Phys. A Stat. Mech. Its Appl. 2019, 532, 121578.
36. Antelmi, A.; Cordasco, G.; Spagnuolo, C.; Szufel, P. Information Diffusion in Complex Networks: A Model Based on Hypergraphs and Its Analysis. In Algorithms and Models for the Web Graph; Kamiński, B., Prałat, P., Szufel, P., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; Volume 12091, pp. 36–51. ISBN 978-3-030-48477-4.
37. Wang, J.; Wang, Z.; Yu, P.; Xu, Z. The Impact of Different Strategy Update Mechanisms on Information Dissemination under Hyper Network Vision. Commun. Nonlinear Sci. Numer. Simul. 2022, 113, 106585.
38. Xiao, Q. A Method for Measuring Node Importance in Hypernetwork Model. Res. J. Appl. Sci. Eng. Technol. 2013, 5, 568–573.
39. Kapoor, K.; Sharma, D.; Srivastava, J. Weighted Node Degree Centrality for Hypergraphs. In Proceedings of the 2013 IEEE 2nd Network Science Workshop (NSW), West Point, NY, USA, 29 April–1 May 2013; pp. 152–155.
40. Zheng, H.; Wang, N.; Wu, J. Non-Submodularity and Approximability: Influence Maximization in Online Social Networks. In Proceedings of the 2019 IEEE 20th International Symposium on “A World of Wireless, Mobile and Multimedia Networks” (WoWMoM), Washington, DC, USA, 10–12 June 2019; pp. 1–9.
41. Antelmi, A.; Cordasco, G.; Spagnuolo, C.; Szufel, P. Social Influence Maximization in Hypergraphs. Entropy 2021, 23, 796.
42. Xie, M.; Zhan, X.-X.; Liu, C.; Zhang, Z.-K. An Efficient Adaptive Degree-Based Heuristic Algorithm for Influence Maximization in Hypergraphs. Inf. Process. Manag. 2023, 60, 103161.
43. Goh, K.-I.; Cusick, M.E.; Valle, D.; Childs, B.; Vidal, M.; Barabási, A.-L. The Human Disease Network. Proc. Natl. Acad. Sci. USA 2007, 104, 8685–8690.
44. King, Z.A.; Lu, J.; Dräger, A.; Miller, P.; Federowicz, S.; Lerman, J.A.; Ebrahim, A.; Palsson, B.O.; Lewis, N.E. BiGG Models: A Platform for Integrating, Standardizing and Sharing Genome-Scale Models. Nucleic Acids Res. 2016, 44, D515–D522.
45. Pan, Q.; Tang, J.; Lao, S. EDOA: An Elastic Deformation Optimization Algorithm. Appl. Intell. 2022, 52, 17580–17599.

Figure 1. The susceptible–infected spreading model with contact process dynamics.

Figure 2. Sample hypergraph and the corresponding ordinary network.

Figure 3. Influence spreading scale with various K (K = 1 to 30) for each algorithm in hypergraphs: (a) algebra; (b) diseasome; (c) email; (d) geometry; (e) iAF1260b; (f) NDC; (g) restaurant; (h) committees. We set β = 0.01 and T = 25.

Figure 4. Influence spreading scale of multiple simulations with various T (T = 1 to 25) for HEDV-greedy, greedy, and HADP in hypergraphs: (a) algebra; (b) diseasome; (c) email; (d) geometry; (e) iAF1260b; (f) NDC; (g) restaurant; (h) committees. We set β = 0.01 and K = 30.

Figure 5. Influence spreading scale with various T (T = 1 to 25) for each algorithm in hypergraphs: (a) algebra; (b) diseasome; (c) email; (d) geometry; (e) iAF1260b; (f) NDC; (g) restaurant; (h) committees. We set β = 0.01 and K = 30.

Table 1. Topological properties of the datasets.

Hypergraphs	$n$	$m$	$\langle deg \rangle$	$\langle d^{H} \rangle$	$\langle d^{E} \rangle$	$c$	$\langle d \rangle$	$ξ$	$ρ$
Email	143	1542	25.17	32.5	3.01	0.59	2.07	4	0.18
Committees	282	315	100.77	19.26	17.24	0.68	1.69	3	0.36
Algebra	423	1268	78.9	19.53	6.52	0.8	1.95	5	0.19
Diseasome	516	903	4.6	3	1.72	0.64	6.5	15	0.01
Restaurant	565	601	79.75	8.14	7.66	0.54	1.98	5	0.14
Geometry	580	1193	164.79	21.53	10.47	0.82	1.75	4	0.28
NDC	1161	1088	10.72	5.55	5.92	0.61	3.5	9	0.01
iAF1260b	1668	2351	13.26	5.46	3.87	0.56	2.67	7	0.01

Here, $n$ represents the number of nodes and $m$ represents the number of hyperedges. $\langle deg \rangle$ is the average degree of nodes, while $\langle d^{H} \rangle$ represents the average hyperdegree. $\langle d^{E} \rangle$ is the average number of nodes contained in a hyperedge. Additionally, $c$ is the clustering coefficient, $\langle d \rangle$ is the average shortest path length between two nodes, $ξ$ represents the diameter, and $ρ$ is the edge density of the corresponding ordinary network derived from the hypergraph.
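The quantities in Table 1 are linked by two standard identities, which the sketch below checks against the tabulated values. It assumes that the ordinary network is a simple undirected graph, so that ρ = ⟨deg⟩ / (n − 1), and it uses the fact that the total number of node-hyperedge incidences equals both n · ⟨d^H⟩ and m · ⟨d^E⟩. The variable names are our own.

```python
# (name, n, m, <deg>, <d^H>, <d^E>, rho) copied from Table 1.
TABLE_1 = [
    ("Email", 143, 1542, 25.17, 32.5, 3.01, 0.18),
    ("Committees", 282, 315, 100.77, 19.26, 17.24, 0.36),
    ("Algebra", 423, 1268, 78.9, 19.53, 6.52, 0.19),
    ("Diseasome", 516, 903, 4.6, 3.0, 1.72, 0.01),
    ("Restaurant", 565, 601, 79.75, 8.14, 7.66, 0.14),
    ("Geometry", 580, 1193, 164.79, 21.53, 10.47, 0.28),
    ("NDC", 1161, 1088, 10.72, 5.55, 5.92, 0.01),
    ("iAF1260b", 1668, 2351, 13.26, 5.46, 3.87, 0.01),
]

checks = []
for name, n, m, avg_deg, avg_hdeg, avg_esize, rho in TABLE_1:
    density = avg_deg / (n - 1)        # edge density of a simple graph on n nodes
    by_nodes = n * avg_hdeg            # incidences counted from the node side
    by_edges = m * avg_esize           # incidences counted from the hyperedge side
    gap = abs(by_nodes - by_edges) / by_nodes
    checks.append((name, round(density, 2), rho, gap))
    print(f"{name:<11} density {density:.4f} (table: {rho})  incidence gap {gap:.2%}")
```

For all eight datasets the recomputed density rounds to the tabulated ρ, and the two incidence counts agree to within rounding of the reported averages.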

Table 2. AUC scores obtained by each of the curves shown in Figure 3 (T = 25, β = 0.01). The maximal AUC value among all methods is shown in bold and the maximal AUC value among baselines is shown with *. The performance improvement is denoted with ↑.

Datasets	Our Method	Baselines							Boost
	HEDV-Greedy	HADP	HSDP	H-RIS	H-CI (l = 1)	H-CI (l = 2)	H-Degree	Degree	Average: 21.7%↑
Algebra	0.1600	0.1374 *	0.1128	0.1084	0.0791	0.0767	0.0779	0.0846	16.4%↑
Diseasome	0.1740	0.1030	0.1026	0.1110 *	0.0820	0.0811	0.0774	0.0969	56.8%↑
Email	0.1375	0.1144 *	0.1140	0.1139	0.0954	0.0980	0.0975	0.0995	20.2%↑
Geometry	0.1206	0.1134 *	0.1107	0.1115	0.1042	0.1042	0.1042	0.1046	6.30%↑
iAF1260b	0.1821	0.1473 *	0.1071	0.0759	0.0754	0.0722	0.0736	0.0816	23.6%↑
NDC	0.1448	0.1175	0.1216 *	0.0934	0.0902	0.0906	0.0906	0.0962	23.2%↑
Restaurant	0.1349	0.1116 *	0.1058	0.1084	0.1003	0.0959	0.0984	0.1055	20.9%↑
Committees	0.1216	0.1146 *	0.1102	0.1140	0.1033	0.1020	0.1028	0.1059	6.10%↑

Table 3. Time cost for each algorithm. The running times are given by the average over 10 realizations; the seed set size is set as K = 30 (T = 25, β = 0.01). The unit of time is seconds (s).

Datasets	Running Time (Seconds)
	HEDV-Greedy	Greedy	HADP	HSDP	H-RIS	H-CI (l = 1)	H-CI (l = 2)	H-Degree	Degree
Algebra	1.07 × 10³	6.72 × 10⁵	8.85 × 10⁰	8.15 × 10⁻¹	1.81 × 10²	5.93 × 10⁻¹	2.26 × 10²	2.20 × 10⁻¹	7.92 × 10⁻¹
Diseasome	3.02 × 10¹	1.88 × 10⁵	4.50 × 10⁻¹	3.16 × 10⁻¹	7.80 × 10¹	1.66 × 10⁻¹	1.64 × 10²	2.52 × 10⁻¹	5.16 × 10⁻¹
Email	1.82 × 10²	2.51 × 10⁴	2.38 × 10⁰	4.20 × 10⁻¹	3.19 × 10¹	2.92 × 10⁻¹	3.36 × 10¹	8.40 × 10⁻²	3.54 × 10⁻¹
Geometry	3.75 × 10³	2.30 × 10⁶	2.02 × 10¹	1.26 × 10⁰	6.42 × 10²	1.01 × 10⁰	5.05 × 10²	4.84 × 10⁻¹	1.80 × 10⁰
iAF1260b	3.63 × 10³	1.46 × 10⁶	8.01 × 10⁰	2.11 × 10⁰	8.39 × 10³	1.19 × 10⁰	6.30 × 10⁴	1.34 × 10⁰	2.74 × 10⁰
NDC	3.59 × 10³	9.03 × 10⁵	3.83 × 10⁰	1.36 × 10⁰	1.37 × 10³	7.68 × 10⁻¹	1.98 × 10³	9.24 × 10⁻¹	1.94 × 10⁰
Restaurant	1.05 × 10³	9.92 × 10⁵	4.30 × 10⁰	7.04 × 10⁻¹	5.70 × 10¹	4.02 × 10⁻¹	9.80 × 10¹	2.84 × 10⁻¹	7.45 × 10⁻¹
Committees	7.82 × 10²	4.03 × 10⁵	5.23 × 10⁰	5.48 × 10⁻¹	7.81 × 10⁰	3.80 × 10⁻¹	9.91 × 10⁰	1.36 × 10⁻¹	4.85 × 10⁻¹

Table 4. AUC scores obtained by our algorithms and baselines (T = 35, β = 0.005). The maximal AUC value among all methods is shown in bold and the maximal AUC value among baselines is shown with *. The performance improvement is denoted with ↑.

Datasets	Our Method	Baselines							Boost
	HEDV-Greedy	HADP	HSDP	H-RIS	H-CI (l = 1)	H-CI (l = 2)	H-Degree	Degree	Average: 25.8%↑
Algebra	0.1812	0.1477 *	0.1108	0.1016	0.0699	0.0671	0.0686	0.0761	22.7%↑
Diseasome	0.1644	0.1034	0.1034	0.1109 *	0.0877	0.0870	0.0841	0.0992	48.2%↑
Email	0.1332	0.1133 *	0.1131	0.1132	0.0984	0.1007	0.1002	0.1020	17.6%↑
Geometry	0.1296	0.1163 *	0.1110	0.1108	0.0991	0.0993	0.0991	0.1002	11.4%↑
iAF1260b	0.1872	0.1422 *	0.1052	0.0793	0.0770	0.0746	0.0754	0.0817	31.6%↑
NDC	0.1492	0.1132	0.1195 *	0.0896	0.0937	0.0939	0.0940	0.0988	24.9%↑
Restaurant	0.1560	0.1115 *	0.1021	0.1074	0.0948	0.0879	0.0914	0.1014	39.9%↑
Committees	0.1284	0.1169 *	0.1103	0.1157	0.0997	0.0979	0.0990	0.1038	9.80%↑

Table 5. AUC scores obtained by our algorithms and baselines (T = 20, β = 0.02). The maximal AUC value among all methods is shown in bold and the maximal AUC value among baselines is shown with *. The performance improvement is denoted with ↑.

Datasets	Our Method	Baselines							Boost
	HEDV-Greedy	HADP	HSDP	H-RIS	H-CI (l = 1)	H-CI (l = 2)	H-Degree	Degree	Average: 16.8%↑
Algebra	0.1335	0.1231 *	0.1122	0.1132	0.0944	0.0929	0.0935	0.0974	8.40%↑
Diseasome	0.1814	0.1041 *	0.1033	0.1112	0.0754	0.0742	0.0696	0.0950	74.3%↑
Email	0.1408	0.1167 *	0.1150	0.1153	0.0917	0.0949	0.0942	0.0967	20.7%↑
Geometry	0.1153	0.1120	0.1109	0.1121 *	0.1075	0.1075	0.1075	0.1077	2.90%↑
iAF1260b	0.1660	0.1455 *	0.1098	0.0774	0.0809	0.0762	0.0781	0.0871	14.1%↑
NDC	0.1350	0.1249 *	0.1242	0.1028	0.0864	0.0864	0.0866	0.0936	8.10%↑
Restaurant	0.1158	0.1111 *	0.1093	0.1103	0.1068	0.1061	0.1066	0.1089	4.20%↑
Committees	0.1144	0.1127 *	0.1105	0.1118	0.1075	0.1071	0.1074	0.1084	1.50%↑

Table 6. AUC scores obtained by our algorithms and baselines (T = 30, β = 0.015). The maximal AUC value among all methods is shown in bold and the maximal AUC value among baselines is shown with *. The performance improvement is denoted with ↑.

Datasets	Our Method	Baselines							Boost
	HEDV-Greedy	HADP	HSDP	H-RIS	H-CI (l = 1)	H-CI (l = 2)	H-Degree	Degree	13.6%↑ (average)
Algebra	0.1268	0.1194 *	0.1116	0.1136	0.0987	0.0978	0.0981	0.1008	6.20%↑
Diseasome	0.1814	0.1052	0.1040	0.1113 *	0.0743	0.0729	0.0680	0.0951	63.0%↑
Email	0.1408	0.1174 *	0.1153	0.1157	0.0910	0.0944	0.0937	0.0962	19.9%↑
Geometry	0.1139	0.1117	0.1108	0.1121 *	0.1085	0.1085	0.1085	0.1087	1.60%↑
iAF1260b	0.1593	0.1424 *	0.1098	0.0800	0.0844	0.0796	0.0813	0.0899	11.9%↑
NDC	0.1322	0.1272 *	0.1251	0.1059	0.0855	0.0853	0.0856	0.0931	3.90%↑
Restaurant	0.1132	0.1109	0.1096	0.1116 *	0.1080	0.1080	0.1080	0.1093	1.40%↑
Committees	0.1131	0.1124 *	0.1107	0.1114	0.1084	0.1082	0.1083	0.1090	0.60%↑

Table 7. Wilcoxon signed-rank test for all datasets (β = 0.01, T = 25). R+ and R− are the sums of the ranks of the paired differences for which the proposed algorithm is superior and inferior to the given baseline, respectively. The p-value is the probability, under the null hypothesis of no difference between the two methods, of obtaining a result at least as extreme as the one observed.

Datasets	Baselines
	HADP			HSDP			H-RIS			H-CI (l = 1)			H-CI (l = 2)			H-Degree			Degree
	R+	R−	p	R+	R−	p	R+	R−	p	R+	R−	p	R+	R−	p	R+	R−	p	R+	R−	p
Algebra	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0
Diseasome	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0
Email	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0
Geometry	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0
iAF1260b	443	22	0	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0
NDC	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0
Restaurant	465	0	0	465	0	0	463	2	0	465	0	0	465	0	0	465	0	0	465	0	0
Committees	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0	465	0	0
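
To make the R+ and R− columns concrete, the sketch below computes the two rank sums of a Wilcoxon signed-rank test in plain Python. It assumes the paired samples are the scores of two methods for K = 1, …, 30, so that R+ + R− = 30 · 31 / 2 = 465 when no difference is zero. The function name `signed_rank_sums` and the input scores are made up for illustration and are not the paper's data.

```python
def signed_rank_sums(ours, baseline):
    """Return (R_plus, R_minus) for paired scores, averaging ranks over ties."""
    # Zero differences carry no sign and are dropped, as in the classic test.
    diffs = [a - b for a, b in zip(ours, baseline) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return r_plus, r_minus


# Synthetic scores for K = 1..30 (illustrative only, NOT the paper's data).
baseline = [10 * k for k in range(1, 31)]
ours_all_wins = [b + k for k, b in enumerate(baseline, start=1)]
print(signed_rank_sums(ours_all_wins, baseline))  # (465.0, 0)

# One loss whose absolute difference has rank 22 yields a 443 / 22 split.
ours_one_loss = list(ours_all_wins)
ours_one_loss[21] = baseline[21] - 22
print(signed_rank_sums(ours_one_loss, baseline))  # (443.0, 22.0)
```

A row reading 465 / 0 therefore means the proposed method won at every value of K, while a split such as 443 / 22 means a small number of losses with low to moderate rank; the second example shows only one of several ways such a split can arise.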

Table 8. Friedman test for all datasets. Ranksum is the sum of the ranks of the corresponding algorithm over the different values of K (K = 1, …, 30) in the dataset. Friedman rank is the ranksum averaged over these 30 values of K. General rank is the final rank of the algorithm. The optimal method is shown in bold and highlighted with *.

Datasets		Our Method	Baselines
		HEDV-Greedy	HADP	HSDP	H-RIS	H-CI (l = 1)	H-CI (l = 2)	H-Degree	Degree
Algebra	Ranksum	30 *	65	97	116	180	223	214	155
	Friedman rank	1.00 *	2.17	3.23	3.87	6	7.43	7.13	5.17
	General rank	1 *	2	3	4	6	8	7	5
Diseasome	Ranksum	30 *	110	110	72	182	207	237	132
	Friedman rank	1.00 *	3.67	3.67	2.4	6.07	6.9	7.9	4.4
	General rank	1 *	3.5	3.5	2	6	7	8	5
Email	Ranksum	30 *	87	91	121	221	183	196	151
	Friedman rank	1.00 *	2.9	3.03	4.03	7.37	6.1	6.53	5.03
	General rank	1 *	2	3	4	8	6	7	5
Geometry	Ranksum	30 *	82	120	113	190	191	193	161
	Friedman rank	1.00 *	2.73	4	3.77	6.33	6.37	6.43	5.37
	General rank	1 *	2	4	3	6	7	8	5
iAF1260b	Ranksum	34 *	70	100	138	176	216	189	157
	Friedman rank	1.13 *	2.33	3.33	4.6	5.87	7.2	6.3	5.23
	General rank	1 *	2	3	4	6	8	7	5
NDC	Ranksum	30 *	93	73	174	190	189	190	141
	Friedman rank	1.00 *	3.1	2.43	5.8	6.33	6.3	6.33	4.7
	General rank	1 *	3	2	5	7.5	6	7.5	4
Restaurant	Ranksum	31 *	96	120	120	173	221	193	126
	Friedman rank	1.03 *	3.2	4	4	5.77	7.37	6.43	4.2
	General rank	1 *	2	3.5	3.5	6	8	7	5
Committees	Ranksum	30 *	82	114	78	182	237	207	150
	Friedman rank	1.00 *	2.73	3.8	2.6	6.07	7.9	6.9	5
	General rank	1 *	3	4	2	6	8	7	5
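
The three rows reported per dataset in Table 8 are linked by simple arithmetic, which the following sketch reproduces for the NDC rows. It assumes that the Friedman rank is the ranksum divided by the 30 values of K and that tied ranksums share the average of the positions they occupy; the function name `friedman_and_general_rank` is hypothetical.

```python
def friedman_and_general_rank(ranksums, n_settings=30):
    """Derive Friedman ranks and general ranks from per-method ranksums.

    Assumptions: Friedman rank = ranksum / n_settings (rounded to 2 places);
    general rank = ascending rank of the ranksum, ties sharing the mean rank.
    """
    friedman = [round(s / n_settings, 2) for s in ranksums]
    ordered = sorted(ranksums)
    general = []
    for s in ranksums:
        first_position = ordered.index(s) + 1  # 1-based position of first tie
        tie_count = ordered.count(s)
        general.append(first_position + (tie_count - 1) / 2)
    return friedman, general


methods = ["HEDV-Greedy", "HADP", "HSDP", "H-RIS",
           "H-CI (l = 1)", "H-CI (l = 2)", "H-Degree", "Degree"]
ndc_ranksums = [30, 93, 73, 174, 190, 189, 190, 141]  # NDC row of Table 8

friedman, general = friedman_and_general_rank(ndc_ranksums)
for name, f, g in zip(methods, friedman, general):
    print(f"{name:13s} Friedman rank {f:<5} general rank {g}")
```

For NDC, the two methods with a ranksum of 190 share general rank 7.5, matching the table. As a consistency check, the ranksums of the eight methods in each dataset add up to 30 · (1 + 2 + … + 8) = 1080.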

Table 9. Comparison of characteristics between greedy-based algorithms, heuristic solutions, and our proposed method.

Characteristics		Our Method	Greedy-Based Algorithms	Heuristic Solutions
Algorithms		HEDV-Greedy	Greedy, CELF, CELF++, etc.	H-Degree, Degree, etc.
Influence evaluation	Methods	HEDV	Monte Carlo	Centrality
	Efficiency	✓	-	✓
	Accuracy	✓	✓	-
Node selection	Methods	Maximum MV	Greedy strategy	Top-K
	Efficiency	✓	-	✓
	Accuracy	✓	✓	-
	Low overlap	✓	✓	-

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
