Given a hypergraph, influence maximization (IM) seeks a seed set of $k$ vertices with the maximal influence spread. Although existing vertex-based IM algorithms outperform hyperedge-based algorithms by generating random reverse reachable (RR) sets, they remain inefficient because (i) they ignore important structural information associated with hyperedges and thus obtain inferior results, (ii) the commonly used sampling methods for generating RR sets are inefficient, since they require a large number of samples and have high sampling variance, and (iii) they incur large overheads in running time and memory. To overcome these shortcomings, this paper proposes a novel approach called \emph{HyperIM}. The key idea behind \emph{HyperIM} is to differentiate the structural information of vertices and use it to build a stratified sampling scheme, combined with highly efficient strategies for generating the RR sets. With theoretical guarantees, \emph{HyperIM} is able to increase the influence spread, improve the sampling efficiency, and cut down the expected running time. To further reduce running time and memory costs, we optimize \emph{HyperIM} by deriving a bound on the required number of RR sets in conjunction with stratified sampling. Experimental results on real-world hypergraphs show that, compared to the state-of-the-art IM algorithms, \emph{HyperIM} reduces the number of required RR sets and the running time by orders of magnitude while increasing the influence spread by up to $2.73\times$ on average.
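To make the RR-set pipeline mentioned in the abstract concrete, the following is a minimal Python sketch of generic RR-set-based seed selection on a hypergraph. It is not the \emph{HyperIM} algorithm. Everything specific in it is an assumption made for illustration: the diffusion model (each hyperedge is "live" independently with probability `p`, and a live hyperedge connects all of its members), the choice of vertex degree as the stratification criterion, proportional allocation of samples across strata, and all function names (`build_incidence`, `sample_rr_set`, `stratified_roots`, `greedy_seeds`).

```python
import random
from collections import defaultdict


def build_incidence(edge_to_vertices):
    """Invert a hyperedge -> vertices map into a vertex -> hyperedges map."""
    vertex_to_edges = defaultdict(list)
    for e, members in edge_to_vertices.items():
        for v in members:
            vertex_to_edges[v].append(e)
    return dict(vertex_to_edges)


def sample_rr_set(vertex_to_edges, edge_to_vertices, root, p, rng):
    """Sample one RR set rooted at `root`.

    Assumed model (not from the paper): each hyperedge is tried once per
    sample and is live with probability p; a live hyperedge adds all of
    its members to the RR set.
    """
    rr, frontier, tried = {root}, [root], set()
    while frontier:
        v = frontier.pop()
        for e in vertex_to_edges.get(v, ()):
            if e in tried:
                continue
            tried.add(e)
            if rng.random() < p:
                for u in edge_to_vertices[e]:
                    if u not in rr:
                        rr.add(u)
                        frontier.append(u)
    return rr


def stratified_roots(vertex_to_edges, num_samples, rng):
    """Pick RR-set roots stratum by stratum.

    Strata are groups of vertices with equal degree (an illustrative
    choice); each stratum gets a quota proportional to its size, with
    at least one sample.
    """
    strata = defaultdict(list)
    for v, edges in vertex_to_edges.items():
        strata[len(edges)].append(v)
    n = len(vertex_to_edges)
    roots = []
    for degree in sorted(strata):
        members = strata[degree]
        quota = max(1, round(num_samples * len(members) / n))
        roots.extend(rng.choice(members) for _ in range(quota))
    return roots


def greedy_seeds(rr_sets, k):
    """Greedy maximum coverage: pick up to k vertices covering most RR sets."""
    member_of = defaultdict(list)
    for i, rr in enumerate(rr_sets):
        for v in rr:
            member_of[v].append(i)
    covered = [False] * len(rr_sets)
    seeds = []
    for _ in range(k):
        best, best_gain = None, 0
        for v in sorted(member_of):
            if v in seeds:
                continue
            gain = sum(1 for i in member_of[v] if not covered[i])
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # every RR set is already covered
            break
        seeds.append(best)
        for i in member_of[best]:
            covered[i] = True
    return seeds


if __name__ == "__main__":
    rng = random.Random(0)
    hyperedges = {0: [0, 1, 2], 1: [2, 3], 2: [4, 5]}
    incidence = build_incidence(hyperedges)
    roots = stratified_roots(incidence, num_samples=200, rng=rng)
    rr_sets = [sample_rr_set(incidence, hyperedges, r, 0.5, rng) for r in roots]
    print(greedy_seeds(rr_sets, k=2))
```

In this sketch the fraction of RR sets covered by the chosen seeds serves as a proxy for their influence spread. The number of RR sets is fixed by hand here, whereas the abstract describes bounding the required number analytically.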