Finding the seed set that maximizes the influence spread over a network is a well-known NP-hard problem. Though a greedy algorithm can provide near-optimal solutions, the subproblem of influence estimation renders the solutions inefficient. In this work, we propose \textsc{Glie}, a graph neural network that learns how to estimate the influence spread of the independent cascade. GLIE relies on a theoretical upper bound that is tightened through supervised training. Experiments indicate that it provides accurate influence estimation for real graphs up to 10 times larger than the training set. Subsequently, we incorporate it into three influence maximization techniques. We first utilize Cost Effective Lazy Forward optimization, substituting Monte Carlo simulations with GLIE, which surpasses the benchmarks albeit with a computational overhead. To improve computational efficiency, we devise a Q-learning method that learns to choose seeds sequentially using GLIE's predictions. Finally, we arrive at the most efficient approach by developing a provably submodular influence spread based on GLIE's representations, which is used to rank nodes while building the seed set adaptively. The proposed algorithms are inductive, meaning they are trained on graphs with fewer than 300 nodes and up to 5 seeds, and tested on graphs with millions of nodes and up to 200 seeds. The final method exhibits the most promising combination of time efficiency and influence quality, outperforming several baselines.
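To make the estimation bottleneck concrete, the following is a minimal sketch of Monte Carlo influence estimation under the independent cascade, wrapped in a lazy greedy (CELF-style) seed selection. It is not GLIE and not the authors' implementation: the function names, the adjacency-dict graph format, and the uniform activation probability `p` are all assumptions made for illustration. The `estimator` argument marks the place where a learned influence estimator could, in principle, replace the simulations.

```python
import heapq
import random


def simulate_ic(graph, seeds, p, rng):
    """Run one independent cascade and return the number of activated nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # A newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph, seeds, p=0.1, runs=200, seed=0):
    """Monte Carlo estimate of the expected spread (uniform p is an assumption)."""
    if not seeds:
        return 0.0
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs


def celf(graph, k, estimator=estimate_spread):
    """Lazy greedy seed selection; `estimator` is a pluggable influence oracle.

    Assumes every node appears as a key of `graph` and that nodes are orderable.
    """
    # Max-heap via negated gains: (-gain, node, seed-set size when gain was computed).
    heap = [(-estimator(graph, [v]), v, 0) for v in graph]
    heapq.heapify(heap)
    seeds, spread = [], 0.0
    while len(seeds) < k and heap:
        neg_gain, v, stamp = heapq.heappop(heap)
        if stamp == len(seeds):
            # The gain is current for this seed set, so v is selected.
            seeds.append(v)
            spread += -neg_gain
        else:
            # Stale gain: recompute against the current seed set and reinsert.
            gain = estimator(graph, seeds + [v]) - spread
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, spread
```

Every call to `estimator` here costs `runs` full cascade simulations, which is why the choice of influence oracle dominates the running time of greedy selection. The laziness relies on submodularity: a node's marginal gain can only shrink as the seed set grows, so a stale gain is an upper bound and only the top of the heap needs re-evaluation.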