Finding the seed set that maximizes the influence spread over a network is a well-known NP-hard problem. Although a greedy algorithm can provide near-optimal solutions, the subproblem of influence estimation makes it computationally expensive. In this work, we propose \textsc{Glie}, a graph neural network that learns to estimate the influence spread under the independent cascade model. GLIE relies on a theoretical upper bound that is tightened through supervised training. Experiments indicate that it provides accurate influence estimates for real graphs up to 10 times larger than the training graphs. Subsequently, we incorporate it into three influence maximization techniques. We first use Cost Effective Lazy Forward (CELF) optimization with GLIE substituting Monte Carlo simulations, which surpasses the benchmarks albeit with a computational overhead. To improve computational efficiency, we next devise a Q-learning method that learns to choose seeds sequentially using GLIE's predictions. Finally, we arrive at the most efficient approach by developing a provably submodular influence spread function based on GLIE's representations, which ranks nodes while building the seed set adaptively. The proposed algorithms are inductive: they are trained on graphs with fewer than 300 nodes and up to 5 seeds, and tested on graphs with millions of nodes and up to 200 seeds. The final method exhibits the most promising combination of time efficiency and influence quality, outperforming several baselines.
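For context, the sketch below shows the classic baseline that the first technique modifies: CELF greedy seed selection driven by Monte Carlo estimates of independent cascade spread. It is not the authors' implementation. The function names, the uniform edge probability `p`, and the number of simulation runs are illustrative assumptions. The `spread` argument is a hypothetical hook: any set-to-float estimator, such as a learned model like GLIE, could be passed in place of the Monte Carlo routine.

```python
import heapq
import random
from typing import Callable, Dict, List, Optional, Set

Graph = Dict[int, List[int]]  # adjacency list: node -> out-neighbors


def mc_ic_spread(graph: Graph, seeds: Set[int], p: float = 0.1,
                 runs: int = 200, rng: Optional[random.Random] = None) -> float:
    """Monte Carlo estimate of expected IC spread (assumes uniform edge probability p)."""
    rng = rng or random.Random(0)
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            new_frontier = []
            for u in frontier:
                for v in graph.get(u, []):
                    # each newly active node gets one chance to activate each inactive neighbor
                    if v not in active and rng.random() < p:
                        active.add(v)
                        new_frontier.append(v)
            frontier = new_frontier
        total += len(active)
    return total / runs


def celf(graph: Graph, k: int, spread: Callable[[Set[int]], float]) -> List[int]:
    """CELF: greedy seed selection with lazily re-evaluated marginal gains."""
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    # min-heap on negated gain -> max-heap of (gain, node, seed-set size when computed)
    heap = [(-spread({u}), u, 0) for u in nodes]
    heapq.heapify(heap)
    seeds: List[int] = []
    current = 0.0  # running estimate of spread(seeds)
    while len(seeds) < k and heap:
        neg_gain, u, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # gain is fresh for the current seed set; by submodularity it is the maximum
            seeds.append(u)
            current += -neg_gain
        else:
            # stale upper bound: recompute the marginal gain and push it back
            gain = spread(set(seeds) | {u}) - current
            heapq.heappush(heap, (-gain, u, len(seeds)))
    return seeds
```

Usage: `celf(g, 5, lambda S: mc_ic_spread(g, S))`. Each call to `spread` runs many simulations, and CELF still calls it repeatedly. This is the cost that replacing the estimator with a single learned forward pass aims to reduce.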