Modeling information spread through a network is one of the key problems of network analysis, with applications in a wide array of areas such as marketing and public health. Most approaches assume that the spread is governed by some probabilistic diffusion model, often parameterized by the strength of connections between network members (edge weights), which highlights the need for methods that can estimate these weights accurately. Multiple prior works propose such estimators for particular diffusion models; however, most of them lack a rigorous statistical analysis that would establish the asymptotic properties of the estimator and allow for uncertainty quantification. In this paper, we propose the General Linear Threshold (GLT) model, a broad class of discrete-time information diffusion models that includes both the well-known linear threshold (LT) and independent cascade (IC) models, and develop a likelihood-based approach to estimate edge weights from the observed information diffusion paths under this model. We first derive necessary and sufficient conditions under which the edge weights are identifiable. Then, we derive a finite-sample error bound for the estimator and demonstrate that it is asymptotically normal under mild conditions. We conclude by studying the GLT model in the context of the Influence Maximization (IM) problem, that is, the task of selecting a subset of $k$ nodes to start the diffusion so that the average information spread is maximized. Extensive experiments on synthetic and real-world networks demonstrate that the flexibility of the proposed class of GLT models, coupled with the proposed estimation and inference framework for its parameters, can significantly improve estimation of spread from a given subset of nodes, prediction of node activation, and the quality of the IM problem solutions.
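To make the quantities in the abstract concrete, the following is a minimal sketch of the classic LT model in its standard textbook form, not the GLT model or the estimator developed in this paper. It assumes each node draws a threshold uniformly at random and becomes active once the total weight of its active in-neighbors reaches that threshold. It also shows a Monte Carlo estimate of the average spread from a seed set, which is the objective of the IM problem. The function names `simulate_lt` and `estimate_spread`, the dictionary layout of the weights, and the toy graph are illustrative assumptions.

```python
import random


def simulate_lt(weights, seeds, rng):
    """Run one cascade of the classic linear threshold (LT) model.

    weights[v] maps each in-neighbor u of v to the edge weight of (u, v).
    Assumption: the incoming weights of every node sum to at most 1.
    Returns the set of nodes that are active when the cascade stops.
    """
    nodes = set(weights)
    for in_neighbors in weights.values():
        nodes.update(in_neighbors)
    # Each node draws one threshold in (0, 1] per cascade; sorting keeps
    # the assignment reproducible for a fixed random seed.
    thresholds = {v: 1.0 - rng.random() for v in sorted(nodes)}
    active = set(seeds)
    while True:
        # Synchronous discrete-time update: all nodes whose active incoming
        # weight reaches their threshold activate in the same step.
        newly_active = {
            v
            for v in nodes - active
            if sum(w for u, w in weights.get(v, {}).items() if u in active)
            >= thresholds[v]
        }
        if not newly_active:
            return active
        active |= newly_active


def estimate_spread(weights, seeds, n_runs=1000, seed=0):
    """Monte Carlo estimate of the average number of activated nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_lt(weights, seeds, rng)) for _ in range(n_runs))
    return total / n_runs


if __name__ == "__main__":
    # Hypothetical toy network: a -> b -> c, plus a weaker edge a -> c.
    toy_weights = {"b": {"a": 0.6}, "c": {"a": 0.2, "b": 0.5}}
    print(estimate_spread(toy_weights, {"a"}))
```

A greedy IM heuristic would call `estimate_spread` repeatedly, adding at each step the node that most increases the estimated spread. The quality of that selection depends directly on how accurately the edge weights are estimated.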