Many existing statistical models for networks overlook the fact that real-world networks are often formed through a growth process. To address this, we introduce the PAPER (Preferential Attachment Plus Erdős–Rényi) model for random networks, in which a random network G is the union of a preferential attachment (PA) tree T and additional Erdős–Rényi (ER) random edges. The PA tree captures the underlying growth or recruitment process, in which vertices and edges are added sequentially, while the ER component can be regarded as random noise. Given only a single snapshot of the final network G, we study the problem of constructing confidence sets for the early history of the unobserved growth process, in particular its root node; the root node may be, for example, patient zero in a disease infection network or the source of fake news in a social media network. We propose an inference algorithm based on Gibbs sampling that scales to networks with millions of nodes, and we provide theoretical analysis showing that the expected size of the confidence set is small so long as the noise level of the ER edges is not too large. We also propose variations of the model in which multiple growth processes occur simultaneously, reflecting the growth of multiple communities, and we use these models to provide a new approach to community detection.
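To make the generative side of the model concrete, the following is a minimal sketch of how a PAPER-style network could be simulated. It is not the authors' implementation, and the abstract does not fix a parameterization, so several choices here are assumptions: the function name `sample_paper_graph` is hypothetical, attachment is taken to be plainly proportional to current tree degree, and each non-tree pair is assumed to receive an ER edge independently with probability `p`.

```python
import random
from itertools import combinations


def sample_paper_graph(n, p, seed=None):
    """Sample a PA tree on nodes 0..n-1 plus independent ER noise edges.

    Assumptions (not taken from the abstract): attachment probability is
    proportional to current tree degree, and every non-tree pair becomes a
    noise edge independently with probability p.
    Returns (tree_edges, noise_edges); node 0 is the root of the tree.
    """
    if n < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)

    # Start from a single edge between the first two arrivals.
    tree_edges = [(0, 1)]
    # Each node appears in this list once per incident tree edge, so a
    # uniform draw from it is a degree-proportional draw over nodes.
    endpoints = [0, 1]

    for v in range(2, n):
        parent = rng.choice(endpoints)
        tree_edges.append((parent, v))  # parent < v always holds
        endpoints.extend((parent, v))

    # ER component: consider every pair that is not already a tree edge.
    tree_set = set(tree_edges)
    noise_edges = [
        (u, v)
        for u, v in combinations(range(n), 2)
        if (u, v) not in tree_set and rng.random() < p
    ]
    return tree_edges, noise_edges


if __name__ == "__main__":
    tree, noise = sample_paper_graph(n=20, p=0.05, seed=0)
    print(len(tree), "tree edges,", len(noise), "noise edges")
```

In this sketch the node labels encode arrival order, so node 0 is the root. In the inference problem described above, only the union of the two edge sets would be observed, with the labels carrying no arrival information, and the tree and noise edges would not be distinguishable. The all-pairs loop is quadratic in `n` and is meant only for small illustrations.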