A common network inference problem, arising from real-world data constraints, is how to infer a dynamic network from its time-aggregated adjacency matrix and time-varying marginals (i.e., row and column sums). Prior approaches to this problem have repurposed the classic iterative proportional fitting (IPF) procedure, also known as Sinkhorn's algorithm, with promising empirical results. However, the statistical foundation for using IPF has not been well understood: under what settings does IPF provide principled estimation of a dynamic network from its marginals, and how well does it estimate the network? In this work, we establish such a setting by identifying a generative network model whose maximum likelihood estimates are recovered by IPF. Our model both reveals the implicit assumptions behind the use of IPF in such settings and enables new analyses, such as structure-dependent error bounds on IPF's parameter estimates. For cases where IPF fails to converge on sparse network data, we introduce a principled algorithm that guarantees convergence of IPF with minimal changes to the network structure. Finally, we conduct experiments with synthetic and real-world data, which demonstrate the practical value of our theoretical and algorithmic contributions.
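For readers unfamiliar with IPF, the following is a minimal sketch of the classic procedure for a single time step, not the authors' implementation. It alternately rescales the rows and columns of a nonnegative matrix until its marginals match the targets. The function name `ipf`, the symbols `X`, `p`, `q`, and the tolerance and iteration defaults are all assumptions chosen for illustration.

```python
import numpy as np

def ipf(X, p, q, max_iter=1000, tol=1e-9):
    """Illustrative IPF: rescale nonnegative X to match row sums p and column sums q.

    Returns (M, converged), where M = diag(d0) @ X @ diag(d1).
    All names and defaults here are hypothetical, chosen for this sketch.
    """
    X = np.asarray(X, dtype=float)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    d1 = np.ones(X.shape[1])  # column scaling factors
    M = X.copy()
    for _ in range(max_iter):
        # Row step: choose d0 so that the rows of diag(d0) X diag(d1) sum to p.
        row = X @ d1
        d0 = np.divide(p, row, out=np.zeros_like(p), where=row > 0)
        # Column step: choose d1 so that the columns sum to q.
        col = X.T @ d0
        d1 = np.divide(q, col, out=np.zeros_like(q), where=col > 0)
        M = d0[:, None] * X * d1[None, :]
        # Largest absolute deviation from either set of target marginals.
        err = max(np.abs(M.sum(axis=1) - p).max(),
                  np.abs(M.sum(axis=0) - q).max())
        if err < tol:
            return M, True
    return M, False

# Usage: a strictly positive aggregated matrix with consistent target marginals.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
M, converged = ipf(X, p=[4.0, 6.0], q=[5.0, 5.0])
```

In the dynamic setting described above, one would presumably call such a routine once per time step, using the same aggregated matrix with that step's marginals. Zeros in `X` stay zero in the output, which is why sparse inputs can make the target marginals unreachable and prevent convergence.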