Mechanistic network models specify the mechanisms by which networks grow and change, allowing researchers to investigate complex systems using both simulation and analytical techniques. Unfortunately, it is difficult to write down the likelihood of a graph generated by a mechanistic model, because repeated applications of the mechanism lead to a combinatorial explosion in possible outcomes. As a result, estimating the model parameters by maximum likelihood is nearly impossible. In this paper, we propose treating the node sequence in a growing network model as an additional parameter, or as a missing random variable, and maximizing the resulting likelihood. We develop this framework in the context of a simple mechanistic network model used to study gene duplication and divergence, and we test a variety of algorithms for maximizing the likelihood on simulated graphs. We also run the best-performing algorithm on a human protein-protein interaction network and four non-human protein-protein interaction networks. Although we focus on a specific mechanistic network model here, the proposed framework is more generally applicable to reversible models.
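The two readings of the node sequence mentioned above can be written schematically as follows. This is only an illustrative sketch: the symbols $G$ (observed graph), $\theta$ (model parameters), $\sigma$ (node sequence, i.e. the order in which nodes entered the network), $\Sigma(G)$ (the set of sequences compatible with $G$), and $G_t$ (the graph after $t$ nodes have arrived under $\sigma$) are notation assumed here, and the stepwise factorization assumes that each application of the mechanism adds one node.

```latex
% Missing-variable view: the node sequence is summed out.
\[
  L(\theta; G) \;=\; \sum_{\sigma \in \Sigma(G)} P(G, \sigma \mid \theta)
\]

% Additional-parameter view: the node sequence is maximized jointly with theta.
\[
  (\hat{\theta}, \hat{\sigma}) \;=\; \arg\max_{\theta,\, \sigma} \; P(G, \sigma \mid \theta)
\]

% Given a sequence, the joint probability factorizes over growth steps
% (assumed form; t_0 is the size of the seed graph, n the number of nodes in G).
\[
  P(G, \sigma \mid \theta) \;=\; P(G_{t_0}) \prod_{t = t_0 + 1}^{n} P(G_t \mid G_{t-1}, \theta)
\]
```

The sum over $\Sigma(G)$ is where the combinatorial explosion appears, since the number of candidate sequences can grow factorially with the number of nodes; fixing or searching over a single $\sigma$ replaces that sum with a product of per-step terms.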