Parameter Estimation Procedures for Exponential-Family Random Graph Models on Count-Valued Networks: A Comparative Simulation Study

Exponential-family random graph models (ERGMs) have emerged as an important framework for modeling social networks across a wide variety of relational types. ERGMs for valued networks are less well developed than their unvalued counterparts and pose particular computational challenges. Networks whose edge values are non-negative integers (count-valued networks) are an important such case, with examples ranging from the magnitude of migration and trade flows between places to the frequency of interactions and encounters between individuals. Here, we propose an efficient, parallelizable subsampled maximum pseudo-likelihood estimation (MPLE) scheme for count-valued ERGMs, and compare its performance with the existing Contrastive Divergence (CD) and Monte Carlo Maximum Likelihood Estimation (MCMLE) approaches via a simulation study based on migration flow networks in two U.S. states. Our results suggest that edge value variance is a key factor in method performance, while network size mainly influences the methods' relative merits in computation time. For small-variance networks, all methods perform well in point estimation, but CD greatly overestimates uncertainties and MPLE underestimates them for dependence terms; all methods estimate quickly on small networks, but CD and subsampled multi-core MPLE provide speed advantages as network size increases. For large-variance networks, both MPLE and MCMLE offer high-quality estimates of coefficients and their uncertainty, but MPLE is significantly faster than MCMLE; MPLE is also a better seeding method for MCMLE than CD, as the latter makes MCMLE more prone to convergence failure.
