Many popular models from the networks literature can be viewed through a common lens of contingency tables on network dyads, resulting in \emph{log-linear ERGMs}: exponential family models for random graphs whose sufficient statistics are linear on the dyads. We propose a new model in this family, the \emph{$p_1$-SBM}, which combines the node and group effects common in network formation mechanisms. It generalizes several well-known ERGMs, including the stochastic blockmodel for undirected graphs, its degree-corrected version, and the directed $p_1$ model without group structure. We frame goodness-of-fit testing for the log-linear ERGM class as an exact conditional test whose $p$-value can be approximated efficiently in networks of both small and moderately large sizes. The sampling methods we build rely on a dynamic adaptation of Markov bases. We use fast estimation algorithms adapted from the contingency table literature and effective sampling methods rooted in graph theory and algebraic statistics. The performance and scalability of the method are demonstrated on two data sets from biology: the connectome of \emph{C. elegans} and the interactome of \emph{Arabidopsis thaliana}. These two networks, a neuronal network and a protein-protein interaction network, have been popular examples in the network science literature. Our work provides a model-based approach to studying them.