We construct Bayesian and frequentist finite-sample goodness-of-fit tests for three variants of the stochastic blockmodel for network data. Because every variant is log-linear in form when block assignments are known, the tests for the \emph{latent} block model versions combine a block membership estimator with the algebraic statistics machinery for testing goodness-of-fit in log-linear models. We describe the Markov bases and marginal polytopes of these variants and discuss how both facilitate the development of goodness-of-fit tests and the understanding of model behavior. The general testing methodology developed here extends to any finite mixture of log-linear models on discrete data and, as such, is the first application of the algebraic statistics machinery to latent-variable models.
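To make the idea of a conditional goodness-of-fit test with known block assignments concrete, the following is a minimal sketch and not the authors' implementation. It assumes the simplest blockmodel, in which every dyad in a given block pair has the same edge probability, so the sufficient statistics are the edge counts per block pair. The function names, the move (relocating one edge to a non-edge in the same block pair), and the node-to-block chi-square statistic are all illustrative assumptions. In the latent setting, the assignment vector `z` would come from a block membership estimator; here it is simply given.

```python
import random
from itertools import combinations


def block_pair(z, i, j):
    """Unordered pair of block labels for the dyad (i, j)."""
    return (min(z[i], z[j]), max(z[i], z[j]))


def dyads_by_block_pair(z):
    """Group all node pairs (i, j), i < j, by their unordered block pair."""
    groups = {}
    for i, j in combinations(range(len(z)), 2):
        groups.setdefault(block_pair(z, i, j), []).append((i, j))
    return groups


def block_edge_counts(edges, z):
    """Sufficient statistics (assumed model): edge counts per block pair."""
    counts = {}
    for i, j in edges:
        key = block_pair(z, i, j)
        counts[key] = counts.get(key, 0) + 1
    return counts


def chi_square(edges, z, groups):
    """Illustrative statistic: node-to-block degrees vs. fitted expectations."""
    counts = block_edge_counts(edges, z)
    blocks = sorted(set(z))
    size = {b: z.count(b) for b in blocks}
    deg = {(i, b): 0 for i in range(len(z)) for b in blocks}
    for i, j in edges:
        deg[(i, z[j])] += 1
        deg[(j, z[i])] += 1
    stat = 0.0
    for (i, b), d in deg.items():
        key = (min(z[i], b), max(z[i], b))
        # Fitted edge probability for this block pair (0 if it has no dyads).
        p_hat = counts.get(key, 0) / len(groups[key]) if key in groups else 0.0
        partners = size[b] - (1 if z[i] == b else 0)
        expected = partners * p_hat
        if expected > 0:
            stat += (d - expected) ** 2 / expected
    return stat


def fiber_move(current, z, groups, rng):
    """Move one edge to a random non-edge in the same block pair (in place).

    This keeps every block-pair edge count fixed, so the walk stays in the
    set of graphs sharing the observed sufficient statistics.
    """
    if not current:
        return
    old = rng.choice(sorted(current))
    candidates = [d for d in groups[block_pair(z, *old)] if d not in current]
    if candidates:
        current.remove(old)
        current.add(rng.choice(candidates))


def fiber_walk_p_value(edges, z, steps=2000, seed=0):
    """Estimate a conditional p-value by a random walk over the fiber."""
    rng = random.Random(seed)
    groups = dyads_by_block_pair(z)
    current = {tuple(sorted(e)) for e in edges}
    observed = chi_square(current, z, groups)
    extreme = 0
    for _ in range(steps):
        fiber_move(current, z, groups, rng)
        if chi_square(current, z, groups) >= observed - 1e-12:
            extreme += 1
    # Count the observed graph itself so the estimate is never zero.
    return (extreme + 1) / (steps + 1)


if __name__ == "__main__":
    z = [0, 0, 0, 1, 1, 1]  # hypothetical block assignments
    edges = [(0, 1), (1, 2), (3, 4), (0, 3)]  # hypothetical observed graph
    print(fiber_walk_p_value(edges, z, steps=500, seed=1))
```

Because each proposal picks an edge uniformly and then a non-edge uniformly within the same block pair, the proposal is symmetric and the walk targets the uniform distribution on the fiber, which is the conditional distribution under this assumed model. No burn-in or thinning is applied, so the estimate is only a rough illustration.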