Random graphs have been widely used in statistics, for example in the analysis of networks and social interactions. In some applications, the vertices of the graph carry an inherent hierarchical ordering, which rules out any directed edge between a pair of vertices that does not respect this order. For example, in bibliometrics, older papers cannot cite newer ones. In such situations, the resulting graph is a Directed Acyclic Graph (DAG). In this article, we propose an extension of the popular Stochastic Block Model (SBM) that accounts for the presence of a latent hierarchical ordering in the data. The proposed approach includes a topological ordering in the likelihood of the model, so that a directed edge has positive probability only if the corresponding pair of vertices respects the order. This latent ordering is treated as an unknown parameter and endowed with a prior distribution. We describe how to formalize the model and how to perform posterior inference for a Bayesian nonparametric version of the SBM in which both the hierarchical ordering and the number of latent blocks are learnt from the data. Finally, we present an illustration on a real-world dataset from bibliometrics. Additional supplementary materials are available online.
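The abstract does not spell out the likelihood, so the following is only a minimal sketch of how a topological ordering can enter an SBM likelihood, assuming a Bernoulli SBM with block-to-block edge probabilities `Q` and a fixed ordering given as vertex positions. The function name `ordered_sbm_loglik`, its arguments, and the convention that an edge `i -> j` is allowed only when `i` precedes `j` are hypothetical illustration choices, not the authors' formulation. Pairs that respect the ordering contribute ordinary Bernoulli terms; an observed edge that violates the ordering makes the likelihood zero.

```python
import math
from typing import Sequence


def ordered_sbm_loglik(
    A: Sequence[Sequence[int]],
    z: Sequence[int],
    Q: Sequence[Sequence[float]],
    order: Sequence[int],
) -> float:
    """Log-likelihood of a directed SBM restricted by a vertex ordering.

    Hypothetical sketch, not the paper's exact model:
      A[i][j] = 1 encodes an observed directed edge i -> j.
      z[i] is the block label of vertex i.
      Q[k][l] is the edge probability from block k to block l (0 < Q < 1).
      order[i] is the position of vertex i in the ordering; by assumption,
      an edge i -> j may occur only if order[i] < order[j].
    """
    n = len(A)
    loglik = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops
            if order[i] < order[j]:
                # Pair respects the ordering: usual Bernoulli SBM term.
                p = Q[z[i]][z[j]]
                loglik += math.log(p) if A[i][j] else math.log1p(-p)
            elif A[i][j]:
                # An edge against the ordering has probability zero.
                return -math.inf
            # A forbidden pair with no edge contributes log(1) = 0.
    return loglik
```

Treating `order` as a parameter with a prior, as the abstract describes, would mean evaluating a function like this inside a sampler that updates the ordering together with the block labels; the sketch only covers the likelihood for fixed values.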