Previous statistical approaches to hierarchical clustering for social network analysis all construct an "ultrametric" hierarchy. While the assumption of ultrametricity has been discussed and studied in the phylogenetics literature, it has not yet been acknowledged in the social network literature. We show that "non-ultrametric structure" in the network introduces significant instabilities in the existing top-down recovery algorithms. To address this issue, we introduce an instability diagnostic plot and use it to examine a collection of empirical networks. These networks appear to violate the "ultrametric" assumption. We propose a deceptively simple and yet general class of probabilistic models called $\mathbb{T}$-Stochastic Graphs which impose no topological restrictions on the latent hierarchy. To illustrate this model, we propose six alternative forms of hierarchical network models and then show that all six are equivalent to the $\mathbb{T}$-Stochastic Graph model. These alternative models motivate a novel approach to hierarchical clustering that combines spectral techniques with the well-known Neighbor-Joining algorithm from phylogenetic reconstruction. We prove this spectral approach is statistically consistent.
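To make the "ultrametric" assumption concrete, the following minimal sketch checks the standard three-point condition from the phylogenetics literature (in every triple of leaves, the two largest pairwise distances are equal) and the weaker four-point condition that characterizes general additive tree metrics. The abstract does not state these definitions, and the function names `is_ultrametric` and `is_additive` and the two toy distance matrices are illustrative assumptions, not part of the paper's method. The sketch operates on leaf-to-leaf distance matrices rather than on networks.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        _, mid, top = sorted((d[i][j], d[i][k], d[j][k]))
        if top - mid > tol:
            return False
    return True

def is_additive(d, tol=1e-9):
    """Four-point condition: in every quadruple, the two largest pair-sums are equal."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted((d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]))
        if sums[2] - sums[1] > tol:
            return False
    return True

# Hypothetical toy example 1: balanced tree ((A,B),(C,D)), all leaves equidistant from the root.
balanced = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]

# Hypothetical toy example 2: same topology, but unequal leaf edge lengths
# (A=1, B=3, C=1, D=2, internal edge=1), so it is a tree metric that is not ultrametric.
unbalanced = [
    [0, 4, 3, 4],
    [4, 0, 5, 6],
    [3, 5, 0, 3],
    [4, 6, 3, 0],
]

balanced_flags = (is_ultrametric(balanced), is_additive(balanced))
unbalanced_flags = (is_ultrametric(unbalanced), is_additive(unbalanced))
```

Both matrices come from a tree, but only the first passes the three-point check. The second is the kind of "non-ultrametric structure" that an algorithm built around an ultrametric hierarchy cannot represent, whereas Neighbor-Joining is designed for additive distances of this more general form.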