Hierarchical clustering of networks consists of finding a tree of communities such that lower levels of the hierarchy reveal finer-grained community structures. There are two main classes of algorithms for this problem. Divisive ($\textit{top-down}$) algorithms recursively split the nodes into two communities until a stopping rule indicates that no further split is needed. In contrast, agglomerative ($\textit{bottom-up}$) algorithms first identify the finest-grained community structure and then repeatedly merge communities using a $\textit{linkage}$ method. In this article, we establish theoretical guarantees for the recovery of the hierarchical tree and the community structure of a Hierarchical Stochastic Block Model by a bottom-up algorithm. We also show that this bottom-up algorithm attains the information-theoretic threshold for exact recovery at intermediate levels of the hierarchy. Notably, these recovery conditions are less restrictive than those existing for top-down algorithms, which shows that bottom-up algorithms extend the feasible region for exact recovery at intermediate levels. Numerical experiments on both synthetic and real data sets confirm the superiority of bottom-up algorithms over top-down algorithms. We also observe that top-down algorithms can produce dendrograms with inversions. These findings contribute to a better understanding of hierarchical clustering techniques and their applications in network analysis.
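The bottom-up procedure described above (identify the finest communities, then merge them with a linkage) can be illustrated with a minimal Python sketch. It assumes the finest partition is already known (here, the true block labels of a simulated graph; in practice it would have to be estimated, for example by spectral clustering) and uses average linkage, i.e. the mean edge density between two communities, as the merging criterion. The helper names `sample_sbm` and `bottom_up_merge`, the linkage choice, and all parameter values are hypothetical and chosen only for illustration; this is not the specific algorithm analyzed in the article.

```python
import numpy as np


def sample_sbm(sizes, probs, rng):
    """Hypothetical helper: sample an undirected, loop-free SBM graph.

    probs[a, b] is the edge probability between blocks a and b.
    """
    labels = np.repeat(np.arange(len(sizes)), sizes)
    p = probs[labels][:, labels]                       # node-level edge probabilities
    upper = np.triu(rng.random(p.shape) < p, k=1)      # sample each pair once
    return (upper | upper.T).astype(float), labels     # symmetrize


def bottom_up_merge(adj, labels):
    """Hypothetical helper: greedily merge communities by average linkage.

    Average linkage here = mean edge density between two communities.
    Returns (id_a, id_b, new_id, density) records, one per merge.
    """
    clusters = {int(k): np.flatnonzero(labels == k) for k in np.unique(labels)}
    next_id = max(clusters) + 1
    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        best = None
        # Scan all pairs and keep the densest one.
        for i, ka in enumerate(keys):
            for kb in keys[i + 1:]:
                density = adj[np.ix_(clusters[ka], clusters[kb])].mean()
                if best is None or density > best[0]:
                    best = (density, ka, kb)
        density, ka, kb = best
        clusters[next_id] = np.concatenate([clusters.pop(ka), clusters.pop(kb)])
        merges.append((ka, kb, next_id, float(density)))
        next_id += 1
    return merges


rng = np.random.default_rng(0)
# Illustrative two-level hierarchy: blocks {0, 1} and {2, 3} are siblings.
probs = np.array([
    [0.50, 0.20, 0.02, 0.02],
    [0.20, 0.50, 0.02, 0.02],
    [0.02, 0.02, 0.50, 0.20],
    [0.02, 0.02, 0.20, 0.50],
])
adj, labels = sample_sbm([50, 50, 50, 50], probs, rng)
for merge in bottom_up_merge(adj, labels):
    print(merge)
```

With these parameters, the sibling pairs {0, 1} and {2, 3} should be merged first, at densities near 0.2, and the root merge should occur at a density near 0.02. Because the average-linkage density between a merged community and any other community is a size-weighted average of the densities of its two parts, the merge densities are non-increasing, so this particular linkage cannot produce a dendrogram with inversions.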