Researchers theorize that many real-world networks exhibit community structure where within-community edges are more likely than between-community edges. While numerous methods exist to cluster nodes into different communities, less work has addressed this question: given some network, does it exhibit statistically meaningful community structure? We answer this question in a principled manner by framing it as a statistical hypothesis test in terms of a general and model-agnostic community structure parameter. Leveraging this parameter, we propose a simple and interpretable test statistic used to formulate two separate hypothesis testing frameworks. The first is an asymptotic test against a baseline value of the parameter while the second tests against a baseline model using bootstrap-based thresholds. We prove theoretical properties of these tests and demonstrate how the proposed method yields rich insights into real-world data sets.
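To make the second framework concrete, the following is a minimal sketch of a bootstrap-thresholded test against a baseline model. It is not the statistic or procedure of this work: the function names (`density_contrast`, `sample_erdos_renyi`, `bootstrap_test`), the choice of an Erdős–Rényi baseline, and the use of a fixed, externally supplied community labeling are all assumptions made for illustration. The stand-in statistic is simply the within-community edge density minus the between-community edge density.

```python
import random


def density_contrast(adj, labels):
    """Within-community minus between-community edge density (illustrative statistic)."""
    n = len(adj)
    within = [0, 0]   # [edge count, pair count]
    between = [0, 0]
    for i in range(n):
        for j in range(i + 1, n):
            bucket = within if labels[i] == labels[j] else between
            bucket[0] += adj[i][j]
            bucket[1] += 1
    if within[1] == 0 or between[1] == 0:
        return 0.0  # contrast is undefined with a single group or all singletons
    return within[0] / within[1] - between[0] / between[1]


def sample_erdos_renyi(n, p, rng):
    """Draw an undirected, loop-free graph with independent edge probability p."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj


def bootstrap_test(adj, labels, n_boot=200, alpha=0.05, seed=0):
    """Compare the observed statistic with a parametric-bootstrap threshold.

    Assumed baseline: Erdős–Rényi with the observed edge density.
    Returns (observed statistic, threshold, reject flag).
    """
    rng = random.Random(seed)
    n = len(adj)
    n_pairs = n * (n - 1) // 2
    p_hat = sum(adj[i][j] for i in range(n) for j in range(i + 1, n)) / n_pairs
    observed = density_contrast(adj, labels)
    null_stats = sorted(
        density_contrast(sample_erdos_renyi(n, p_hat, rng), labels)
        for _ in range(n_boot)
    )
    # Empirical (1 - alpha) quantile of the simulated null statistics.
    idx = min(n_boot - 1, int((1 - alpha) * n_boot))
    threshold = null_stats[idx]
    return observed, threshold, observed > threshold
```

In this sketch the labeling is taken as given. If the labels were instead estimated from the observed network, the same estimation step would need to be repeated on every bootstrap replicate for the threshold to remain meaningful.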