Since network data commonly consists of observations from a single large network, researchers often partition the network into clusters in order to apply cluster-robust inference methods. Existing methods of this kind require clusters to be asymptotically independent. Under mild conditions, we prove that, for this requirement to hold for network-dependent data, it is necessary and sufficient that clusters have low conductance, the ratio of edge boundary size to volume. This yields a simple measure of cluster quality. We find in simulations that when clusters have low conductance, cluster-robust methods control size better than HAC estimators. However, for important classes of networks lacking low-conductance clusters, cluster-robust methods can exhibit substantial size distortion. To determine the number of low-conductance clusters and to construct them, we draw on results in spectral graph theory that connect conductance to the spectrum of the graph Laplacian: we propose using the spectrum to determine the number of such clusters and spectral clustering to construct them.
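As an illustration of the quantities named above, the following is a minimal sketch, not the paper's implementation. It assumes an undirected, unweighted graph with no isolated nodes, takes the volume of a cluster to be the sum of its node degrees (the paper's exact normalization may differ), uses the normalized Laplacian, picks the number of clusters by the largest gap in the sorted eigenvalues, and splits into two clusters by the sign of the second eigenvector instead of running a full spectral clustering routine. The toy graph (two 5-node cliques joined by one edge) and all function names are hypothetical.

```python
import numpy as np


def conductance(adj, members):
    """Edge boundary size divided by volume (sum of degrees) of a cluster."""
    mask = np.zeros(adj.shape[0], dtype=bool)
    mask[list(members)] = True
    boundary = adj[mask][:, ~mask].sum()  # edges leaving the cluster
    volume = adj[mask].sum()              # sum of degrees inside the cluster
    return boundary / volume


def normalized_laplacian_spectrum(adj):
    """Eigenvalues (ascending) and eigenvectors of I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)       # assumes every node has degree > 0
    lap = np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.linalg.eigh(lap)


# Hypothetical toy network: two 5-node cliques joined by a single edge.
n = 10
adj = np.zeros((n, n))
adj[:5, :5] = 1.0
adj[5:, 5:] = 1.0
np.fill_diagonal(adj, 0.0)
adj[4, 5] = adj[5, 4] = 1.0

# Conductance of the first clique: one boundary edge over its total degree.
phi = conductance(adj, range(5))

# Eigengap heuristic (an assumption here): the number of eigenvalues below
# the largest gap is taken as the number of low-conductance clusters.
eigvals, eigvecs = normalized_laplacian_spectrum(adj)
k = int(np.argmax(np.diff(eigvals))) + 1

# Simplified two-way split by the sign of the second eigenvector.
labels = (eigvecs[:, 1] > 0).astype(int)

print("conductance of first clique:", phi)
print("estimated number of clusters:", k)
print("labels:", labels)
```

For more than two clusters, the sign split would typically be replaced by a clustering step (for example k-means) on the rows of the first k eigenvectors.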