Graph cuts are among the most prominent tools for clustering and classification analysis. While they have been intensively studied from geometric and algorithmic perspectives, statistical inference based on graph cuts remains elusive to a certain extent. Distributional limits are fundamental to understanding and designing such statistical procedures for randomly sampled data. We provide explicit limiting distributions for general balanced graph cuts on a fixed but arbitrary discretization. In particular, we show that, as the sample size increases, Minimum Cut, Ratio Cut and Normalized Cut asymptotically behave like a minimum of Gaussians. Interestingly, our results reveal a dichotomy for Cheeger Cut: the limiting distribution of the optimal objective value is a minimum of Gaussians only when the optimal partition yields two sets of unequal volumes; otherwise, it is the minimum of a random mixture of Gaussians. Further, by exploiting the directional differentiability of cut functionals, we establish bootstrap consistency for all types of graph cuts. We validate these theoretical findings with Monte Carlo experiments and examine the differences between the cuts as well as their dependence on the underlying distribution. Additionally, we extend our theoretical findings to the Xist algorithm, a computational surrogate of graph cuts recently proposed in Suchan, Li and Munk (arXiv, 2023), thereby demonstrating their practical applicability, e.g., in statistical tests.
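To make the four balanced cut objectives concrete, the following minimal Python sketch evaluates them on a small weighted graph and minimizes each one by brute force over all bipartitions. It assumes the common textbook definitions, with weighted degrees d_i = Σ_j w_ij and vol(A) = Σ_{i∈A} d_i: Minimum Cut = cut(A, Aᶜ), Ratio Cut = cut(A, Aᶜ)(1/|A| + 1/|Aᶜ|), Normalized Cut = cut(A, Aᶜ)(1/vol(A) + 1/vol(Aᶜ)) and Cheeger Cut = cut(A, Aᶜ)/min(vol(A), vol(Aᶜ)). The paper's exact normalizations may differ. The helper names (`cut_value`, `volume`, `objectives`, `optimal_values`) and the toy graph are hypothetical, and no random sampling or discretization step is modeled.

```python
from itertools import combinations

def cut_value(W, A):
    """Total weight of edges running between vertex set A and its complement."""
    n = len(W)
    complement = [j for j in range(n) if j not in A]
    return sum(W[i][j] for i in A for j in complement)

def volume(W, S):
    """Volume of S: sum of weighted degrees d_i = sum_j W[i][j] over i in S."""
    return sum(sum(W[i]) for i in S)

def objectives(W, A):
    """Evaluate the four balanced cut objectives for the bipartition (A, A^c).

    Assumes textbook definitions; the paper's normalizations may differ.
    """
    A = set(A)
    B = set(range(len(W))) - A
    c = cut_value(W, A)
    vol_a, vol_b = volume(W, A), volume(W, B)
    return {
        "min_cut": c,
        "ratio_cut": c * (1 / len(A) + 1 / len(B)),
        "normalized_cut": c * (1 / vol_a + 1 / vol_b),
        # min(...) has a kink (is not differentiable) where vol_a == vol_b.
        "cheeger_cut": c / min(vol_a, vol_b),
    }

def optimal_values(W):
    """Brute-force minimum (value, A) of each objective over nontrivial bipartitions."""
    n = len(W)
    best = {}
    # Vertex 0 always lies in A, so each bipartition is visited exactly once.
    for size in range(1, n):
        for rest in combinations(range(1, n), size - 1):
            A = (0,) + rest
            for name, val in objectives(W, A).items():
                if name not in best or val < best[name][0]:
                    best[name] = (val, A)
    return best

# Hypothetical toy graph: two unit-weight triangles joined by one weak edge (weight 0.1).
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

for name, (val, A) in optimal_values(W).items():
    print(f"{name}: {val:.4f} with A = {A}")
```

On this toy graph all four objectives select A = {0, 1, 2}. Its two sides have equal volume (6.1 each), which is the situation in which, according to the abstract, the Cheeger Cut limit is the minimum of a random mixture of Gaussians rather than a plain minimum of Gaussians; one plausible intuition is that min(vol(A), vol(Aᶜ)) is not differentiable exactly there. Brute force enumerates 2^(n-1) - 1 bipartitions, so it is only an illustration for tiny graphs.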