We analyze the convergence properties of Fermat distances, a family of density-driven metrics defined on Riemannian manifolds equipped with a probability measure. Fermat distances may be defined either on discrete samples from the underlying measure, in which case they are random, or in the continuum setting, in which case they are induced by geodesics under a density-distorted Riemannian metric. We prove that discrete, sample-based Fermat distances converge to their continuum analogues in small neighborhoods, with a precise rate that depends on the intrinsic dimension of the data and on the parameter governing the extent of density weighting in Fermat distances. The proof leverages novel geometric and statistical arguments in percolation theory that allow for non-uniform densities and curved domains. We then use these results to prove that graph Laplacians built from sample-based Fermat distances converge to corresponding continuum operators. In particular, we show that the discrete eigenvalues and eigenvectors converge to their continuum analogues at a dimension-dependent rate, which allows us to interpret the efficacy of discrete spectral clustering with Fermat distances in terms of the resulting continuum limit. The perspective afforded by our discrete-to-continuum analysis leads to new clustering algorithms for data and to related insights into efficient computations associated with density-driven spectral clustering. Our theoretical analysis is supported by numerical simulations and by experiments on synthetic and real image data.
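To make the sample-based object concrete, the following is a minimal sketch of one common way to compute discrete Fermat distances. It assumes the usual definition, which the abstract does not spell out: the distance between two samples is the minimum, over paths through the sample, of the sum of Euclidean hop lengths raised to a power p >= 1, where p plays the role of the density-weighting parameter. The function name `fermat_distances`, the use of a complete graph, and the absence of any sample-size rescaling are illustrative assumptions, not the authors' implementation.

```python
import heapq
import math


def fermat_distances(points, p=2.0, source=0):
    """Sample-based Fermat distances from points[source] to every sample.

    Assumed definition (not taken from the abstract): minimum over paths
    through the sample of the sum of Euclidean hop lengths raised to the
    power p. Uses the complete graph on the samples and no rescaling.
    """
    n = len(points)
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    # Standard Dijkstra; edge weights |x_i - x_j|^p are non-negative.
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        for j in range(n):
            if j == i:
                continue
            w = math.dist(points[i], points[j]) ** p
            if d + w < dist[j]:
                dist[j] = d + w
                heapq.heappush(heap, (dist[j], j))
    return dist


# Three closely spaced samples followed by a gap (a sparsely sampled region).
samples = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
print(fermat_distances(samples, p=1.0))  # [0.0, 1.0, 2.0, 4.0]
print(fermat_distances(samples, p=2.0))  # [0.0, 1.0, 2.0, 6.0]
```

With p = 1 the result is the ordinary Euclidean distance, while with p = 2 the hop across the gap costs 4 instead of 2, so crossing sparsely sampled regions is penalized relative to moving through densely sampled ones. This toy example only illustrates the definition and says nothing about the convergence rates established in the paper.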