Persistence diagrams serve as compact topological signatures of point cloud data assumed to be sampled from manifolds. Two point clouds can be compared directly by computing the bottleneck distance, d_B, between their persistence diagrams. One potential drawback of this pipeline is that point clouds sampled from topologically similar manifolds can have arbitrarily large d_B values when they differ by a large scaling factor. This situation is typical in dimension reduction frameworks that also aim to preserve topology. We define a new scale-invariant distance between persistence diagrams, termed the normalized bottleneck distance, d_N, and study its properties. In defining d_N, we also develop a broader framework called metric decomposition for comparing finite metric spaces of equal cardinality that are related by a bijection. We use metric decomposition to prove a stability result for d_N by deriving an explicit bound on the distortion of the associated bijective map. We then study two popular dimension reduction techniques, Johnson-Lindenstrauss (JL) projections and metric multidimensional scaling (mMDS), together with a third class of general biLipschitz mappings, and provide new bounds on how well each preserves homology with respect to d_N. For a JL map f that transforms input X to f(X), we show that d_N(dgm(X), dgm(f(X))) < e, where dgm(X) is the Vietoris-Rips persistence diagram of X and 0 < e < 1 is the tolerance up to which f preserves pairwise distances. For mMDS, we present new bounds on both d_B and d_N between the persistence diagrams of X and its projection in terms of the eigenvalues of the covariance matrix. For k-biLipschitz maps, we show that d_N is bounded by the product of (k^2-1)/k and the ratio of the diameters of X and f(X).
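The scale sensitivity described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: only 0-dimensional Vietoris-Rips persistence is computed (via the standard fact that finite H0 bars die at the edge lengths of a minimum spanning tree), an edge is assumed to enter the complex at its full length, and each cloud is normalized by dividing by its diameter. The abstract does not state how d_N is actually defined, so the diameter normalization is only a plausible stand-in, and the helper names `h0_deaths`, `diameter`, and `normalized_deaths` are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Return (distance, i, j) for every pair of indices i < j."""
    return [
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    ]


def h0_deaths(points):
    """Death times of the finite H0 Vietoris-Rips bars (all born at 0).

    Connected components merge exactly at the edge lengths of a minimum
    spanning tree, so Kruskal's algorithm with union-find recovers them.
    Assumed convention: an edge appears at its full length.
    """
    parent = list(range(len(points)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    deaths = []
    for dist, i, j in sorted(pairwise_distances(points)):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one bar dies
            deaths.append(dist)
    return deaths  # ascending order, one entry per merge (n - 1 in total)


def diameter(points):
    """Largest pairwise distance in the cloud."""
    return max(d for d, _, _ in pairwise_distances(points))


def normalized_deaths(points):
    """H0 deaths after rescaling the cloud to unit diameter (an assumption)."""
    diam = diameter(points)
    return [d / diam for d in h0_deaths(points)]


# Toy data: 12 points on the unit circle, and the same cloud scaled by 50.
circle = [
    (math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
    for k in range(12)
]
scaled = [(50 * x, 50 * y) for x, y in circle]

# Raw death times differ by the scaling factor ...
print(h0_deaths(circle)[-1], h0_deaths(scaled)[-1])
# ... while the diameter-normalized death times agree up to rounding.
print(normalized_deaths(circle)[-1], normalized_deaths(scaled)[-1])
```

Both clouds have the same topology, yet every coordinate of the raw diagram is multiplied by 50, so any distance computed from those coordinates grows with the scaling factor. After normalization the two diagrams coincide up to floating-point rounding, which is the behavior a scale-invariant distance is meant to capture.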