A Normalized Bottleneck Distance on Persistence Diagrams and Homology Preservation under Dimension Reduction

Persistence diagrams serve as compact topological signatures of point cloud data assumed to be sampled from manifolds. Two point clouds can be compared by directly comparing their persistence diagrams using the bottleneck distance, d_B. One potential drawback of this pipeline is that point clouds sampled from topologically similar manifolds can have arbitrarily large d_B values when they differ by a large scaling factor. This situation is typical in dimension reduction frameworks that also aim to preserve topology. We define a new scale-invariant distance between persistence diagrams, termed the normalized bottleneck distance, d_N, and study its properties. In defining d_N, we also develop a broader framework called metric decomposition for comparing finite metric spaces of equal cardinality related by a bijection. We use metric decomposition to prove a stability result for d_N by deriving an explicit bound on the distortion of the associated bijective map. We then study two popular dimension reduction techniques, Johnson-Lindenstrauss (JL) projections and metric multidimensional scaling (mMDS), and a third class of general biLipschitz mappings. We provide new bounds on how well these dimension reduction techniques preserve homology with respect to d_N. For a JL map f that transforms input X to f(X), we show that d_N(dgm(X), dgm(f(X))) < e, where dgm(X) is the Vietoris-Rips persistence diagram of X, and 0 < e < 1 is the tolerance up to which pairwise distances are preserved by f. For mMDS, we present new bounds for both d_B and d_N between the persistence diagrams of X and its projection in terms of the eigenvalues of the covariance matrix. For k-biLipschitz maps, we show that d_N is bounded by the product of (k^2-1)/k and the ratio of diameters of X and f(X).
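The scaling issue described above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the abstract: only 0-dimensional Vietoris-Rips persistence is considered (its finite death times equal the minimum spanning tree edge weights, under the convention that an edge enters at scale equal to its length), the helper names `pairwise_distances` and `h0_death_times` are hypothetical, and dividing by the diameter is just one natural way to remove scale, not necessarily the definition of d_N. The gap between sorted death times is a crude proxy and is not the bottleneck distance itself.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix of the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def h0_death_times(D):
    """Sorted finite H0 death times of the Vietoris-Rips filtration.

    These coincide with the minimum spanning tree edge weights,
    computed here with Prim's algorithm on the distance matrix D.
    """
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()  # cheapest connection of each vertex to the tree
    deaths = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        deaths.append(candidates[j])
        in_tree[j] = True
        best = np.minimum(best, D[j])
    return np.sort(np.array(deaths))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # illustrative point cloud (assumption)
c = 100.0                     # illustrative scaling factor (assumption)

D_X = pairwise_distances(X)
D_cX = pairwise_distances(c * X)

deaths_X = h0_death_times(D_X)
deaths_cX = h0_death_times(D_cX)

# Raw death times differ by the factor c, so the gap grows with c.
gap_raw = np.max(np.abs(deaths_cX - deaths_X))

# Dividing by the diameter (assumed normalization) removes the scale.
norm_X = deaths_X / D_X.max()
norm_cX = deaths_cX / D_cX.max()
gap_norm = np.max(np.abs(norm_cX - norm_X))

print("raw gap:", gap_raw)
print("normalized gap:", gap_norm)
```

In this sketch the raw gap grows linearly with `c`, while the normalized gap stays at floating-point noise, which is the kind of behavior a scale-invariant distance is meant to capture.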
