Persistent homology barcodes and diagrams are a cornerstone of topological data analysis. Widely used in real data settings, they relate variation in topological information (as measured by cellular homology) to variation in data. However, they are challenging to use in statistical settings because of their complex geometric structure. In this paper, we revisit the persistent homology rank function, an invariant measure of "shape" that was introduced before barcodes and persistence diagrams and captures the same information in a form that is more amenable to data and computation. In particular, because rank functions are functions, techniques from functional data analysis (a domain of statistics adapted for functions) apply directly to persistent homology when it is represented by rank functions. Rank functions have nevertheless been less popular than barcodes because stability, a property that is crucial to validating their use in data analysis, is difficult to guarantee, mainly due to metric concerns on the space of rank functions. On the other hand, rank functions extend more naturally to the increasingly popular and important case of multiparameter persistent homology. We study the performance of rank functions in functional inferential statistics and machine learning on both simulated and real data, in both single-parameter and multiparameter persistent homology. We find that persistent homology captured by rank functions offers a clear improvement over existing approaches. We then provide theoretical justification for our numerical experiments and applications to data by deriving several stability results for single- and multiparameter persistence rank functions under various metrics, with the underlying aim of computational feasibility and interpretability.
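To make the statement that a rank function captures the same information as a barcode concrete, the following is a minimal sketch for the single-parameter case, not the implementation used in the paper. It assumes bars are half-open intervals [birth, death), so that the rank at a pair a <= b counts the bars born at or before a and still alive after b. The names `rank_function` and `rank_grid`, and the toy barcode, are hypothetical and chosen only for illustration.

```python
import numpy as np


def rank_function(barcode, a, b):
    """Count the bars [birth, death) that are alive on the whole interval [a, b].

    Assumption: bars are half-open, so a bar contributes when
    birth <= a and b < death.
    """
    if a > b:
        raise ValueError("the rank function is only defined for a <= b")
    return sum(1 for birth, death in barcode if birth <= a and b < death)


def rank_grid(barcode, grid):
    """Evaluate the rank function at all pairs (grid[i], grid[j]) with i <= j.

    Entries below the diagonal (i > j) are left at zero, since the rank
    function is not defined there.
    """
    n = len(grid)
    values = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i, n):
            values[i, j] = rank_function(barcode, grid[i], grid[j])
    return values


# Toy barcode (illustrative data only): two finite bars and one infinite bar.
barcode = [(0.0, 1.0), (0.2, 0.5), (0.4, float("inf"))]
grid = np.linspace(0.0, 1.0, 6)
ranks = rank_grid(barcode, grid)
```

Sampled on a grid in this way, the rank function becomes an ordinary array, which is the form in which functional data analysis tools can be applied to it.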