Topological data analysis provides a set of tools to uncover low-dimensional structure in noisy point clouds. Prominent among these tools is persistent homology, which summarizes birth-death times of homological features using data objects known as persistence diagrams. To better aid statistical analysis, a functional representation of the diagrams, known as persistence landscapes, enables the use of functional data analysis and machine learning tools. Topological and geometric variabilities inherent in point clouds are confounded in both persistence diagrams and landscapes, and it is important to distinguish topological signal from noise in order to draw reliable conclusions about the structure of the point clouds when using persistent homology. We develop a framework for decomposing variability in persistence diagrams into topological signal and topological noise through alignment of persistence landscapes using an elastic Riemannian metric. Aligned landscapes (amplitude) isolate the topological signal. Reparameterizations used for landscape alignment (phase) are linked to a resolution parameter used to generate persistence diagrams, and capture topological noise in the form of geometric, global scaling, and sampling variabilities. We illustrate the importance of decoupling topological signal and topological noise in persistence diagrams (landscapes) using several simulated examples. We also demonstrate that our approach provides novel insights in two real data studies.
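To make the functional representation mentioned in the abstract concrete, the following is a minimal sketch of how a persistence landscape can be evaluated from a persistence diagram, using the standard definition in which the k-th landscape function at t is the k-th largest value of max(0, min(t - b, d - t)) over the diagram's (birth, death) pairs. The function name `persistence_landscape`, its parameters, and the toy diagram are illustrative assumptions rather than anything taken from the paper, and the sketch covers only the landscape construction, not the elastic alignment step.

```python
import numpy as np


def persistence_landscape(diagram, grid, k_max=3):
    """Evaluate the first k_max landscape functions on a grid.

    Hypothetical helper for illustration only.
    diagram: iterable of (birth, death) pairs.
    grid:    1-D array of points t at which to evaluate.
    Returns an array of shape (k_max, len(grid)).
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    grid = np.asarray(grid, dtype=float)
    births = diagram[:, 0][:, None]
    deaths = diagram[:, 1][:, None]

    # One "tent" function per diagram point: max(0, min(t - b, d - t)).
    tents = np.maximum(
        0.0, np.minimum(grid[None, :] - births, deaths - grid[None, :])
    )

    # At each grid point, order the tent values from largest to smallest.
    tents_sorted = -np.sort(-tents, axis=0)

    # The k-th landscape is the k-th largest tent value (zero if absent).
    landscape = np.zeros((k_max, grid.size))
    n = min(k_max, tents_sorted.shape[0])
    landscape[:n] = tents_sorted[:n]
    return landscape


# Toy diagram with two overlapping features (assumed values).
toy_diagram = [(0.0, 2.0), (1.0, 3.0)]
toy_grid = np.linspace(0.0, 3.0, 7)
toy_landscape = persistence_landscape(toy_diagram, toy_grid)
```

Each row of the returned array is an ordinary function sampled on a common grid, which is the form that functional data analysis tools, including alignment by reparameterization of the grid, operate on.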