Topological data analysis (TDA) is an area of data science that focuses on using invariants from algebraic topology to provide multiscale shape descriptors for geometric data sets such as point clouds. One of the most important such descriptors is {\em persistent homology}, which encodes the change in shape as a filtration parameter changes; a typical parameter is the feature scale. For many data sets, it is useful to simultaneously vary multiple filtration parameters, for example, feature scale and density. While the theoretical properties of single-parameter persistent homology are well understood, less is known about the multiparameter case. In particular, a central problem is to represent multiparameter persistent homology by elements of a vector space, so that it can be integrated with standard machine learning algorithms. Existing approaches to this problem either ignore most of the multiparameter information to reduce to the single-parameter case or are heuristic and potentially unstable in the face of noise. In this article, we introduce a new general representation framework that leverages recent results on {\em decompositions} of multiparameter persistent homology. This framework is rich in information, fast to compute, and encompasses previous approaches. Moreover, we establish theoretical stability guarantees for this framework as well as efficient algorithms for practical computation, making it a practical and versatile tool for analyzing geometric and point cloud data. We validate our stability results and algorithms with numerical experiments that demonstrate statistical convergence, prediction accuracy, and fast running times on several real data sets.