Persistent homology (PH) provides topological descriptors for geometric data, such as weighted graphs, that are interpretable, stable to perturbations, and invariant under transformations such as relabeling. Most applications of PH focus on the one-parameter case, where the descriptors summarize the changes in the topology of data as it is filtered by a single quantity of interest, and a wide array of methods now enables the use of one-parameter PH descriptors in data science; these methods rely on the stable vectorization of the descriptors as elements of a Hilbert space. Although the multiparameter PH (MPH) of data filtered by several quantities of interest encodes much richer information than its one-parameter counterpart, the scarcity of stability results for MPH descriptors has so far limited the available options for the stable vectorization of MPH. In this paper, we aim to bring together the best of both worlds by showing how the interpretation of signed barcodes (a recent family of MPH descriptors) as signed measures leads to natural extensions of vectorization strategies from one parameter to multiple parameters. The resulting feature vectors are easy to define and to compute, and provably stable. Although we focus, as a proof of concept, on simple choices of signed barcodes and vectorizations, we already see notable performance improvements when comparing our feature vectors to state-of-the-art topology-based methods on various types of data.
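To make the idea of vectorizing a signed measure concrete, the following is a minimal sketch of one possible strategy: convolving a signed point measure with a Gaussian kernel and sampling the result on a fixed grid. It is an illustration under stated assumptions, not the paper's implementation. The function name `vectorize_signed_measure`, the `bandwidth` parameter, the grid, and the toy points and weights are all hypothetical, and the computation of a signed barcode from data is not shown.

```python
import numpy as np


def vectorize_signed_measure(points, weights, grid_axes, bandwidth=0.5):
    """Sample the Gaussian convolution of a signed point measure on a grid.

    points    : (n, d) array, locations of the point masses (assumed given).
    weights   : (n,) array, signed masses (positive or negative).
    grid_axes : list of d one-dimensional arrays defining the sampling grid.
    bandwidth : standard deviation of the Gaussian kernel (hypothetical default).
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Build the list of grid locations, shape (g, d).
    mesh = np.meshgrid(*grid_axes, indexing="ij")
    grid = np.stack([m.ravel() for m in mesh], axis=-1)

    # Squared distances between every grid location and every point mass, shape (g, n).
    sq_dist = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)

    # Unnormalized Gaussian kernel, then a signed sum over the point masses.
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return kernel @ weights


# Toy signed measure in a two-parameter domain (made-up values).
toy_points = [[0.2, 0.3], [0.6, 0.8], [0.6, 0.3]]
toy_weights = [1.0, 1.0, -1.0]
axes = [np.linspace(0.0, 1.0, 5), np.linspace(0.0, 1.0, 5)]

feature_vector = vectorize_signed_measure(toy_points, toy_weights, axes)
print(feature_vector.shape)
```

Because the map from weights to the feature vector is linear, positive and negative masses at the same location cancel exactly, which is the property that lets a signed measure be handled like a one-parameter descriptor in this sketch.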