Persistent homology (PH) provides topological descriptors for geometric data, such as weighted graphs; these descriptors are interpretable, stable to perturbations, and invariant under transformations such as relabeling. Most applications of PH focus on the one-parameter case -- where the descriptors summarize how the topology of the data changes as it is filtered by a single quantity of interest -- and there is now a wide array of methods enabling the use of one-parameter PH descriptors in data science, all of which rely on the stable vectorization of these descriptors as elements of a Hilbert space. Although the multiparameter PH (MPH) of data filtered by several quantities of interest encodes much richer information than its one-parameter counterpart, the scarcity of stability results for MPH descriptors has so far limited the available options for the stable vectorization of MPH. In this paper, we aim to bring together the best of both worlds by showing how the interpretation of signed barcodes -- a recent family of MPH descriptors -- as signed measures leads to natural extensions of vectorization strategies from one parameter to multiple parameters. The resulting feature vectors are easy to define and to compute, and provably stable. Although we focus, as a proof of concept, on simple choices of signed barcodes and vectorizations, we already see notable performance improvements when comparing our feature vectors to state-of-the-art topology-based methods on various types of data.
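To make the idea of vectorizing a signed measure concrete, here is a minimal sketch, not the implementation used in the paper. It assumes a signed barcode has already been reduced to a finite signed point measure, that is, a list of points in parameter space with weights of either sign, and it vectorizes that measure by convolving it with a Gaussian kernel and sampling the result on a fixed grid. The function name `gaussian_convolution_vector`, the `bandwidth` parameter, the choice of a Gaussian kernel, and the toy points and weights are all assumptions made for illustration.

```python
import numpy as np


def gaussian_convolution_vector(points, weights, grid, bandwidth=1.0):
    """Sample (mu * K)(x) = sum_i w_i * K(x - p_i) at every grid point x.

    points  : (m, d) locations of the point masses of the signed measure mu
    weights : (m,)   signed masses (positive or negative)
    grid    : (g, d) locations at which the convolution is evaluated
    K is an unnormalized Gaussian kernel with the given bandwidth.
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    grid = np.asarray(grid, dtype=float)

    # Pairwise differences between grid points and point masses: (g, m, d).
    diffs = grid[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)            # (g, m)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))

    # Weighted sum over point masses gives one feature per grid point.
    return kernel @ weights


# Hypothetical toy signed measure in a two-parameter space:
# two positive point masses and one negative point mass.
points = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
weights = [1.0, 1.0, -1.0]

# A 3 x 3 evaluation grid on the unit square, flattened to shape (9, 2).
xs = np.linspace(0.0, 1.0, 3)
grid = np.array([[x, y] for x in xs for y in xs])

vec = gaussian_convolution_vector(points, weights, grid, bandwidth=0.5)
print(vec.shape)
```

Because the map from measure to vector is linear in the weights, the feature vector of a signed measure equals the vector of its positive part minus the vector of its negative part. This linearity is what lets a construction designed for ordinary (positive) measures carry over to signed ones in this sketch.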