Persistent homology is a widely used tool in topological data analysis (TDA) for understanding the underlying shape of complex data. By constructing a filtration of simplicial complexes from data points, it captures topological features such as connected components, loops, and voids across multiple scales. These features are encoded in persistence diagrams (PDs), which provide a concise summary of the data's topological structure. However, the space of PDs is not a Hilbert space, which makes it difficult to use PDs directly in machine learning applications. To address this, kernel methods and vectorization techniques have been developed to transform PDs into formats that standard machine learning methods can consume. In this paper, we introduce a new software package designed to streamline the vectorization of PDs, offering an intuitive workflow and advanced functionalities. We demonstrate the need for the package through practical examples and discuss in detail its contributions to applied TDA. Definitions of all vectorization summaries used in the package are included in the appendix.
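To make the idea of vectorization concrete, the following minimal sketch turns a persistence diagram into a fixed-length vector using a Betti curve, one simple and well-known summary. It is an illustration only and does not reflect the package's API: the function name `betti_curve`, the toy diagram, and the grid are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def betti_curve(diagram, grid):
    """Count, for each grid value t, the intervals with birth <= t < death.

    `diagram` is an (n, 2) array of (birth, death) pairs and `grid` is a
    1-D array of scale values. Both names are hypothetical, chosen for
    this sketch only.
    """
    diagram = np.asarray(diagram, dtype=float)
    grid = np.asarray(grid, dtype=float)
    births = diagram[:, 0][:, None]  # shape (n, 1)
    deaths = diagram[:, 1][:, None]  # shape (n, 1)
    # Broadcasting yields an (n, len(grid)) boolean matrix of "alive" flags.
    alive = (births <= grid[None, :]) & (grid[None, :] < deaths)
    return alive.sum(axis=0)


# Toy diagram with three (birth, death) pairs; the values are made up.
toy_diagram = np.array([[0.0, 1.0], [0.5, 2.0], [1.5, 1.8]])
grid = np.linspace(0.0, 2.0, 5)  # scale values 0, 0.5, 1.0, 1.5, 2.0

vec = betti_curve(toy_diagram, grid)
print(vec)  # [1 2 1 2 0]
```

The output always has the same length as `grid`, regardless of how many points the diagram contains, which is what allows it to be passed to standard machine learning models that expect fixed-size inputs.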