Topological data analysis (TDA) is gaining prominence across a wide range of machine learning tasks, from manifold learning to graph classification. A central technique in TDA is persistent homology (PH), which provides a distinctive topological fingerprint of data by tracking how latent structures evolve as a scale parameter changes. Current PH tools are limited to analyzing data through a single filtration parameter. However, many settings call for several relevant parameters to be considered together to gain finer insight into the data. We address this issue by introducing the Effective Multidimensional Persistence (EMP) framework, which enables the exploration of data by varying multiple scale parameters simultaneously. The framework integrates descriptor functions into the analysis process, yielding a highly expressive data summary. It extends established single-parameter PH summaries to multidimensional counterparts such as EMP Landscapes, Silhouettes, Images, and Surfaces. These summaries represent the multidimensional aspects of the data as matrices and arrays, which makes them suitable as input to a wide variety of ML models. We provide theoretical guarantees and stability proofs for EMP summaries. We demonstrate EMP's utility on graph classification tasks: EMP enhances various single-parameter PH descriptors and outperforms state-of-the-art methods on multiple benchmark datasets.
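As a rough illustration of what it means to vary two scale parameters at once and store the result as a matrix, the sketch below thresholds a toy graph by two node functions and records the number of connected components (the Betti-0 number) at each pair of thresholds. This is not the EMP construction itself, only a simplified analogue: the function names `count_components` and `betti0_grid`, the toy graph, the `label` attribute, and the choice of a plain Betti-0 count in place of a full persistence summary are all assumptions made for illustration.

```python
def count_components(nodes, edges):
    """Count connected components of the subgraph induced by `nodes`,
    using a small union-find structure."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components = len(parent)
    for u, v in edges:
        # Only edges with both endpoints in the induced subgraph count.
        if u in parent and v in parent:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
    return components


def betti0_grid(edges, f, g, f_thresholds, g_thresholds):
    """Return a matrix M where M[i][j] is the number of connected
    components of the subgraph induced by the nodes v satisfying
    f[v] <= f_thresholds[i] and g[v] <= g_thresholds[j]."""
    grid = []
    for a in f_thresholds:
        row = []
        for b in g_thresholds:
            nodes = [v for v in f if f[v] <= a and g[v] <= b]
            row.append(count_components(nodes, edges))
        grid.append(row)
    return grid


# Toy example (hypothetical data): the path graph 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3)]
degree = {0: 1, 1: 2, 2: 2, 3: 1}  # first filtering function: node degree
label = {0: 0, 1: 1, 2: 0, 3: 1}   # second filtering function: made-up node attribute

grid = betti0_grid(edges, degree, label, [1, 2], [0, 1])
print(grid)  # [[1, 2], [2, 1]]
```

Each row fixes a threshold on the first function and each column a threshold on the second, so the output is a fixed-size array that can be passed directly to a standard ML model. Note that the entry for the loosest thresholds (the whole path, one component) is smaller than its neighbors, a reminder that a single threshold on either function alone would miss how the two interact.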