In this paper, we explore how to use topological tools to compare dimension reduction methods. We first give a brief overview of some methods often used for dimension reduction, such as Isometric Feature Mapping, Laplacian Eigenmaps, Fast Independent Component Analysis, Kernel Ridge Regression, and t-distributed Stochastic Neighbor Embedding. We then give a brief overview of some notions used in topological data analysis, such as barcodes, persistent homology, and the Wasserstein distance. In theory, these methods can lead to different interpretations when applied to the same data set. We apply them to EEG data embedded in a high-dimensional manifold and compare the results across persistent homology of dimensions 0, 1, and 2, that is, across connected components, tunnels and holes, and shells around voids or cavities. We find that the three-dimensional point clouds alone do not make clear how distinct the methods are from each other. However, Wasserstein and bottleneck distances, topological hypothesis tests, and various other methods show that the methods differ qualitatively and significantly across homology dimensions.
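As a rough illustration of how persistence diagrams and the distances between them can be computed, and not as the pipeline used in this paper, the sketch below builds the 0-dimensional persistence diagram of a small point cloud. It does so with a union-find pass over the edges of a Vietoris-Rips filtration. It then compares two diagrams by brute-force optimal matching, in which any point may instead be matched to the diagonal. The function names (`h0_persistence`, `diagram_distance`, `finite`), the L-infinity ground metric, and the choice to drop infinite (essential) classes before comparing are our own assumptions. Brute force is exact but only feasible for a handful of points. Homology in dimensions 1 and 2, and realistic data sizes, would call for a dedicated library such as GUDHI or Ripser.

```python
import itertools
import math


def h0_persistence(points):
    """0-dimensional persistence diagram of a Vietoris-Rips filtration.

    Every point (component) is born at scale 0. Edges are processed in order of
    length; an edge joining two different components kills one of them at that
    length. The last surviving component never dies (death = inf).
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    diagram = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            diagram.append((0.0, length))
    diagram.append((0.0, math.inf))
    return diagram


def finite(diagram):
    # Assumption: drop essential classes (infinite death) before comparing.
    return [p for p in diagram if math.isfinite(p[1])]


def _cost(x, y):
    # L-infinity matching cost; None stands for "matched to the diagonal",
    # whose L-infinity distance from (b, d) is (d - b) / 2.
    if x is None and y is None:
        return 0.0
    if y is None:
        return (x[1] - x[0]) / 2
    if x is None:
        return (y[1] - y[0]) / 2
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


def diagram_distance(X, Y, q=None):
    """Bottleneck distance (q=None) or q-Wasserstein distance between two
    diagrams of finite (birth, death) pairs, by brute force over all matchings.

    Cost grows as (len(X) + len(Y))!, so this is for tiny diagrams only.
    """
    Xa = list(X) + [None] * len(Y)  # X padded with diagonal slots
    Ya = list(Y) + [None] * len(X)  # Y padded with diagonal slots
    best = math.inf
    for perm in itertools.permutations(range(len(Ya))):
        costs = [_cost(Xa[k], Ya[perm[k]]) for k in range(len(Xa))]
        if q is None:
            value = max(costs, default=0.0)  # bottleneck: worst matched pair
        else:
            value = sum(c ** q for c in costs) ** (1 / q)  # Wasserstein-q
        best = min(best, value)
    return best


if __name__ == "__main__":
    a = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
    b = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
    da, db = finite(h0_persistence(a)), finite(h0_persistence(b))
    print(da)  # [(0.0, 1.0), (0.0, 2.0)]
    print(db)  # [(0.0, 1.0), (0.0, 4.0)]
    print(diagram_distance(da, db))       # bottleneck: 2.0
    print(diagram_distance(da, db, q=1))  # 1-Wasserstein: 2.0
```

In this toy example, the two clouds differ only in how far the third point lies from the first two. Only the later merge in the 0-dimensional diagram changes, and both distances report that shift of 2.0 in the death time.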