Topological data analysis is an emerging field that applies the study of topological invariants to data. Perhaps the simplest such invariant is the number of connected components, or clusters. In this work, we explore a topological framework for cluster analysis and show how it can serve as a basis for explainability in unsupervised data analysis. Our main objects of study are hierarchical data structures that we call Topological Hierarchical Decompositions (THDs). We give several examples of how traditional clustering algorithms can be topologized, and we provide preliminary results on the THDs associated with Reeb graphs and the mapper algorithm. In particular, to generalize multiscale mapper, we give a generalized construction of the mapper functor as a pixelization of a cosheaf.
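To make the "number of connected components" invariant concrete, the sketch below computes the connected components of the ε-neighborhood graph of a point cloud (points joined when their Euclidean distance is at most ε) at several scales. As ε grows, the partitions coarsen and each one refines the next, giving a simple nested hierarchy of clusters. This is the standard single-linkage picture and is offered only as an illustration under those assumptions. It is not the THD construction of this work, and the helper names (`components_at_scale`, `hierarchy`, `refines`) are hypothetical.

```python
import math
from itertools import combinations


class UnionFind:
    """Disjoint-set structure used to merge points into components."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        # Path halving keeps the trees shallow.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def components_at_scale(points, eps):
    """Connected components of the eps-neighborhood graph (hypothetical helper).

    Two points are adjacent when their Euclidean distance is <= eps.
    Returns a sorted list of sorted index lists, one per component.
    """
    uf = UnionFind(len(points))
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            uf.union(i, j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


def hierarchy(points, scales):
    """Partition at each scale, in increasing order of eps (hypothetical helper)."""
    return [(eps, components_at_scale(points, eps)) for eps in sorted(scales)]


def refines(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(set(b) <= set(c) for c in coarse) for b in fine)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 0), (5, 1.5), (20, 0)]
    levels = hierarchy(pts, [0.5, 1.0, 2.0, 16.0])
    print([len(p) for _, p in levels])  # [5, 4, 3, 1]
```

Because the edge set of the ε-graph only grows with ε, each partition refines the next one, which is exactly the nesting a dendrogram records. The quadratic pairwise loop is chosen for clarity, not efficiency.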