Topological data analysis (TDA) allows us to explore the topological features of a dataset. Among topological features, lower-dimensional ones have recently drawn the attention of practitioners in mathematics and statistics because of their potential to aid the discovery of low-dimensional structure in a dataset. However, lower-dimensional features are usually challenging to detect from finite samples with TDA methods that ignore the probabilistic mechanism generating the data. In this paper, we introduce and thoroughly investigate lower-dimensional topological features that occur as zero-density regions of density functions. Specifically, we consider sequences of coverings of the support of a density function, in which each covering is composed of balls with shrinking radii. We show that, when these coverings satisfy certain sufficient conditions as the sample size goes to infinity, we can detect lower-dimensional, zero-density regions with increasing probability while guarding against false detection. We supplement the theoretical developments with a discussion of simulated experiments that elucidate the behavior of the methodology for different choices of the tuning parameters that govern the construction of the covering sequences and characterize the asymptotic results.
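To make the covering idea concrete, the following is a minimal sketch and not the procedure developed in the paper. It assumes a density on the unit square that vanishes on the line x = 0.5, covers the square with balls of a common radius centered on a regular grid, and reports the balls that contain no sample point. The function names, the test density, and the radius rule `n ** -0.4` are all hypothetical choices made for illustration; the sufficient conditions on the covering sequence are not reproduced here.

```python
import itertools
import math
import random


def grid_centers(radius, dim=2):
    """Centers of a ball covering of the unit cube [0, 1]^dim.

    Assumption: a regular grid whose spacing is at most 2 * radius / sqrt(dim),
    so every point of the cube lies within `radius` of some center.
    """
    step = 2.0 * radius / math.sqrt(dim)
    k = max(1, math.ceil(1.0 / step))
    axis = [i / k for i in range(k + 1)]
    return list(itertools.product(axis, repeat=dim))


def empty_balls(points, centers, radius):
    """Return the centers whose closed ball of the given radius holds no sample point."""
    return [c for c in centers
            if all(math.dist(c, p) > radius for p in points)]


def sample_with_zero_density_line(n, rng):
    """Draw n points from the hypothetical density f(x, y) = 4 * |x - 0.5| on [0, 1]^2.

    The density is zero exactly on the line x = 0.5, a one-dimensional region
    inside a two-dimensional support.
    """
    points = []
    for _ in range(n):
        # |x - 0.5| has CDF 4 * u^2 on [0, 0.5]; invert it to sample the offset.
        offset = 0.5 * math.sqrt(rng.random())
        sign = 1.0 if rng.random() < 0.5 else -1.0
        points.append((0.5 + sign * offset, rng.random()))
    return points


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (200, 2000):
        radius = n ** -0.4  # hypothetical shrinking-radius rule, for illustration only
        sample = sample_with_zero_density_line(n, rng)
        centers = grid_centers(radius)
        flagged = empty_balls(sample, centers, radius)
        print(f"n={n}, radius={radius:.3f}, balls={len(centers)}, empty={len(flagged)}")
```

In this sketch the radius plays the role of a tuning parameter: if it shrinks too quickly, balls far from the zero-density line are also empty (false detections), and if it shrinks too slowly, balls centered near the line still capture points from either side. The exponent used above is only a placeholder for exploring that trade-off.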