We propose a statistical framework for identifying topological differences between two populations of random geometric objects. The framework first associates a topological signature with each random geometric object and then performs a two-sample test on the observed signatures. As the signature we use the persistence barcode, a standard tool from topological data analysis, which turns the original question into a two-sample problem on the space of persistence barcodes. Because the space of persistence barcodes is not suitable for standard statistical analysis, we transfer the two-sample problem to a suitable subset of a Euclidean space. Specifically, we embed the barcodes in an ordered convex cone in a Euclidean space using functions from tropical geometry. We show that this embedding is a sufficient statistic for the persistence barcodes. This fact motivates a two-sample test based on the sufficient statistic, and we establish that the resulting problem is equivalent to the two-sample problem on the barcode space. Finally, we study the consistency of the proposed test.
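The pipeline described above (barcode, then Euclidean embedding, then two-sample test) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the hypothetical helper `tropical_features` uses only the sums of the k longest bar lengths, which are simple max-plus symmetric expressions, and the hypothetical `permutation_test` is a generic permutation test on the difference of mean feature vectors, swapped in as a stand-in for the test the authors propose.

```python
import numpy as np


def tropical_features(barcode, k_max=3):
    """Map a barcode, given as (birth, death) pairs, to a vector in R^k_max.

    Illustrative choice (an assumption, not the paper's embedding):
    coordinate k is the sum of the k longest bar lengths, a max-plus
    (tropical) symmetric expression in the lengths.
    """
    lengths = np.sort([d - b for b, d in barcode])[::-1]  # longest first
    padded = np.zeros(k_max)  # barcodes with fewer bars are padded with zeros
    n = min(k_max, len(lengths))
    padded[:n] = lengths[:n]
    return np.cumsum(padded)


def permutation_test(X, Y, n_perm=999, seed=0):
    """Generic permutation two-sample test on embedded features.

    The statistic is the Euclidean distance between sample means
    (a stand-in statistic, not the one studied in the paper).
    Returns a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X, Y])
    n = len(X)

    def stat(A, B):
        return np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))

    observed = stat(X, Y)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))  # random relabelling of the pooled sample
        if stat(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

In use, each observed barcode would be passed through `tropical_features`, the resulting vectors stacked into one array per population, and the two arrays handed to `permutation_test`. The coordinates are nondecreasing in k by construction, so the feature vectors lie in a cone of ordered vectors, which loosely mirrors the ordered convex cone mentioned above.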