The stability of persistent homology has made the persistence diagram a widely used topological descriptor that can be trusted in the presence of noise. However, modern science increasingly needs to process high-dimension, low-sample-size (HDLSS) data, and it is unclear whether persistence diagrams remain reliable under high-dimensional noise. This work studies the reliability of persistence diagrams in the HDLSS setting. By analyzing the asymptotic behavior of persistence diagrams for high-dimensional random data, we show that persistence diagrams are no longer reliable descriptors of low-sample-size data under high-dimensional noise perturbations. We call this loss of reliability the curse of dimensionality on persistence diagrams. We then investigate normalized principal component analysis as a way to reduce the dimensionality of the observed high-dimensional data before computing persistence diagrams, and we show that this method can mitigate the curse of dimensionality on persistence diagrams. Our results shed new light on the challenges of analyzing HDLSS data with persistence diagrams and provide a starting point for future research in this area.
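As an informal illustration of how high-dimensional noise can swamp the geometry that a persistence diagram summarizes, the sketch below uses an assumed data model, not the paper's analysis. It places n points on a unit circle in the first two coordinates of R^d and adds independent Gaussian noise with standard deviation sigma to every coordinate. With n fixed and d growing, every pairwise distance concentrates near sigma·sqrt(2d), so the points look nearly equidistant. A distance-based filtration such as Vietoris–Rips depends only on these pairwise distances, so the circle's loop becomes invisible to it. The abstract does not name a filtration; Vietoris–Rips is an assumption here. The helper names `noisy_circle`, `pairwise_distances`, and `spread_ratio` are hypothetical, and only NumPy is required.

```python
import numpy as np


def noisy_circle(n, d, sigma, rng):
    """Hypothetical data model: n points evenly spaced on a unit circle in the
    first two coordinates of R^d, plus i.i.d. N(0, sigma^2) noise in every coordinate."""
    theta = 2.0 * np.pi * np.arange(n) / n
    signal = np.zeros((n, d))
    signal[:, 0] = np.cos(theta)
    signal[:, 1] = np.sin(theta)
    return signal + sigma * rng.standard_normal((n, d))


def pairwise_distances(x):
    """Euclidean distance matrix of the rows of x (shape n x d)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def spread_ratio(x):
    """Ratio of the largest to the smallest off-diagonal pairwise distance.
    A ratio near 1 means all points look roughly equidistant."""
    dist = pairwise_distances(x)
    off_diag = dist[~np.eye(len(x), dtype=bool)]
    return off_diag.max() / off_diag.min()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, sigma = 20, 0.1
    for d in (2, 100, 10_000):
        x = noisy_circle(n, d, sigma, rng)
        off = pairwise_distances(x)[~np.eye(n, dtype=bool)]
        # Compare the observed median distance with the noise-only scale sigma*sqrt(2d).
        print(f"d={d:>6}  max/min distance={spread_ratio(x):.3f}  "
              f"median={np.median(off):.2f}  sigma*sqrt(2d)={sigma * np.sqrt(2 * d):.2f}")
```

Running the script prints the max/min distance ratio for d = 2, 100, and 10,000. The ratio should fall toward 1 as d grows, while the median distance tracks sigma·sqrt(2d). The sketch does not implement the normalized PCA remedy, because the section does not specify its exact normalization.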