The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e., singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the 'manifoldness' of a point across multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.
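To make the notion of a local intrinsic dimension that changes at a singularity concrete, the following is a minimal sketch on toy data. It is not the topological framework described above and does not compute the Euclidicity score: it swaps in a simple local PCA heuristic, and the function names (`local_pca_dimension`, `dimension_profile`), the variance threshold, the radii, and the two-crossing-lines data set are all assumptions chosen for illustration.

```python
import numpy as np


def local_pca_dimension(points, center, radius, variance_threshold=0.95):
    """Heuristic local dimension at `center` (hypothetical helper, local PCA).

    Counts how many principal directions are needed to explain
    `variance_threshold` of the variance of the neighbours within `radius`.
    """
    dists = np.linalg.norm(points - center, axis=1)
    neighbours = points[dists <= radius]
    if len(neighbours) < 3:
        return 0  # too few neighbours for a meaningful estimate
    centered = neighbours - neighbours.mean(axis=0)
    # Squared singular values are proportional to the variance
    # along each principal direction (sorted in descending order).
    variances = np.linalg.svd(centered, compute_uv=False) ** 2
    total = variances.sum()
    if total == 0:
        return 0
    explained = np.cumsum(variances) / total
    # First index whose cumulative share reaches the threshold, plus one.
    return int(np.searchsorted(explained, variance_threshold) + 1)


def dimension_profile(points, center, radii):
    """Evaluate the heuristic at several scales (radii) around `center`."""
    return [local_pca_dimension(points, center, r) for r in radii]


# Toy space (assumed for illustration): two line segments in the plane
# that cross at the origin. Every point is locally 1-dimensional except
# the crossing point, which is a singularity.
t = np.linspace(-1.0, 1.0, 201)
zeros = np.zeros_like(t)
points = np.vstack([
    np.column_stack([t, zeros]),  # segment along the x-axis
    np.column_stack([zeros, t]),  # segment along the y-axis
])

radii = [0.1, 0.2, 0.4]
regular_profile = dimension_profile(points, np.array([0.5, 0.0]), radii)
singular_profile = dimension_profile(points, np.array([0.0, 0.0]), radii)

print("regular point :", regular_profile)
print("crossing point:", singular_profile)
```

On this toy data the heuristic reports dimension 1 at every scale for the point on a single segment and dimension 2 at the crossing, which is the kind of local discrepancy a singularity detector looks for. A variance-based estimate like this cannot by itself distinguish a genuine 2-dimensional patch from two crossing curves, which is one motivation for topological approaches.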