The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e., singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension and (ii) yields a Euclidicity score for assessing the 'manifoldness' of a point across multiple scales. Our approach identifies singularities of complex spaces and also captures singular structures and local geometric complexity in image data.
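To make the notion of local intrinsic dimension concrete, the following is a minimal sketch that estimates it with local PCA over the k nearest neighbours of a point. This is a stand-in technique chosen for brevity, not the topological framework described above, and it does not compute a Euclidicity score. The function name `local_pca_dimension` and the parameters `k` and `variance_threshold` are hypothetical choices made for illustration.

```python
import numpy as np


def local_pca_dimension(points, index, k=20, variance_threshold=0.95):
    """Estimate the intrinsic dimension near points[index] with local PCA.

    Illustrative stand-in only: counts how many principal directions are
    needed to explain `variance_threshold` of the neighbourhood variance.
    """
    # Euclidean distance from the query point to every point in the sample.
    distances = np.linalg.norm(points - points[index], axis=1)
    # Take the k nearest neighbours, skipping the query point itself.
    neighbours = points[np.argsort(distances)[1:k + 1]]
    centred = neighbours - neighbours.mean(axis=0)
    # Squared singular values are proportional to the variance along
    # each principal direction of the neighbourhood.
    variances = np.linalg.svd(centred, compute_uv=False) ** 2
    total = variances.sum()
    if total == 0:
        return 0  # all neighbours coincide
    explained = np.cumsum(variances) / total
    # First position where the cumulative share reaches the threshold.
    return int(np.searchsorted(explained, variance_threshold) + 1)


rng = np.random.default_rng(0)
# A flat 2-D sheet embedded in 3-D space.
plane = np.column_stack([rng.uniform(-1, 1, (500, 2)), np.zeros(500)])
# A straight line embedded in 3-D space.
line = np.outer(rng.uniform(-1, 1, 500), [1.0, 2.0, -1.0])

plane_dim = local_pca_dimension(plane, 0)
line_dim = local_pca_dimension(line, 0)
```

On a manifold, such an estimate stays constant from point to point. Near a singularity, for example where two curves cross, a neighbourhood-based estimate can differ from the value at nearby regular points, which is the kind of local irregularity a singularity detector targets. A single fixed `k` probes only one scale, whereas the framework above assesses a point across multiple scales.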