Many machine learning techniques attempt, explicitly or implicitly, to infer the low-dimensional manifold structure of an underlying physical phenomenon from measurements, without an explicit model of either the phenomenon or the measurement apparatus. This paper presents a cautionary tale about the discrepancy between the geometry of the measurements and the geometry of the underlying phenomenon in a benign setting. The metric deformation illustrated in this paper is mathematically straightforward and unavoidable in the general case, and it is only one of several similar effects. While this deformation is not always problematic, we provide an example of an arguably standard and harmless data processing procedure in which it leads to an incorrect answer to a seemingly simple question. Although we focus on manifold learning, these issues apply broadly to dimensionality reduction and unsupervised learning.