Many techniques in machine learning attempt, explicitly or implicitly, to infer the low-dimensional manifold structure of an underlying physical phenomenon from measurements, without an explicit model of the phenomenon or the measurement apparatus. This paper presents a cautionary tale about the discrepancy between the geometry of the measurements and the geometry of the underlying phenomenon in a benign setting. The metric deformation illustrated in this paper is mathematically straightforward and unavoidable in the general case, and it is only one of several similar effects. While this deformation is not always problematic, we provide an example of an arguably standard and harmless data processing procedure in which it leads to an incorrect answer to a seemingly simple question. Although we focus on manifold learning, these issues apply broadly to dimensionality reduction and unsupervised learning.