Density-based distances (DBDs) offer an elegant solution to the problem of metric learning. By defining a Riemannian metric which increases with decreasing probability density, shortest paths naturally follow the data manifold and points are clustered according to the modes of the data. We show that existing methods to estimate Fermat distances, a particular choice of DBD, suffer from poor convergence in both low and high dimensions due to i) inaccurate density estimates and ii) reliance on graph-based paths which are increasingly rough in high dimensions. To address these issues, we propose learning the densities using a normalizing flow, a generative model with tractable density estimation, and employing a smooth relaxation method using a score model initialized from a graph-based proposal. Additionally, we introduce a dimension-adapted Fermat distance that exhibits more intuitive behavior when scaled to high dimensions and offers better numerical properties. Our work paves the way for practical use of density-based distances, especially in high-dimensional spaces.
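To make the graph-based baseline concrete: a common form of the sample Fermat distance takes the shortest path through the data points, with each hop charged its Euclidean length raised to a power greater than one, so that many short hops through dense regions cost less than one long jump across a sparse region. The following is a minimal sketch under that assumption, not the authors' implementation; the function name `fermat_graph_distance` and the parameter `alpha` are hypothetical, and the complete graph is used only for simplicity.

```python
import heapq
import math


def fermat_graph_distance(points, source, target, alpha=2.0):
    """Shortest-path cost between two sample points on the complete graph,
    where the edge (i, j) costs ||x_i - x_j|| ** alpha.

    Assumption: this is the usual graph-based estimator, written with
    Dijkstra's algorithm; `alpha` > 1 penalizes long jumps.
    """
    n = len(points)
    best = [math.inf] * n
    best[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        cost, i = heapq.heappop(heap)
        if i == target:
            return cost
        if cost > best[i]:
            continue  # stale heap entry, a cheaper route was already found
        for j in range(n):
            if j == i:
                continue
            new_cost = cost + math.dist(points[i], points[j]) ** alpha
            if new_cost < best[j]:
                best[j] = new_cost
                heapq.heappush(heap, (new_cost, j))
    return best[target]


# Four evenly spaced points on a line: with alpha = 2 the direct jump
# costs 3 ** 2 = 9, while three unit hops cost 1 + 1 + 1 = 3.
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(fermat_graph_distance(line, 0, 3, alpha=2.0))
```

Because the path is restricted to hops between sample points, it is piecewise linear; this is the roughness that the abstract identifies as a source of poor convergence in high dimensions.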