We present a new approach to approximate nearest-neighbor queries in fixed dimension under a variety of non-Euclidean distances. We are given a set $S$ of $n$ points in $\mathbb{R}^d$, an approximation parameter $\varepsilon > 0$, and a distance function that satisfies certain smoothness and growth-rate assumptions. The objective is to preprocess $S$ into a data structure so that, for any query point $q$ in $\mathbb{R}^d$, it is possible to efficiently report a point of $S$ whose distance from $q$ is within a factor of $1+\varepsilon$ of the distance to the true nearest neighbor of $q$. Prior to this work, the most efficient data structures for approximate nearest-neighbor searching in spaces of constant dimensionality applied only to the Euclidean metric. This paper overcomes this limitation through a method called convexification. For admissible distance functions, the proposed data structures answer queries in logarithmic time using $O(n \log (1 / \varepsilon) / \varepsilon^{d/2})$ space, nearly matching the best known bounds for the Euclidean metric. These results apply to both convex scaling distance functions (including the Mahalanobis distance and weighted Minkowski metrics) and Bregman divergences (including the Kullback-Leibler divergence and the Itakura-Saito distance).
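To make the query guarantee concrete, the following is a minimal Python sketch of the $(1+\varepsilon)$-approximation condition under three of the distances named above. It is a brute-force reference check, not the convexification-based data structure of the paper. The function names, the argument order (site first, query second), and the use of the generalized Kullback-Leibler divergence on positive vectors are assumptions made for illustration only.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def mahalanobis(p: Point, q: Point, m: Sequence[Sequence[float]]) -> float:
    """sqrt((p-q)^T M (p-q)); M is assumed symmetric positive definite."""
    diff = [a - b for a, b in zip(p, q)]
    d = len(diff)
    quad = sum(diff[i] * m[i][j] * diff[j] for i in range(d) for j in range(d))
    return math.sqrt(quad)


def kl_divergence(p: Point, q: Point) -> float:
    """Generalized KL divergence; all coordinates are assumed positive."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(p, q))


def itakura_saito(p: Point, q: Point) -> float:
    """Itakura-Saito distance; all coordinates are assumed positive."""
    return sum(a / b - math.log(a / b) - 1 for a, b in zip(p, q))


def exact_nearest(points: Sequence[Point], q: Point, dist: Distance) -> Point:
    """Linear scan over all points: O(n) distance evaluations per query."""
    return min(points, key=lambda p: dist(p, q))


def is_eps_approximate(candidate: Point, points: Sequence[Point], q: Point,
                       dist: Distance, eps: float) -> bool:
    """True if candidate is within a factor 1 + eps of the true nearest distance."""
    best = dist(exact_nearest(points, q, dist), q)
    return dist(candidate, q) <= (1 + eps) * best
```

A data structure for this problem may return any point for which `is_eps_approximate` holds. The linear scan in `exact_nearest` serves only as ground truth when testing such a structure on small inputs. Because Bregman divergences are asymmetric, the argument order chosen here affects the result and should be matched to the convention of the application.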