Approximate K-nearest-neighbor (AKNN) search is a fundamental and challenging problem. We observe that in high-dimensional space, the running time of nearly all AKNN algorithms is dominated by distance comparison operations (DCOs). Each DCO scans all dimensions of an object and therefore runs in time linear in the dimensionality. To speed this up, we propose a randomized algorithm named ADSampling, which runs in time logarithmic in the dimensionality for the majority of DCOs and succeeds with high probability. In addition, based on ADSampling, we develop one general and two algorithm-specific techniques as plugins to enhance existing AKNN algorithms. Both theoretical and empirical studies confirm that (1) our techniques introduce nearly no accuracy loss and (2) they consistently improve efficiency.
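To make the contrast between a full-scan DCO and a sampling-based one concrete, the following is a minimal sketch and not the authors' implementation. It takes a DCO to mean the test "is the distance between the query and an object at most a threshold `r`?". The function names, the `step` and `eps0` parameters, and the exact form of the rejection rule are assumptions made for illustration. The sketch also assumes that both vectors have already been randomly rotated, so that a prefix of the coordinates behaves like a random sample of dimensions.

```python
import math


def dco_full_scan(query, obj, r):
    """Baseline DCO: exact test dist(query, obj) <= r.

    Touches all D dimensions, so the cost is linear in D.
    """
    sq = sum((a - b) ** 2 for a, b in zip(query, obj))
    return math.sqrt(sq) <= r


def dco_sampled(query, obj, r, step=32, eps0=2.1):
    """Hypothetical sampling-based DCO (illustrative only).

    Scans dimensions in chunks of `step`. After each chunk it scales the
    partial distance up to an estimate of the full distance and rejects
    the object early if the estimate clearly exceeds r. `step`, `eps0`
    and the rejection rule are assumptions, not values from the abstract.
    Returns (is_within_r, dims_scanned).
    """
    D = len(query)
    sq = 0.0  # running sum of squared differences over scanned dims
    d = 0     # number of dimensions scanned so far
    while d < D:
        end = min(d + step, D)
        for i in range(d, end):
            diff = query[i] - obj[i]
            sq += diff * diff
        d = end
        if d == D:
            break  # every dimension has been seen: fall through to exact test
        # Estimate the full distance from the first d dimensions.
        est = math.sqrt(D / d) * math.sqrt(sq)
        # Reject early only when the estimate is well above the threshold.
        if est > (1.0 + eps0 / math.sqrt(d)) * r:
            return False, d
    return math.sqrt(sq) <= r, D


if __name__ == "__main__":
    D = 128
    q = [0.0] * D
    far = [10.0] * D  # toy object that is obviously far from q
    print(dco_full_scan(q, far, 1.0))  # exact answer, scans all 128 dims
    print(dco_sampled(q, far, 1.0))    # early rejection after one chunk
    print(dco_sampled(q, q, 1.0))      # identical vector: scans all dims
```

In this sketch an object that is far from the query is rejected after a small number of dimensions, while an object near the threshold is scanned in full and decided exactly. The toy vectors above are not randomly rotated and only exercise the control flow. The logarithmic-time and high-probability claims belong to the analysis of ADSampling itself and are not established by this code.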