Approximate nearest neighbor search under universal $L_p$ metrics (ANNS-U-$L_p$) is an important and challenging research problem: it requires answering queries for all possible values of $p$ ($0 < p \le 2$) without building a separate index for each value. The state-of-the-art solution, MLSH, is a Locality-Sensitive Hashing (LSH)-based ANNS method with barely acceptable query performance. In contrast, graph-based ANNS methods, which offer significantly better query efficiency on the ANNS-$L_p$ problem (with a fixed value of $p$), cannot be naively extended to the ANNS-U-$L_p$ problem. In this paper, we propose U-HNSW, the first graph-based method for ANNS-U-$L_p$. Our scheme uses HNSW graph indexes built on two base metrics ($L_1$ and $L_2$) to generate promising nearest-neighbor candidates, and then verifies these candidates with an early-termination strategy that substantially reduces the number of expensive $L_p$ distance computations. Experimental results show that U-HNSW not only achieves query times up to 2670 times shorter than the original MLSH implementation running on a RAM disk (and up to 15 times shorter than the idealized MLSH), but also outperforms the original HNSW on the ANNS-$L_p$ problem (with a fixed value of $p$), except for a few special values of $p$.
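The generate-then-verify idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the method described in the paper: the function names (`lp_cost`, `base_candidates`, `query`) and the parameter `m` are hypothetical, the two HNSW indexes are replaced by a brute-force scan under $L_1$ and $L_2$, and the early-termination rule shown is plain partial-sum abandoning, since the abstract does not specify the actual strategy.

```python
import heapq
import random


def lp_cost(x, q, p, bound=float("inf")):
    """Return sum_i |x_i - q_i|^p, or None once the partial sum exceeds `bound`.

    The sum is a monotone function of the L_p distance, so it ranks points
    identically while avoiding the final 1/p power.
    """
    total = 0.0
    for a, b in zip(x, q):
        total += abs(a - b) ** p
        if total > bound:  # cannot enter the current top-k, stop early
            return None
    return total


def base_candidates(data, q, p_base, m):
    """Brute-force stand-in (assumption) for an HNSW index on a base metric:
    indices of the m points closest to q under L_{p_base}."""
    return heapq.nsmallest(
        m, range(len(data)), key=lambda i: lp_cost(data[i], q, p_base)
    )


def query(data, q, p, k, m):
    """Top-k under L_p, verified only on the union of L_1 and L_2 candidates."""
    candidates = set(base_candidates(data, q, 1.0, m))
    candidates |= set(base_candidates(data, q, 2.0, m))

    heap = []  # max-heap on cost, stored as (-cost, index)
    for i in sorted(candidates):
        # Current k-th best cost serves as the abandoning threshold.
        bound = -heap[0][0] if len(heap) == k else float("inf")
        cost = lp_cost(data[i], q, p, bound)
        if cost is None:
            continue  # abandoned: this candidate is worse than the k-th best
        if len(heap) < k:
            heapq.heappush(heap, (-cost, i))
        else:
            heapq.heapreplace(heap, (-cost, i))

    # Report indices ordered from nearest to farthest.
    return [i for _, i in sorted((-neg, i) for neg, i in heap)]


if __name__ == "__main__":
    random.seed(0)
    data = [[random.random() for _ in range(16)] for _ in range(1000)]
    q = [random.random() for _ in range(16)]
    for p in (0.5, 1.3, 2.0):
        print(p, query(data, q, p, k=5, m=50))
```

In this sketch, `m` controls the trade-off between recall and verification cost: with `m` equal to the dataset size the result is exact, while smaller values verify fewer candidates and may miss true neighbors.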