Approximate nearest neighbor search under universal $L_p$ metrics (ANNS-U-$L_p$) is an important and challenging research problem: queries must be answered under every possible value of $p$ ($0 < p \le 2$) without building a separate index for each value. The state-of-the-art solution, MLSH, is a Locality-Sensitive Hashing (LSH)-based ANNS method whose query performance is barely acceptable. In contrast, graph-based ANNS methods, which offer significantly better query efficiency on the ANNS-$L_p$ problem (with a fixed $p$), cannot be naively extended to the ANNS-U-$L_p$ problem. In this paper, we propose U-HNSW, the first graph-based method for ANNS-U-$L_p$. Our scheme uses HNSW graph indexes built on two base metrics ($L_1$ and $L_2$) to generate promising nearest-neighbor candidates, and then verifies these candidates with an early-termination strategy that substantially reduces the number of expensive $L_p$ distance computations. Experimental results show that U-HNSW achieves query times up to 2670 times shorter than those of the original MLSH implementation running on a RAM disk (and up to 15 times shorter than those of the idealized MLSH), and that it also outperforms the original HNSW on the ANNS-$L_p$ problem (with a fixed $p$), except for a few special values of $p$.
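The candidate-then-verify idea in the abstract can be illustrated with a small sketch. The abstract does not specify how the early-termination strategy works, so the version below is an assumption: it uses plain partial-sum early abandoning, which stops accumulating $\sum_i |x_i - q_i|^p$ once the running sum exceeds the current $k$-th best value. The two HNSW indexes are replaced by brute-force $L_1$ and $L_2$ scans purely to keep the example self-contained, and all function names (`lp_power_sum_bounded`, `base_metric_candidates`, `verify_candidates`) are hypothetical rather than taken from U-HNSW.

```python
import heapq
import random


def lp_power_sum_bounded(x, q, p, bound):
    """Accumulate sum_i |x_i - q_i|^p; return None as soon as it exceeds `bound`."""
    total = 0.0
    for xi, qi in zip(x, q):
        total += abs(xi - qi) ** p
        if total > bound:
            return None  # early abandon: this point cannot enter the top-k
    return total


def base_metric_candidates(data, q, n_cand):
    """Brute-force stand-in (assumption) for querying HNSW indexes built on L_1 and L_2."""
    def closest(p):
        def key(i):
            return sum(abs(a - b) ** p for a, b in zip(data[i], q))
        return heapq.nsmallest(n_cand, range(len(data)), key=key)
    # Union of both candidate lists, de-duplicated while preserving order.
    return list(dict.fromkeys(closest(1) + closest(2)))


def verify_candidates(data, q, candidate_ids, p, k):
    """Return the k candidates nearest to q under L_p as sorted (distance, id) pairs."""
    heap = []  # max-heap of p-th-power sums, stored negated
    for idx in candidate_ids:
        # The k-th best power sum so far is the abandonment threshold.
        bound = -heap[0][0] if len(heap) == k else float("inf")
        s = lp_power_sum_bounded(data[idx], q, p, bound)
        if s is None:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (-s, idx))
        else:
            heapq.heapreplace(heap, (-s, idx))
    # t -> t^(1/p) is monotone for p > 0, so the root is taken only for the survivors.
    return sorted(((-neg) ** (1.0 / p), idx) for neg, idx in heap)


# Toy usage on synthetic data (illustrative only).
rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(300)]
query = [rng.gauss(0, 1) for _ in range(16)]
candidates = base_metric_candidates(data, query, n_cand=40)
top5 = verify_candidates(data, query, candidates, p=1.5, k=5)
```

Comparing $p$-th-power sums rather than true distances avoids computing a root per candidate, and the same candidate list can be re-verified for any $p$ in $(0, 2]$ without rebuilding anything.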