In this paper, we study the angle testing problem in the context of similarity search in high-dimensional Euclidean spaces and propose two projection-based probabilistic kernel functions, one designed for angle comparison and the other for angle thresholding. Unlike existing approaches that rely on random projection vectors drawn from Gaussian distributions, our approach leverages reference angles and adopts a deterministic structure for the projection vectors. Notably, our kernel functions do not require asymptotic assumptions, such as the number of projection vectors tending to infinity, and can be shown both theoretically and experimentally to outperform Gaussian-distribution-based kernel functions. We apply the proposed kernel function to Approximate Nearest Neighbor Search (ANNS) and demonstrate that our approach achieves 2.5x to 3x higher queries-per-second (QPS) throughput than the widely used graph-based search algorithm HNSW.
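For readers unfamiliar with Gaussian random projections for angle testing, the following is a minimal sketch of one well-known instance, the sign-based (SimHash-style) angle estimator. It is not the kernel functions proposed in this paper, and it may differ from the specific Gaussian-based kernels the paper compares against. It relies on the standard fact that, for a Gaussian projection vector, the projections of two vectors have different signs with probability equal to their angle divided by pi. The function names, vector dimension, and number of projections are illustrative assumptions.

```python
import math
import random


def true_angle(u, v):
    """Exact angle between u and v in radians, for reference."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)


def estimate_angle(u, v, num_projections=4000, seed=0):
    """Estimate the angle between u and v from Gaussian random projections.

    For a Gaussian vector a, P[sign(a.u) != sign(a.v)] = angle / pi,
    so the fraction of sign disagreements times pi estimates the angle.
    (Hypothetical helper for illustration; not the paper's method.)
    """
    rng = random.Random(seed)
    dim = len(u)
    disagreements = 0
    for _ in range(num_projections):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        proj_u = sum(x * y for x, y in zip(a, u))
        proj_v = sum(x * y for x, y in zip(a, v))
        if (proj_u >= 0) != (proj_v >= 0):
            disagreements += 1
    return math.pi * disagreements / num_projections


def closer_in_angle(query, cand_a, cand_b, num_projections=4000, seed=0):
    """Angle comparison: return 'a' if cand_a appears closer to query, else 'b'."""
    est_a = estimate_angle(query, cand_a, num_projections, seed)
    est_b = estimate_angle(query, cand_b, num_projections, seed)
    return "a" if est_a <= est_b else "b"


if __name__ == "__main__":
    dim = 16
    query = [1.0] + [0.0] * (dim - 1)
    # Candidate at 30 degrees and candidate at 90 degrees from the query.
    near = [math.cos(math.pi / 6), math.sin(math.pi / 6)] + [0.0] * (dim - 2)
    far = [0.0, 1.0] + [0.0] * (dim - 2)
    print("true angle (near):", true_angle(query, near))
    print("estimated angle (near):", estimate_angle(query, near))
    print("closer candidate:", closer_in_angle(query, near, far))
```

The accuracy of this estimator depends on the number of random projections, and the estimate tightens only as that number grows. This dependence illustrates the kind of asymptotic assumption that the abstract says the proposed kernel functions avoid.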