Fixed-radius nearest neighbor search is a common database operation that retrieves all data points within a user-specified distance of a query point. Efficient approximate nearest neighbor search algorithms exist that provide fast query responses, but they often have a very compute-intensive indexing phase and require parameter tuning. Therefore, exact brute-force and tree-based search methods are still widely used. Here we propose a new fixed-radius nearest neighbor search method that significantly improves on brute-force and tree-based methods in terms of indexing and query time, reliably returns exact results, and requires no parameter tuning. The method sorts the data points by their first principal component, which allows the query search space to be reduced. Further speedup is gained from an efficient implementation using high-level Basic Linear Algebra Subprograms (BLAS). We provide a theoretical analysis of our method and demonstrate its practical performance both when used stand-alone and when applied within the DBSCAN clustering algorithm.
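The following is a minimal sketch of the search-space reduction described above. It assumes NumPy, Euclidean distance, and an SVD of the centered data to obtain the first principal direction $v_1$. The class name `SortedPCIndex` and its `query` method are hypothetical illustrations, not the authors' implementation. The filtering rests on one bound: because $\|v_1\| = 1$, Cauchy–Schwarz gives $|v_1^\top(x_i - q)| \le \|x_i - q\|$. Any point whose projection $\alpha_i$ differs from the query's projection by more than the radius $R$ can therefore be skipped. The remaining candidates are checked with the identity $\|x - q\|^2 = \|x\|^2 - 2x^\top q + \|q\|^2$, whose matrix-vector product NumPy dispatches to BLAS.

```python
import numpy as np


class SortedPCIndex:
    """Hypothetical sketch: exact fixed-radius search over points sorted by
    their projection onto the first principal component."""

    def __init__(self, data):
        data = np.asarray(data, dtype=float)
        self.mean = data.mean(axis=0)
        centered = data - self.mean
        # The first right singular vector of the centered data is the
        # first principal direction (unit norm).
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        self.v1 = vt[0]
        alpha = centered @ self.v1
        # Sort once at index time; keep the permutation to map back.
        self.order = np.argsort(alpha)
        self.alpha = alpha[self.order]
        self.points = centered[self.order]
        # Precomputed squared norms for the distance identity.
        self.sq_norms = np.einsum("ij,ij->i", self.points, self.points)

    def query(self, q, radius):
        """Return original indices (ascending) of points within `radius` of q."""
        q = np.asarray(q, dtype=float) - self.mean
        a_q = q @ self.v1
        # |alpha_i - a_q| <= ||x_i - q||, so only the window
        # [a_q - radius, a_q + radius] can contain neighbors.
        lo = np.searchsorted(self.alpha, a_q - radius, side="left")
        hi = np.searchsorted(self.alpha, a_q + radius, side="right")
        cand = self.points[lo:hi]
        # ||x - q||^2 = ||x||^2 - 2 x.q + ||q||^2 (matrix-vector product via BLAS).
        d2 = self.sq_norms[lo:hi] - 2.0 * (cand @ q) + q @ q
        hits = np.nonzero(d2 <= radius ** 2)[0]
        return np.sort(self.order[lo + hits])


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
index = SortedPCIndex(X)
print(index.query(X[0], 0.5))
```

The sorting is a one-off $O(n \log n)$ cost, and each query needs only two binary searches plus one dense product over the candidate window. In a DBSCAN-style setting, the index would be built once and `query` called with the neighborhood radius for every point. This sketch does not guard against floating-point rounding for points lying almost exactly on the radius boundary.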