Finding the nearest neighbor to a hyperplane (Point-to-Hyperplane Nearest Neighbor Search, or P2HNNS for short) is a new and challenging problem with applications in many research domains. Existing state-of-the-art hashing schemes (e.g., NH and FH) achieve sublinear time complexity without assuming that the data lie in a unit hypersphere, but they require an asymmetric transformation that increases the data dimension from $d$ to $\Omega(d^2)$. This leads to considerable indexing overhead and incurs significant distortion errors. In this paper, we investigate a tree-based approach for solving P2HNNS using the classical Ball-Tree index. Compared to hashing-based methods, tree-based methods usually require roughly linear construction cost, and they flexibly support different kinds of approximation. We first develop a simple branch-and-bound algorithm with a novel lower bound for performing P2HNNS on Ball-Tree. We then describe a new tree structure named BC-Tree, which maintains both Ball and Cone structures in the leaf nodes of Ball-Tree, together with two effective strategies: point-level pruning and collaborative inner product computing. BC-Tree inherits the low construction cost and lightweight property of Ball-Tree while providing a similar or more efficient search. Experimental results over 16 real-world data sets show that Ball-Tree and BC-Tree are around 1.1$\sim$10$\times$ faster than NH and FH, and that they reduce the index size and indexing time by about 1$\sim$3 orders of magnitude on average. The code is available at \url{https://github.com/HuangQiang/BC-Tree}.
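To make the branch-and-bound idea on Ball-Tree concrete, the following is a minimal sketch, not the authors' implementation. It assumes the query hyperplane is given by a normal vector `w` and an offset `b`, and it prunes with the elementary ball bound: every point in a ball with center $c$ and radius $r$ is at distance at least $\max(0, |\langle w, c\rangle + b| / \|w\| - r)$ from the hyperplane. This bound may differ from the novel lower bound the paper proposes. The names `Node`, `search`, and `p2h_nearest` are hypothetical, and the Cone structure, point-level pruning, and collaborative inner product computing of BC-Tree are omitted.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hyperplane_dist(x, w, b, w_norm):
    # Euclidean distance from point x to the hyperplane <w, x> + b = 0.
    return abs(dot(w, x) + b) / w_norm


class Node:
    """Ball-tree node: a bounding ball plus either two children or a leaf bucket."""

    def __init__(self, points, leaf_size):
        dim = len(points[0])
        self.center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
        self.radius = max(math.dist(self.center, p) for p in points)
        self.points, self.children = points, []
        if len(points) > leaf_size and self.radius > 0:
            # Split around two far-apart pivots (a common heuristic, assumed here).
            a = max(points, key=lambda p: math.dist(points[0], p))
            far = max(points, key=lambda p: math.dist(a, p))
            left = [p for p in points if math.dist(p, a) <= math.dist(p, far)]
            right = [p for p in points if math.dist(p, a) > math.dist(p, far)]
            if left and right:
                self.children = [Node(left, leaf_size), Node(right, leaf_size)]
                self.points = []


def search(node, w, b, w_norm, best):
    # Ball bound: no point inside the ball can be closer than this.
    lower = max(0.0, hyperplane_dist(node.center, w, b, w_norm) - node.radius)
    if lower >= best[0]:
        return  # prune the whole subtree
    if not node.children:
        for p in node.points:
            d = hyperplane_dist(p, w, b, w_norm)
            if d < best[0]:
                best[0], best[1] = d, p
        return
    # Visit the child whose center is nearer to the hyperplane first.
    order = sorted(node.children,
                   key=lambda c: hyperplane_dist(c.center, w, b, w_norm))
    for child in order:
        search(child, w, b, w_norm, best)


def p2h_nearest(root, w, b):
    w_norm = math.sqrt(dot(w, w))
    best = [math.inf, None]  # [best distance so far, best point so far]
    search(root, w, b, w_norm, best)
    return best[1], best[0]


# Usage on synthetic data (illustrative values only).
random.seed(0)
data = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(500)]
root = Node(data, leaf_size=16)
w, b = [1.0, -2.0, 0.5, 0.0], 0.3
point, dist = p2h_nearest(root, w, b)
```

Because the bound never overestimates the distance of any point inside a ball, the search in this sketch is exact; an approximate variant could, for example, stop after visiting a fixed number of leaves.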