Given a reference set $R$ of $n$ points and a query set $Q$ of $m$ points in a metric space, this paper studies the problem of finding the $k$ nearest neighbors in $R$ of every point $q \in Q$ in near-linear time. In their ICML 2006 paper, Beygelzimer, Kakade, and Langford introduced a cover tree on $R$ and attempted to prove that this tree can be built in $O(n\log n)$ time, while the nearest neighbor search can be done in $O(n\log m)$ time with a hidden dimensionality factor. This paper fills a substantial gap in the past proofs of time complexity by defining a simpler compressed cover tree on the reference set $R$. The first new algorithm constructs a compressed cover tree in $O(n \log n)$ time. The second new algorithm uses a compressed cover tree to find all $k$ nearest neighbors of all points from $Q$ in $O(m(k+\log n)\log k)$ time, with a hidden dimensionality factor that depends on the point distributions of the given sets $R, Q$ but not on their sizes.
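For orientation, the problem statement can be written as a brute-force baseline. This is a minimal sketch and not the compressed cover tree algorithm of this paper: the function name `brute_force_knn`, the tuple representation of points, and the default Euclidean metric are all assumptions made for illustration. It scans the whole reference set for every query, so it needs roughly $O(mn\log k)$ time, which is the cost that a tree-based search aims to avoid.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance; any metric could be substituted (assumption)."""
    return math.dist(p, q)


def brute_force_knn(
    R: Sequence[Point],
    Q: Sequence[Point],
    k: int,
    metric: Callable[[Point, Point], float] = euclidean,
) -> List[List[Point]]:
    """For each q in Q, return its k nearest neighbors in R, closest first.

    Hypothetical baseline: every query scans all of R, so the total cost
    grows with len(R) * len(Q) rather than near-linearly.
    """
    if not 1 <= k <= len(R):
        raise ValueError("k must satisfy 1 <= k <= len(R)")
    results = []
    for q in Q:
        # heapq.nsmallest keeps only the k best candidates while scanning R.
        nearest = heapq.nsmallest(k, R, key=lambda r: metric(q, r))
        results.append(nearest)
    return results


if __name__ == "__main__":
    R = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (2.0, 2.0)]
    Q = [(0.9, 0.1), (4.0, 4.0)]
    print(brute_force_knn(R, Q, k=2))
```

A baseline like this is mainly useful as a correctness check: the output of any faster $k$-nearest neighbor search on the same $R$, $Q$, and $k$ should match it up to ties in distance.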