Given a reference set $R$ of $n$ points and a query set $Q$ of $m$ points in a metric space, this paper studies the important problem of finding, in near-linear time, the $k$ nearest neighbors in $R$ of every point $q \in Q$. In their ICML 2006 paper, Beygelzimer, Kakade, and Langford introduced a cover tree on $R$ and attempted to prove that this tree can be built in $O(n\log n)$ time and that the nearest neighbor search can be done in $O(n\log m)$ time, both with a hidden dimensionality factor. This paper fills a substantial gap in these earlier proofs of time complexity by defining a simpler compressed cover tree on the reference set $R$. The first new algorithm constructs a compressed cover tree in $O(n \log n)$ time. The second new algorithm uses a compressed cover tree to find the $k$ nearest neighbors of all points in $Q$ in time $O(m(k+\log n)\log k)$. Its hidden dimensionality factor depends on the point distributions of the given sets $R$ and $Q$ but not on their sizes.
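To make the problem statement concrete, here is a minimal sketch of the naive baseline that the near-linear bounds above are meant to beat. It is *not* the compressed cover tree algorithm from this paper. It simply compares every query $q \in Q$ with every reference point in $R$. The sketch rests on two assumptions: points are Python tuples of floats, and the default metric is Euclidean distance via `math.dist`. Any other metric can be passed as `dist`. The names `all_knn_brute_force` and `euclidean` are hypothetical. A bounded max-heap keeps the $k$ best candidates seen so far, so the total cost is $O(mn \log k)$ distance comparisons.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Default metric for this sketch (assumption): Euclidean distance."""
    return math.dist(a, b)


def all_knn_brute_force(
    R: Sequence[Point],
    Q: Sequence[Point],
    k: int,
    dist: Callable[[Point, Point], float] = euclidean,
) -> List[List[Tuple[float, int]]]:
    """Hypothetical baseline: for every q in Q, return up to k pairs
    (distance, index into R) of its nearest points in R, sorted by distance.

    Runs in O(m * n * log k) time; it is NOT the compressed cover tree method.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    result: List[List[Tuple[float, int]]] = []
    for q in Q:
        # Max-heap of at most k candidates, simulated with negated keys:
        # heap[0] is the current farthest of the kept candidates.
        heap: List[Tuple[float, int]] = []
        for i, p in enumerate(R):
            d = dist(q, p)
            if len(heap) < k:
                heapq.heappush(heap, (-d, -i))
            elif d < -heap[0][0]:
                # Strictly closer than the worst kept candidate: swap it out.
                heapq.heapreplace(heap, (-d, -i))
        # Undo the negation and sort nearest first.
        result.append(sorted((-nd, -ni) for nd, ni in heap))
    return result
```

For example, take `R = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]`, `Q = [(0.1, 0.0), (4.0, 4.0)]` and `k = 2`. The call returns the indices `[0, 1]` for the first query and `[3, 2]` for the second. Each query costs a full pass over $R$, which is why a hierarchical index such as a cover tree is needed to reach the bounds stated above.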