For any two point sets $A,B \subset \mathbb{R}^d$ of size up to $n$, the Chamfer distance from $A$ to $B$ is defined as $\text{CH}(A,B)=\sum_{a \in A} \min_{b \in B} d_X(a,b)$, where $d_X$ is the underlying distance measure (e.g., the Euclidean or Manhattan distance). The Chamfer distance is a popular measure of dissimilarity between point clouds, used in many machine learning, computer vision, and graphics applications, and admits a straightforward $O(d n^2)$-time brute-force algorithm. Further, the Chamfer distance is often used as a proxy for the more computationally demanding Earth Mover's (Optimal Transport) distance. However, the \emph{quadratic} dependence on $n$ in the running time makes the naive approach intractable for large datasets. We overcome this bottleneck and present the first $(1+\varepsilon)$-approximate algorithm for estimating the Chamfer distance with a near-linear running time. Specifically, our algorithm runs in time $O(nd \log (n)/\varepsilon^2)$ and is implementable. Our experiments demonstrate that it is both accurate and fast on large high-dimensional datasets. We believe that our algorithm will open new avenues for analyzing large high-dimensional point clouds. We also give evidence that if the goal is to \emph{report} a $(1+\varepsilon)$-approximate mapping from $A$ to $B$ (as opposed to just its value), then any sub-quadratic time algorithm is unlikely to exist.
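To make the definition of $\text{CH}(A,B)$ and the $O(d n^2)$ brute-force baseline concrete, here is a minimal Python sketch. It is only the naive quadratic computation, not the near-linear $(1+\varepsilon)$-approximation described above, and the function names (`chamfer_brute_force`, `euclidean`, `manhattan`) are hypothetical, chosen for illustration.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean (L2) distance between two d-dimensional points."""
    return math.dist(a, b)


def manhattan(a: Point, b: Point) -> float:
    """Manhattan (L1) distance between two d-dimensional points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chamfer_brute_force(
    A: Sequence[Point],
    B: Sequence[Point],
    dist: Callable[[Point, Point], float] = euclidean,
) -> float:
    """Compute CH(A, B) = sum over a in A of min over b in B of dist(a, b).

    Hypothetical helper: it evaluates all |A| * |B| pairs, each costing O(d),
    so the running time is O(d n^2) when |A|, |B| <= n.
    CH is not symmetric in general: CH(A, B) may differ from CH(B, A).
    """
    if not B:
        raise ValueError("B must be non-empty")
    # For every a in A, find its nearest neighbor in B and add that distance.
    return sum(min(dist(a, b) for b in B) for a in A)


A = [(0.0, 0.0), (3.0, 4.0)]
B = [(0.0, 0.0), (0.0, 1.0)]
print(chamfer_brute_force(A, B))             # sqrt(18), about 4.2426
print(chamfer_brute_force(B, A))             # 1.0
print(chamfer_brute_force(A, B, manhattan))  # 6.0
```

The usage lines also illustrate the asymmetry of the definition: the point $(3,4)$ is far from $B$, which inflates $\text{CH}(A,B)$, while every point of $B$ lies close to some point of $A$.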