Finding a minimum spanning tree (MST) for $n$ points in an arbitrary metric space is a fundamental primitive for hierarchical clustering and many other ML tasks, but even approximating it takes $\Omega(n^2)$ time. We introduce a framework for metric MSTs that first (1) finds a forest of disconnected components using practical heuristics, and then (2) finds a low-weight set of edges that connects the disjoint components of the forest into a spanning tree. We prove that optimally solving the second step still takes $\Omega(n^2)$ time, but we provide a subquadratic 2.62-approximation algorithm for it. In the spirit of learning-augmented algorithms, we then show that if the forest found in step (1) overlaps with an optimal MST, we can approximate the original MST problem in subquadratic time, with an approximation factor that depends on a measure of the overlap. In practice, we find nearly optimal spanning trees for a wide range of metrics, while being orders of magnitude faster than exact algorithms.
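To make the two-step framework concrete, the following is a minimal Python sketch, not the paper's algorithm: step (1) builds a forest by linking each point to its nearest neighbor among a few random candidates, a stand-in for whatever practical heuristic is actually used, and step (2) connects the surviving components Borůvka-style along the cheapest sampled inter-component edge, rather than via the 2.62-approximation algorithm. The names `metric_mst_sketch` and `sample_size` are illustrative, and the only assumption on the input is a metric `dist(x, y)`.

```python
import random
from itertools import combinations


def metric_mst_sketch(points, dist, sample_size=5):
    """Hypothetical sketch of the two-step framework: (1) heuristic forest,
    (2) connect the disjoint components into a spanning tree."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find with path halving to track components of the forest.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri == rj:
            return False
        parent[ri] = rj
        return True

    edges = []

    # Step (1): heuristic forest -- link each point to the closest of a few
    # random candidates (subquadratic; the forest's quality depends entirely
    # on the heuristic chosen here).
    for i in range(n):
        candidates = [j for j in random.sample(range(n), min(sample_size, n))
                      if j != i]
        if not candidates:
            continue
        j = min(candidates, key=lambda c: dist(points[i], points[c]))
        if union(i, j):
            edges.append((i, j))

    # Step (2): while more than one component remains, merge along the
    # cheapest edge found between sampled representatives of each pair of
    # components (for clarity only; it does not give the subquadratic bound).
    while len(edges) < n - 1:
        groups = {}
        for i in range(n):
            groups.setdefault(find(i), []).append(i)
        reps = list(groups.values())
        best = None  # (weight, i, j)
        for a, b in combinations(range(len(reps)), 2):
            i, j = random.choice(reps[a]), random.choice(reps[b])
            w = dist(points[i], points[j])
            if best is None or w < best[0]:
                best = (w, i, j)
        _, i, j = best
        union(i, j)
        edges.append((i, j))

    return edges


if __name__ == "__main__":
    # Tiny demo on the real line with the absolute-difference metric.
    pts = [0.0, 1.0, 2.5, 7.0, 8.2]
    tree = metric_mst_sketch(pts, lambda x, y: abs(x - y))
    print(tree)  # n - 1 = 4 edges forming a spanning tree
```

In this sketch the spanning tree is exact only by luck; the point is the division of labor: step (1) does cheap bulk work, and step (2) is where the paper's lower bound and approximation guarantee apply.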