A $(2+\varepsilon)$-Approximation Algorithm for Metric $k$-Median

In the classical NP-hard metric $k$-median problem, we are given a set of $n$ clients and centers with metric distances between them, along with an integer parameter $k\geq 1$. The objective is to select a subset of $k$ open centers that minimizes the total distance from each client to its closest open center. In their seminal work, Jain, Mahdian, Markakis, Saberi, and Vazirani presented the Greedy algorithm for facility location, which yields a $2$-approximation algorithm for $k$-median that opens $k$ centers in expectation. Since then, substantial research has aimed at narrowing the gap between their algorithm and the best approximation achievable by an algorithm guaranteed to open exactly $k$ centers. During the last decade, all improvements have been obtained by using their algorithm, or a slight improvement of it, followed by a second step called bi-point rounding, which inherently worsens the approximation factor. Our main result closes this gap: for any $\varepsilon>0$, we present a $(2+\varepsilon)$-approximation algorithm for $k$-median, improving on the previous best-known approximation factor of $2.613$. Our approach combines two algorithms. First, we present a non-trivial modification of the Greedy algorithm that operates in $O(\log n/\varepsilon^2)$ adaptive phases. Together with a novel walk-between-solutions approach, this allows us to construct a $(2+\varepsilon)$-approximation algorithm for $k$-median that always opens at most $k + O(\log n/\varepsilon^2)$ centers. Second, we develop a novel $(2+\varepsilon)$-approximation algorithm tailored to stable instances, in which removing any center from an optimal solution increases the cost by at least an $\Omega(\varepsilon^3/\log n)$ fraction. This algorithm relies on a sampling approach inspired by the $k$-means++ algorithm and on a reduction to submodular optimization subject to a partition matroid constraint.
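To make the objective and the stability notion concrete, the following is a minimal Python sketch under stated assumptions. It is not the authors' algorithm. `kmedian_cost` evaluates the $k$-median objective. `is_stable` checks the stability condition for a given solution, where the hypothetical parameter `delta` stands in for the $\Omega(\varepsilon^3/\log n)$ threshold. `d1_seeding` is a generic $k$-means++-style seeding that samples with probability proportional to distance (not squared distance). It assumes, for illustration only, that candidate centers are the client locations themselves. All three function names are hypothetical.

```python
import random
from typing import Any, Callable, List, Sequence

Metric = Callable[[Any, Any], float]


def kmedian_cost(clients: Sequence[Any], centers: Sequence[Any], dist: Metric) -> float:
    """Total distance from each client to its closest open center."""
    return sum(min(dist(c, f) for f in centers) for c in clients)


def is_stable(clients: Sequence[Any], solution: Sequence[Any], dist: Metric, delta: float) -> bool:
    """True if deleting any single center raises the cost by at least a delta fraction.

    The abstract states this condition for an optimal solution with
    delta = Omega(eps^3 / log n); this sketch simply checks it for the given solution.
    """
    solution = list(solution)
    base = kmedian_cost(clients, solution, dist)
    for i in range(len(solution)):
        rest = solution[:i] + solution[i + 1:]  # drop exactly one center
        if not rest:
            continue  # removing the only center leaves no feasible solution
        if kmedian_cost(clients, rest, dist) < (1 + delta) * base:
            return False
    return True


def d1_seeding(clients: Sequence[Any], k: int, dist: Metric, rng: random.Random) -> List[Any]:
    """k-means++-style seeding with D^1 weights (distance, not squared distance).

    Assumption (hypothetical simplification): candidate centers are the clients.
    """
    points = list(clients)
    chosen = [rng.choice(points)]  # first center chosen uniformly at random
    while len(chosen) < k:
        # Weight each client by its distance to the nearest center chosen so far.
        weights = [min(dist(c, f) for f in chosen) for c in points]
        if sum(weights) == 0:
            break  # every client already coincides with a chosen center
        chosen.append(rng.choices(points, weights=weights, k=1)[0])
    return chosen
```

For example, with clients at `[0.0, 1.0, 10.0, 11.0]` on the real line and `dist = lambda a, b: abs(a - b)`, opening centers at `0.0` and `10.0` costs `2.0`. Removing either center raises the cost to at least `20.0`, so that solution passes `is_stable` with `delta = 0.5`.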