The $k$-median and $k$-means objectives are classic models of clustering in a metric space. Given a set of points in a metric space, the goal of the $k$-median (resp. $k$-means) problem is to find $k$ representative points that minimize the sum of the distances (resp. sum of squared distances) from each point to its closest representative. Cohen-Addad, Feldmann, and Saulpic [JACM'21] showed how to obtain a $(1+\varepsilon)$-approximation for both the $k$-median and $k$-means problems in low-dimensional Euclidean metrics in near-linear time $2^{(1/\varepsilon)^{O(d^2)}} \cdot n \cdot \text{polylog}(n)$, where $d$ is the dimension and $n$ is the number of input points. We improve this running time to $2^{\tilde{O}(1/\varepsilon)^{d-1}} \cdot n \cdot \text{polylog}(n)$, and show an almost matching lower bound: under the Gap Exponential Time Hypothesis for 3-SAT, there is no $2^{o(1/\varepsilon^{d-1})} \cdot n^{O(1)}$-time algorithm achieving a $(1+\varepsilon)$-approximation for $k$-means.
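To make the two objectives concrete, the following minimal sketch evaluates the cost of a given set of representatives on a small Euclidean instance. It only evaluates the objective and is not the approximation algorithm discussed above; the function name `clustering_cost`, the `power` parameter, and the toy points are assumptions introduced for illustration.

```python
import math


def clustering_cost(points, centers, power=1):
    """Sum over all points of (distance to the closest center) ** power.

    power=1 gives the k-median objective, power=2 the k-means objective.
    Hypothetical helper for illustration only.
    """
    total = 0.0
    for p in points:
        # Euclidean distance from p to its closest representative.
        nearest = min(math.dist(p, c) for c in centers)
        total += nearest ** power
    return total


# Toy instance in the plane (d = 2) with k = 2 representatives.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
centers = [(0.0, 1.0), (10.0, 2.0)]

k_median_cost = clustering_cost(points, centers, power=1)
k_means_cost = clustering_cost(points, centers, power=2)
```

Here the distances to the closest representative are 1, 1, 2, and 2, so the $k$-median cost is 6 and the $k$-means cost is 10. A $(1+\varepsilon)$-approximation returns representatives whose cost is at most $(1+\varepsilon)$ times the minimum possible cost.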