Near Linear Time Approximation Schemes for Clustering of Partially Doubling Metrics

Given a finite metric space $(X\cup Y, \mathbf{d})$, the $k$-median problem is to find a set of $k$ centers $C\subseteq Y$ that minimizes $\sum_{p\in X} \min_{c\in C} \mathbf{d}(p,c)$. In general metrics, the best known polynomial-time algorithm computes a $(2+ε)$-approximation for arbitrary $ε>0$ (Cohen-Addad et al. STOC 2025). However, if the metric is doubling, a near-linear-time $(1+ε)$-approximation algorithm is known (Cohen-Addad et al. J. ACM 2021). We show that the $(1+ε)$-approximation algorithm can be generalized to the case when either $X$ or $Y$ has bounded doubling dimension (but the other set does not). The case when $X$ is doubling is motivated by the assumption that, even though $X$ is part of a high-dimensional space, it may lie close to a low-dimensional structure. The case when $Y$ is doubling is motivated by specific clustering problems where the centers are low-dimensional. Specifically, our work in this setting implies the first near-linear-time approximation algorithm for the $(k,\ell)$-median problem under discrete Fréchet distance when $\ell$ is constant. We further introduce a novel complexity reduction for time series of real values that leads to a similar result for the case of discrete Fréchet distance. To solve the case when $Y$ has bounded doubling dimension, we introduce a dimension reduction that replaces points from $X$ by sets of points in $Y$. To solve the case when $X$ has bounded doubling dimension, we generalize Talwar's decomposition (Talwar STOC 2004) to our setting. The running time of our algorithms is $2^{2^t} \tilde O(n+m)$, where $t=O(\mathrm{ddim} \log \frac{\mathrm{ddim}}{ε})$ and $\mathrm{ddim}$ is the doubling dimension of $X$ (resp.\ $Y$). The results also extend to the metric facility location problem.
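
To make the objective concrete, the following is a minimal Python sketch that evaluates the $k$-median cost $\sum_{p\in X} \min_{c\in C} \mathbf{d}(p,c)$ and finds an optimal $C\subseteq Y$ by exhaustive search. It is not the algorithm of the paper: the function names `k_median_cost` and `brute_force_k_median`, the one-dimensional toy instance, and the absolute-difference metric are all assumptions made for illustration, and the exhaustive search takes time exponential in $k$.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple, TypeVar

P = TypeVar("P")  # type of a point in X or Y (hypothetical, for illustration)


def k_median_cost(clients: Sequence[P], centers: Sequence[P],
                  dist: Callable[[P, P], float]) -> float:
    """Sum over all clients p in X of the distance to the nearest center in C."""
    if not centers:
        raise ValueError("at least one center is required")
    return sum(min(dist(p, c) for c in centers) for p in clients)


def brute_force_k_median(clients: Sequence[P], facilities: Sequence[P], k: int,
                         dist: Callable[[P, P], float]) -> Tuple[P, ...]:
    """Try every k-subset of Y and return one of minimum cost.

    Illustration only: this enumerates all k-subsets, so it is not near linear.
    """
    return min(combinations(facilities, k),
               key=lambda C: k_median_cost(clients, C, dist))


def line_dist(a: float, b: float) -> float:
    """Assumed toy metric: absolute difference on the real line."""
    return abs(a - b)


# Toy instance: clients X and candidate centers Y on the real line.
X = [0.0, 1.0, 10.0, 11.0]
Y = [0.5, 5.0, 10.5]

best = brute_force_k_median(X, Y, 2, line_dist)
print(best, k_median_cost(X, best, line_dist))  # (0.5, 10.5) 2.0
```

In this toy instance each client is at distance $0.5$ from its nearest chosen center, so the optimal cost for $k=2$ is $2$, whereas the single center $5$ would cost $20$.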
