Clustering What Matters in Constrained Settings

Constrained clustering problems generalize classical clustering formulations, such as $k$-median and $k$-means, by imposing additional constraints on the feasibility of a clustering. There has been significant recent progress in obtaining approximation algorithms for these problems, in both the metric and the Euclidean settings. However, the outlier version of these problems, where the solution is allowed to leave out $m$ points from the clustering, is not well understood. In this work, we give a general framework for reducing the outlier version of a constrained $k$-median or $k$-means problem to the corresponding outlier-free version with only a $(1+\varepsilon)$-factor loss in the approximation ratio. The reduction maps the original instance of the problem to $f(k, m, \varepsilon)$ instances of the outlier-free version, where $f(k, m, \varepsilon) = \left( \frac{k+m}{\varepsilon}\right)^{O(m)}$. As specific applications, we obtain the following results:

- The first FPT (in the parameters $k$ and $m$) $(1+\varepsilon)$-approximation algorithm for the outlier version of capacitated $k$-median and $k$-means in Euclidean spaces with hard capacities.
- The first FPT (in the parameters $k$ and $m$) $(3+\varepsilon)$- and $(9+\varepsilon)$-approximation algorithms for the outlier version of capacitated $k$-median and $k$-means, respectively, in general metric spaces with hard capacities.
- The first FPT (in the parameters $k$ and $m$) $(2-\delta)$-approximation algorithm for the outlier version of the $k$-median problem under the Ulam metric.

Our work generalizes known results to a larger class of constrained clustering problems. Furthermore, because our reduction works for arbitrary metric spaces, it can extend clustering algorithms for outlier-free versions in both Euclidean and arbitrary metric spaces.
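The abstract does not spell out how the reduction builds its $f(k, m, \varepsilon)$ instances, so the following is not the authors' construction. It is a minimal Python sketch of the naive baseline that the framework improves on, assuming unconstrained, discrete $k$-median (centers chosen from the input points) in $\mathbb{R}^d$. The baseline guesses the outlier set explicitly and calls an outlier-free solver on each of the $\binom{n}{m}$ resulting instances, keeping the cheapest. This is exact but the instance count grows with $n$, whereas $f(k, m, \varepsilon)$ depends only on $k$, $m$, and $\varepsilon$. All function names are hypothetical. The cost helper relies on the fact that, in the unconstrained case with fixed centers, it is optimal to drop the $m$ points farthest from their nearest center; with hard capacities this shortcut no longer holds.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def outlier_kmedian_cost(points: Sequence[Point], centers: Sequence[Point], m: int) -> float:
    """Unconstrained k-median cost with m outliers (hypothetical helper).

    Each point pays the distance to its nearest center; the m most
    expensive points are discarded. Valid only without capacity constraints.
    """
    costs = sorted(min(math.dist(p, c) for c in centers) for p in points)
    return sum(costs[: len(costs) - m])


def brute_force_kmedian(points: Sequence[Point], k: int) -> float:
    """Exact outlier-free discrete k-median: try every set of k input points as centers."""
    best = math.inf
    for centers in combinations(points, k):
        cost = sum(min(math.dist(p, c) for c in centers) for p in points)
        best = min(best, cost)
    return best


def naive_outlier_reduction(
    points: Sequence[Point],
    k: int,
    m: int,
    solve_outlier_free: Callable[[List[Point], int], float],
) -> float:
    """Naive reduction (not the paper's): enumerate all C(n, m) outlier sets.

    For each guessed set of m outliers, solve the outlier-free problem on the
    remaining points and return the cheapest cost found.
    """
    best = math.inf
    for dropped in combinations(range(len(points)), m):
        dropped_set = set(dropped)
        rest = [p for i, p in enumerate(points) if i not in dropped_set]
        best = min(best, solve_outlier_free(rest, k))
    return best


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (100.0,)]
    print(outlier_kmedian_cost(pts, [(1.0,), (10.0,)], m=1))
    print(naive_outlier_reduction(pts, k=2, m=1, solve_outlier_free=brute_force_kmedian))
    # Number of outlier-free instances the naive baseline creates: C(n, m).
    print(math.comb(len(pts), 1))
```

On this toy input both calls report a cost of 3.0: the far point at 100 is discarded as the single outlier, and centers at 1 and 10 serve the remaining points. Passing a different `solve_outlier_free` (for example, a capacitated solver) reuses the same enumeration, which mirrors the black-box spirit of the reduction described above, though with $\binom{n}{m}$ calls instead of $f(k, m, \varepsilon)$.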