We study coresets for clustering with capacity and fairness constraints. Our main result is a near-linear time algorithm that constructs $\varepsilon$-coresets of size $\tilde{O}(k^2\varepsilon^{-2z-2})$ for capacitated $(k,z)$-clustering, improving the recent $\tilde{O}(k^3\varepsilon^{-3z-2})$ bound of [BCAJ+22, HJLW23]. As a corollary, we also save a factor of $k \varepsilon^{-z}$ in the coreset size for fair $(k,z)$-clustering relative to the same works. We fundamentally improve the hierarchical uniform sampling framework of [BCAJ+22] by adaptively selecting the sample size for each ring instance, proportional to its clustering cost with respect to an optimal solution. Our analysis relies on a key geometric observation that reduces the total number of ``effective centers'' from the $\tilde{O}(k^2\varepsilon^{-z})$ of [BCAJ+22] to merely $O(k\log \varepsilon^{-1})$, since all center points that are too far from or too close to the ring center can be ``ignored''.
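As a quick sanity check of the stated savings, the ratios of the old bounds to the new ones can be worked out directly. This is only arithmetic on the bounds quoted above, treating the polylogarithmic factors hidden by $\tilde{O}(\cdot)$ as equal on both sides, which is an assumption made for illustration:
\[
\frac{k^{3}\varepsilon^{-3z-2}}{k^{2}\varepsilon^{-2z-2}} = k\,\varepsilon^{-z},
\qquad
\frac{k^{2}\varepsilon^{-z}}{k\log \varepsilon^{-1}} = \frac{k\,\varepsilon^{-z}}{\log \varepsilon^{-1}}.
\]
The first ratio matches the factor of $k\varepsilon^{-z}$ saved in the coreset size. The second shows that the number of effective centers shrinks by nearly the same factor, up to the $\log \varepsilon^{-1}$ term.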