We study coresets for clustering with capacity and fairness constraints. Our main result is a near-linear time algorithm that constructs $\varepsilon$-coresets of size $\tilde{O}(k^2\varepsilon^{-2z-2})$ for capacitated $(k,z)$-clustering, improving the recent $\tilde{O}(k^3\varepsilon^{-3z-2})$ bound of [BCAJ+22, HJLW23]. As a corollary, we also save a factor of $k \varepsilon^{-z}$ in the coreset size for fair $(k,z)$-clustering relative to these works. We fundamentally improve the hierarchical uniform sampling framework of [BCAJ+22] by adaptively choosing the sample size for each ring instance in proportion to its clustering cost with respect to an optimal solution. Our analysis relies on a key geometric observation that reduces the total number of ``effective centers'' from the $\tilde{O}(k^2\varepsilon^{-z})$ of [BCAJ+22] to merely $O(k\log \varepsilon^{-1})$, because all center points that are too far from or too close to the ring center can be ``ignored''.
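As a quick sanity check on the stated improvement, dividing the previous bound by the new one (and ignoring the polylogarithmic factors hidden by $\tilde{O}$, which is an assumption of this sketch) gives exactly the factor $k\varepsilon^{-z}$ quoted for the fair-clustering corollary:
\[
\frac{k^{3}\varepsilon^{-3z-2}}{k^{2}\varepsilon^{-2z-2}} = k\,\varepsilon^{-z}.
\]
For instance, for $z=1$ ($k$-median) the bound drops from $\tilde{O}(k^{3}\varepsilon^{-5})$ to $\tilde{O}(k^{2}\varepsilon^{-4})$, and for $z=2$ ($k$-means) from $\tilde{O}(k^{3}\varepsilon^{-8})$ to $\tilde{O}(k^{2}\varepsilon^{-6})$.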