Constructing small coresets for various clustering problems in different metric spaces has attracted significant attention over the past decade. A central problem in the coreset literature is to determine the best possible coreset size for $(k,z)$-clustering in Euclidean space. While there has been significant progress on this problem, a gap remains between the state-of-the-art upper and lower bounds. For instance, the best known upper bound for $k$-means ($z=2$) is $\min \{O(k^{3/2} \varepsilon^{-2}),O(k \varepsilon^{-4})\}$ [1,2], while the best known lower bound is $\Omega(k\varepsilon^{-2})$ [1]. In this paper, we make significant progress on both upper and lower bounds. For a large range of the parameters $\varepsilon$ and $k$, we obtain a complete understanding of the optimal coreset size. In particular, we obtain the following results: (1) We present a new coreset lower bound of $\Omega(k \varepsilon^{-z-2})$ for Euclidean $(k,z)$-clustering when $\varepsilon \geq \Omega(k^{-1/(z+2)})$. In view of the prior upper bound $\tilde{O}_z(k \varepsilon^{-z-2})$ [1], this bound is optimal. The new lower bound also implies improved lower bounds for $(k,z)$-clustering in doubling metrics. (2) For the upper bound, we provide efficient coreset construction algorithms for $(k,z)$-clustering with improved or optimal coreset sizes in several metric spaces. In particular, we provide an $\tilde{O}_z(k^{\frac{2z+2}{z+2}} \varepsilon^{-2})$-sized coreset, with a unified analysis, for $(k,z)$-clustering for all $z\geq 1$ in Euclidean space.

[1] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn. STOC'22.
[2] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn, Sheikh-Omar. NeurIPS'22.
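
As a sanity check on the threshold in result (1), the two Euclidean bounds quoted in the abstract can be compared directly. The calculation below is only an illustrative sketch: it ignores constants and polylogarithmic factors and is not taken from the paper's proofs.

$$
k\,\varepsilon^{-z-2} \;\le\; k^{\frac{2z+2}{z+2}}\,\varepsilon^{-2}
\;\iff\; \varepsilon^{-z} \;\le\; k^{\frac{z}{z+2}}
\;\iff\; \varepsilon \;\ge\; k^{-\frac{1}{z+2}}.
$$

Under this reading, $\varepsilon \geq k^{-1/(z+2)}$ is exactly the regime in which $k\varepsilon^{-z-2}$ is the smaller of the two upper bounds, and there it matches the new lower bound. For $z=2$ the two expressions become $k\varepsilon^{-4}$ and $k^{3/2}\varepsilon^{-2}$, with the crossover at $\varepsilon = k^{-1/4}$, consistent with the $k$-means bound $\min \{O(k^{3/2} \varepsilon^{-2}),O(k \varepsilon^{-4})\}$ cited above.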