Coresets for Clustering in Geometric Intersection Graphs

Designing coresets, small-space sketches of the data that preserve the cost of every solution within a $(1\pm \epsilon)$ factor, is an important research direction in the study of center-based $k$-clustering problems, such as $k$-means or $k$-median. Feldman and Langberg [STOC'11] have shown that for $k$-clustering of $n$ points in general metrics, it is possible to obtain coresets whose size depends logarithmically on $n$. Moreover, such a dependence on $n$ is unavoidable in general metrics. A significant amount of recent work in the area is devoted to obtaining coresets whose sizes are independent of $n$ (i.e., ``small'' coresets) for special metrics, such as $d$-dimensional Euclidean spaces, doubling metrics, and metrics of graphs of bounded treewidth or of graphs excluding a fixed minor. In this paper, we provide the first constructions of small coresets for $k$-clustering in the metrics induced by geometric intersection graphs, such as Euclidean-weighted Unit Disk Graphs (UDGs) and Unit Square Graphs (USGs). These constructions follow from a general theorem that identifies two canonical geometric properties of a graph metric that are sufficient for obtaining small coresets. The proof of our theorem builds on the recent work of Cohen-Addad, Saulpic, and Schwiegelshohn [STOC '21], which guarantees small coresets conditioned on the existence of an interesting set of centers, called a ``centroid set''. The main technical contribution of our work is a proof that such a small centroid set exists for graphs that satisfy the two canonical geometric properties. The new coreset construction leads to the first $(1+\epsilon)$-approximation for center-based clustering problems in UDGs and USGs that is fixed-parameter tractable in $k$ and $\epsilon$ (an FPT-AS).
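To make the coreset guarantee concrete, the following minimal sketch checks the $(1\pm \epsilon)$ cost condition for one candidate set of centers. It is an illustration only, not the construction of this paper: the function names `clustering_cost` and `preserves_cost` are hypothetical, distances are plain Euclidean rather than shortest-path distances in an intersection graph, and the toy "coreset" simply merges duplicate points into weighted points (which preserves every cost exactly). A true coreset must satisfy the condition for every choice of $k$ centers, whereas this check tests a single choice.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def clustering_cost(points, weights, centers, z=1):
    """Weighted clustering cost: each point pays (distance to nearest center)^z.

    z=1 corresponds to k-median, z=2 to k-means.
    """
    return sum(
        w * min(dist(p, c) for c in centers) ** z
        for p, w in zip(points, weights)
    )


def preserves_cost(data, coreset, coreset_weights, centers, eps, z=1):
    """Check the (1 +/- eps) condition for ONE candidate set of centers.

    Assumption: the input data is unweighted (every point has weight 1).
    """
    full = clustering_cost(data, [1.0] * len(data), centers, z)
    sketch = clustering_cost(coreset, coreset_weights, centers, z)
    return (1 - eps) * full <= sketch <= (1 + eps) * full


# Toy data with repeated points (illustrative values only).
data = [(0.0, 0.0)] * 3 + [(4.0, 0.0)] * 2 + [(0.0, 3.0)]

# Trivial exact "coreset": one weighted representative per distinct point.
counts = Counter(data)
coreset = list(counts)
coreset_weights = [float(counts[p]) for p in coreset]

# One candidate solution with k = 2 centers.
centers = [(0.0, 0.0), (4.0, 3.0)]
ok = preserves_cost(data, coreset, coreset_weights, centers, eps=0.1)
```

Here six input points are represented by three weighted points. The constructions discussed in the abstract aim for the same kind of guarantee with a size independent of $n$, for all center sets simultaneously.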
