We study $k$-means clustering in the space of straight-line segments in $\mathbb{R}^{2}$ under the Hausdorff distance. For this problem, we give a $(1+\epsilon)$-approximation algorithm that, for an input of $n$ segments and any fixed $k$, succeeds with constant probability and runs in time $O(n+ \epsilon^{-O(k)} + \epsilon^{-O(k)}\cdot \log^{O(k)} (\epsilon^{-1}))$. The algorithm has two main ingredients. First, we express the $k$-means objective in our metric space as a sum of algebraic functions and use the optimization technique of Vigneron~\cite{Vigneron14} to approximate its minimum. Second, we reduce the input size by computing a small coreset using the sensitivity-based sampling framework of Feldman and Langberg~\cite{Feldman11, Feldman2020}. Our results extend to polylines of constant complexity with a running time of $O(n+ \epsilon^{-O(k)})$.
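To make the objective concrete, the following minimal sketch evaluates the $k$-means cost of a set of segments for a given set of center segments under the Hausdorff distance. It is an illustration only, not the approximation algorithm or the coreset construction described above. The function names (`point_segment_distance`, `hausdorff_distance`, `kmeans_cost`) and the representation of a segment as a pair of endpoint tuples are assumptions made here. The sketch relies on the fact that the distance to a convex set is a convex function, so the directed Hausdorff distance from one segment to another is attained at an endpoint.

```python
import math


def point_segment_distance(p, seg):
    """Euclidean distance from point p to the closed segment seg."""
    (ax, ay), (bx, by) = seg
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection of p, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def hausdorff_distance(s, t):
    """Hausdorff distance between two segments in the plane.

    The distance to a segment is convex, so its maximum over the other
    segment is attained at an endpoint; checking the four endpoints suffices.
    """
    return max(
        point_segment_distance(s[0], t),
        point_segment_distance(s[1], t),
        point_segment_distance(t[0], s),
        point_segment_distance(t[1], s),
    )


def kmeans_cost(segments, centers):
    """Sum over input segments of the squared distance to the nearest center."""
    return sum(
        min(hausdorff_distance(s, c) for c in centers) ** 2 for s in segments
    )
```

For example, with the hypothetical input `[((0, 0), (1, 0)), ((0, 1), (1, 1))]` and the single center `((0, 0), (1, 0))`, the first segment contributes zero and the second contributes the square of its Hausdorff distance to the center.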