$k$-means++ is an important algorithm for choosing initial cluster centers for the $k$-means clustering algorithm. In this work, we present a new algorithm that solves the $k$-means++ problem in nearly optimal running time. Given $n$ data points in $\mathbb{R}^d$, the current state-of-the-art algorithm runs for $\widetilde{O}(k)$ iterations, and each iteration takes $\widetilde{O}(ndk)$ time, so its overall running time is $\widetilde{O}(ndk^2)$. We propose a new algorithm, \textsc{FastKmeans++}, that takes only $\widetilde{O}(nd + nk^2)$ time in total.
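For readers unfamiliar with the seeding step, the following is a minimal sketch of the textbook $D^2$-sampling rule commonly associated with $k$-means++: the first center is drawn uniformly, and each later center is drawn with probability proportional to its squared distance to the nearest center chosen so far. It is meant only as an illustration of what "choosing initial cluster centers" involves. It is not \textsc{FastKmeans++}, and it is not necessarily the baseline whose running time is quoted above. The names `sq_dist` and `kmeanspp_seed` and the fixed default seed are assumptions made for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Textbook D^2-sampling seeding (illustrative sketch only).

    Returns the indices of k points chosen as initial centers.
    """
    rng = rng or random.Random(0)  # fixed seed is an assumption, for repeatability
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")

    # First center: chosen uniformly at random.
    chosen = [rng.randrange(n)]
    # d2[i] holds the squared distance from point i to its nearest chosen center.
    d2 = [sq_dist(p, points[chosen[0]]) for p in points]

    while len(chosen) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center: fall back to any unchosen index.
            idx = rng.choice([i for i in range(n) if i not in chosen])
        else:
            # Sample an index with probability proportional to d2[i].
            idx = rng.choices(range(n), weights=d2, k=1)[0]
        chosen.append(idx)
        # One pass over the data to refresh the nearest-center distances.
        center = points[idx]
        d2 = [min(old, sq_dist(p, center)) for old, p in zip(d2, points)]

    return chosen
```

A call such as `kmeanspp_seed([(0, 0), (0, 1), (10, 10), (10, 11), (-10, 5)], 3)` returns three distinct indices into the input list. The sketch uses only the standard library so that it stays self-contained; a practical implementation would vectorize the distance updates.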