We consider the online $k$-median clustering problem, in which $n$ points arrive online and each must be irrevocably assigned to a cluster on arrival. Because lower bound instances show that no online algorithm can achieve a competitive ratio that depends only on $n$ and $k$, we consider a beyond worst-case analysis model in which the algorithm is given a priori a predicted budget $B$ that upper bounds the optimal objective value. We give an algorithm that achieves a competitive ratio that is exponential in the number $k$ of clusters, and show that the competitive ratio of every algorithm must be at least linear in $k$. To the best of our knowledge, this is the first investigation in the literature that considers cluster consistency using competitive analysis.