The $k$-center problem is a fundamental clustering variant with applications in learning systems and data summarization. In several real-world scenarios, the dataset to be clustered is not static, but evolves over time, as new data points arrive and old ones become stale. To account for dynamicity, the $k$-center problem has been mainly studied under the sliding window setting, where only the $N$ most recent points are considered non-stale, or the fully dynamic setting, where arbitrary sequences of point arrivals and deletions without prior notice may occur. In this paper, we introduce the dynamic setting with lifetimes, which bridges the two aforementioned classical settings by still allowing arbitrary arrivals and deletions, but making the deletion time of each point known upon its arrival. Under this new setting, we devise a deterministic $(2+\varepsilon)$-approximation algorithm with $\tilde{O}(k/\varepsilon)$ amortized update time and memory usage linear in the number of currently active points. Moreover, we develop a deterministic $(6+\varepsilon)$-approximation algorithm that, under tame update sequences, has $\tilde{O}(k/\varepsilon)$ worst-case update time and heavily sublinear working memory.
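To make the setting concrete, the following is a minimal Python sketch of the dynamic setting with lifetimes, in which each point arrives together with its deletion time. It is not the paper's algorithm: the class `LifetimeStream`, its methods, and the function `greedy_k_center` are hypothetical names introduced here for illustration, and the clustering step is the classical static farthest-first (Gonzalez) 2-approximation, recomputed from scratch on the active points. The sketch also assumes Euclidean points and a numeric clock.

```python
import heapq
import math
from itertools import count


class LifetimeStream:
    """Active point set where each point's expiry time is known on arrival.

    Illustrative only: a min-heap keyed by expiry time, not the data
    structure used in the paper.
    """

    def __init__(self):
        self._heap = []            # entries: (expiry_time, tiebreak, point)
        self._tiebreak = count()   # unique counter so points are never compared

    def insert(self, point, expiry_time):
        heapq.heappush(self._heap, (expiry_time, next(self._tiebreak), point))

    def advance(self, now):
        """Remove every point whose expiry time is <= now."""
        while self._heap and self._heap[0][0] <= now:
            heapq.heappop(self._heap)

    def active_points(self):
        """Return the active points (in heap order, not arrival order)."""
        return [p for _, _, p in self._heap]


def greedy_k_center(points, k):
    """Farthest-first traversal: returns (centers, radius) for a static set."""
    if not points or k < 1:
        return [], 0.0
    centers = [points[0]]
    # dist[i] = distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        i = max(range(len(points)), key=dist.__getitem__)
        if dist[i] == 0.0:         # every point already coincides with a center
            break
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers, max(dist)


stream = LifetimeStream()
stream.insert((0.0, 0.0), expiry_time=5)
stream.insert((10.0, 0.0), expiry_time=3)
stream.insert((0.0, 1.0), expiry_time=10)
centers, radius = greedy_k_center(stream.active_points(), k=2)
stream.advance(now=3)              # the point expiring at time 3 is dropped
```

Recomputing the clustering after every update costs time proportional to $k$ times the number of active points, which is the naive baseline that the $\tilde{O}(k/\varepsilon)$ update-time bounds stated above improve upon.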