Fully dynamic clustering and diversity maximization in doubling metrics

We present approximation algorithms for several center-based clustering problems and related problems in the fully dynamic setting, where the pointset evolves through an arbitrary sequence of insertions and deletions. Specifically, we target $k$-center (with and without outliers), matroid-center, and diversity maximization. All algorithms employ a coreset-based strategy and rely on the cover tree data structure, which we augment to maintain, at all times, additional information that enables efficient extraction of a solution to each specific problem. For all of these problems, our algorithms yield $(\alpha+\varepsilon)$-approximations, where $\alpha$ is the best known approximation ratio attainable in polynomial time in the standard off-line setting and $\varepsilon>0$ is a user-provided accuracy parameter. The only exception is $k$-center with $z$ outliers, for which $\alpha = 2$ but we obtain a $(3+\varepsilon)$-approximation. The algorithms are analyzed in terms of the doubling dimension of the underlying metric. Remarkably, and unlike previous works, the data structure and the running times of the insertion and deletion procedures do not depend in any way on the accuracy parameter $\varepsilon$, nor, for the two $k$-center variants, on the parameter $k$. For spaces of bounded doubling dimension, the running times are dramatically smaller than those required to compute solutions on the entire pointset from scratch. To the best of our knowledge, ours are the first solutions for the matroid-center and diversity maximization problems in the fully dynamic setting.
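
For orientation, the off-line baseline $\alpha = 2$ for $k$-center without outliers can be reached by the classical farthest-first traversal (Gonzalez's greedy algorithm). The sketch below is only an illustration of that static baseline, not the dynamic algorithm described above. It assumes points in Euclidean space, and the names `farthest_first_centers` and `clustering_radius` are hypothetical. In a coreset-based strategy, a routine of this kind would presumably be run on a small coreset rather than on the entire pointset.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_first_centers(points: Sequence[Point], k: int) -> List[Point]:
    """Greedy farthest-first traversal: a 2-approximation for k-center.

    Assumption: Euclidean distance; any metric could be substituted.
    """
    if k <= 0 or not points:
        return []
    # Start from an arbitrary point (here: the first one).
    centers = [points[0]]
    # dist[i] = distance from points[i] to its closest chosen center.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # Pick the point farthest from the current set of centers.
        i = max(range(len(points)), key=dist.__getitem__)
        if dist[i] == 0.0:
            break  # every point already coincides with a center
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers


def clustering_radius(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Maximum distance from any point to its closest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    cs = farthest_first_centers(pts, 2)
    print(cs, clustering_radius(pts, cs))
```

Each round costs time linear in the number of input points, so the traversal takes $O(nk)$ distance computations on $n$ points. This is the kind of from-scratch cost that running on a small coreset is meant to avoid.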
