On Parallel k-Center Clustering

We consider the classic $k$-center problem in a parallel setting, on the low-local-space Massively Parallel Computation (MPC) model, where each machine has $\mathcal{O}(n^{\delta})$ local space and $\delta \in (0,1)$ is an arbitrary constant. As a central clustering problem, $k$-center has been studied extensively. Still, until very recently, all MPC algorithms for it required $\Omega(k)$ or even $\Omega(k n^{\delta})$ local space per machine. This covers small values of $k$, but for a large number of clusters these algorithms need large local memory and therefore scale poorly. The case of large $k$, $k \ge \Omega(n^{\delta})$, was considered recently for the low-local-space MPC model by Bateni et al. (2021), who gave an $\mathcal{O}(\log \log n)$-round MPC algorithm that produces $k(1+o(1))$ centers whose cost is within a multiplicative factor of $\mathcal{O}(\log\log\log n)$ of the optimal $k$-center cost. In this paper we extend the algorithm of Bateni et al. and design a low-local-space MPC algorithm that in $\mathcal{O}(\log\log n)$ rounds returns a clustering with $k(1+o(1))$ clusters that is an $\mathcal{O}(\log^*n)$-approximation for $k$-center.
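To make the objective concrete, the following is a minimal sequential sketch, assuming points in Euclidean space given as coordinate tuples. It computes the $k$-center cost (the largest distance from any point to its nearest center) and picks centers with the textbook greedy farthest-first heuristic, a well-known sequential 2-approximation. It is not the MPC algorithm of this paper or of Bateni et al.; the function names `kcenter_cost` and `greedy_kcenter` are illustrative choices, not taken from the paper.

```python
import math


def kcenter_cost(points, centers):
    """k-center cost: the largest distance from any point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


def greedy_kcenter(points, k):
    """Farthest-first traversal (sequential greedy heuristic, assumed here for illustration).

    Starts from the first point and repeatedly adds the point that is
    farthest from the centers chosen so far.
    """
    centers = [points[0]]
    # dist[i] is the distance from points[i] to its nearest chosen center.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        farthest = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[farthest])
        dist = [min(d, math.dist(p, points[farthest]))
                for d, p in zip(dist, points)]
    return centers


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
    chosen = greedy_kcenter(pts, 2)
    print(chosen, kcenter_cost(pts, chosen))
```

Note that this sketch keeps every point and all $k$ centers in one machine's memory, which is exactly what the low-local-space setting with $k \ge \Omega(n^{\delta})$ rules out.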

