We consider the classic $k$-center problem in a parallel setting, on the low-local-space Massively Parallel Computation (MPC) model, where each machine has local space $\mathcal{O}(n^{\delta})$ for an arbitrary constant $\delta \in (0,1)$. As a central clustering problem, $k$-center has been studied extensively. Still, until very recently, all MPC algorithms for it required $\Omega(k)$ or even $\Omega(k n^{\delta})$ local space per machine. While this setting covers small values of $k$, for a large number of clusters these algorithms need large local memory, which makes them scale poorly. The case of large $k$, $k \ge \Omega(n^{\delta})$, was recently considered in the low-local-space MPC model by Bateni et al. (2021), who gave an $\mathcal{O}(\log \log n)$-round MPC algorithm that produces $k(1+o(1))$ centers whose cost is an $\mathcal{O}(\log\log\log n)$ multiplicative approximation. In this paper we extend the algorithm of Bateni et al. and design a low-local-space MPC algorithm that, in $\mathcal{O}(\log\log n)$ rounds, returns a clustering with $k(1+o(1))$ clusters that is an $\mathcal{O}(\log^*n)$-approximation for $k$-center.
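For concreteness, the guarantee above can be written out as follows. This is a sketch under the standard formulation of $k$-center, not notation taken from the paper: the point set $P$, the metric $d$, the center sets $C$ and $C'$, and the symbol $\mathrm{OPT}_k$ are all assumed names, and we assume that centers are chosen among the input points and that the approximation is measured against the optimum with exactly $k$ centers.

$$
\mathrm{cost}(C) = \max_{p \in P} \, \min_{c \in C} d(p, c),
\qquad
\mathrm{OPT}_k = \min_{C \subseteq P,\ |C| \le k} \mathrm{cost}(C).
$$

Under these assumptions, the result stated above corresponds to a set of centers $C'$ with

$$
|C'| \le k\,(1 + o(1))
\quad \text{and} \quad
\mathrm{cost}(C') \le \mathcal{O}(\log^* n) \cdot \mathrm{OPT}_k ,
$$

where $n = |P|$, computed in $\mathcal{O}(\log\log n)$ MPC rounds with $\mathcal{O}(n^{\delta})$ local space per machine.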