Given a point set $P \subseteq X$ of size $n$ in a metric space $(X,\mathrm{dist})$ of doubling dimension $d$ and two parameters $k \in \mathbb{N}$ and $z \in \mathbb{N}$, the $k$-center problem with $z$ outliers asks for a set $C^\ast \subseteq X$ of $k$ centers that minimizes the maximum distance of all but $z$ points of $P$ to their nearest center in $C^\ast$. An $(\epsilon,k,z)$-coreset for this problem is a weighted point set $P^\ast$ such that an optimal solution for the $k$-center problem with $z$ outliers on $P^\ast$ gives a $(1\pm\epsilon)$-approximation for the $k$-center problem with $z$ outliers on $P$. We study the construction of such coresets in the Massively Parallel Computing (MPC) model, and in the insertion-only as well as the fully dynamic streaming model. We obtain the following results for any given $0 < \epsilon \le 1$; in all cases, the size of the computed coreset is $O(k/\epsilon^d+z)$.
- In the MPC model, we present a deterministic $2$-round algorithm and a randomized $1$-round algorithm. Additionally, we provide a deterministic algorithm that obtains a trade-off between the number of rounds, $R$, and the storage per machine.
- For the insertion-only streaming model, we present an algorithm and a tight lower bound to support it.
- We also discuss the dynamic streaming model, which allows both insertions and deletions in the data stream. In this model, we present the first algorithm and a lower bound.
- Finally, we consider the sliding window model, where the goal is to maintain an $(\epsilon,k,z)$-coreset for the last $W$ points in the stream. We present a tight lower bound that confirms the optimality of the previous work by De Berg, Monemizadeh, and Zhong (ESA 2020).
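To make the objective concrete, the following minimal sketch evaluates the cost of a given set of centers on a (possibly weighted) point set: the smallest radius $r$ such that the points farther than $r$ from their nearest center have total weight at most $z$. It only evaluates a candidate solution and does not construct a coreset. The function name `radius_with_outliers`, the use of Euclidean distance as the metric, and the integer weights are assumptions made for illustration.

```python
import math


def radius_with_outliers(points, centers, z, weights=None):
    """Cost of `centers` for k-center with z outliers on a weighted point set.

    Returns the smallest r such that the points lying farther than r from
    their nearest center have total weight at most z.
    Assumption: Euclidean distance stands in for the metric dist.
    """
    if weights is None:
        weights = [1] * len(points)  # unweighted input: every point counts once

    # Pair each point's distance to its nearest center with its weight,
    # ordered from the farthest point to the closest one.
    dists = sorted(
        ((min(math.dist(p, c) for c in centers), w)
         for p, w in zip(points, weights)),
        reverse=True,
    )

    budget = z  # outlier weight that may still be discarded
    for d, w in dists:
        if w > budget:
            # This point cannot be discarded entirely, so it must be covered.
            return d
        budget -= w
    return 0.0  # every point fits into the outlier budget
```

For example, with `points = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 0)]` and `centers = [(0, 0), (10, 0)]`, allowing one outlier lets the far point `(100, 0)` be ignored, so the cost drops from 90 to 1. Passing `weights` lets the same routine be applied to a weighted set such as $P^\ast$.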