Given a set of points in a metric space and an integer $k$, we seek to partition the points into $k$ clusters. For each computed cluster, one typically designates one point as its center. A natural objective is to minimize the sum of the cluster radii, where the radius of a cluster is the smallest value $r$ such that every point in the cluster is at distance at most $r$ from its center. The best-known polynomial-time approximation ratio for this problem is $3.389$. In the setting with outliers, i.e., we are given an integer $m$ and may leave up to $m$ points outside all clusters, the best-known approximation factor is $12.365$. In this paper, we improve both approximation ratios to $3+\epsilon$. Our algorithms are primal-dual algorithms that use fundamentally new ideas to compute solutions and to guarantee the claimed approximation ratios. For example, we replace the classical binary search for the best value of a Lagrangian multiplier $\lambda$ by a primal-dual routine in which $\lambda$ is a variable that is raised. Also, we show that for each connected component arising from almost tight dual constraints, we can find a single cluster that covers all its points, and we bound its cost via a new primal-dual analysis. We remark that our approximation factor of $3+\epsilon$ is a natural limit for the known approaches in the literature. We then extend our results to the setting with lower bounds. Algorithms are known for the case in which each point $i$ has a lower bound $L_{i}$, meaning that at least $L_{i}$ clients must be assigned to $i$ if $i$ is a cluster center. For this setting, there is a $3.83$-approximation if outliers are not allowed and a $12.365$-approximation with outliers. We improve both ratios to $3.5 + \epsilon$ and, at the same time, generalize the type of allowed lower bounds.
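To make the objective concrete, the following minimal sketch evaluates the sum-of-radii cost of a given clustering. It is only an illustration of the cost function, not one of the algorithms of this paper. The function name `sum_of_radii`, the input layout, and the use of the Euclidean metric are assumptions made for this example; outliers are modeled by leaving points out of the assignment.

```python
from math import dist  # Euclidean distance; assumed metric for this illustration


def sum_of_radii(points, centers, assignment):
    """Return the sum over all centers of the largest distance to an assigned point.

    points:     list of coordinate tuples
    centers:    list of indices into points (the chosen cluster centers)
    assignment: dict mapping point index -> center index;
                points that are outliers are simply omitted
    """
    radius = {c: 0.0 for c in centers}
    for p, c in assignment.items():
        # The radius of a cluster is the smallest r covering all its points.
        radius[c] = max(radius[c], dist(points[p], points[c]))
    return sum(radius.values())


# Hypothetical instance: k = 2 clusters, m = 1 outlier.
points = [(0, 0), (1, 0), (0, 2), (10, 10), (11, 10), (50, 50)]
centers = [0, 3]
assignment = {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}  # point 5 is left out as an outlier
cost = sum_of_radii(points, centers, assignment)
print(cost)
```

In this instance the first cluster has radius $2$ and the second has radius $1$, so the cost is $3$. Assigning the far-away point $(50, 50)$ to either cluster would enlarge that cluster's radius substantially.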