We initiate the study of the following general clustering problem. We seek to partition a given set $P$ of data points into $k$ clusters by finding a set $X$ of $k$ centers and assigning each data point to one of the centers. The cost of a cluster, represented by a center $x\in X$, is a monotone, symmetric norm $f$ (the inner norm) of the vector of distances from $x$ to the points assigned to it. The goal is to minimize a norm $g$ (the outer norm) of the vector of cluster costs. This problem, which we call $(f,g)$-Clustering, generalizes many fundamental clustering problems such as $k$-Center, $k$-Median, Min-Sum of Radii, and Min-Load $k$-Clustering. A recent line of research (Chakrabarty, Swamy [STOC'19]) studies norm objectives that are oblivious to the cluster structure, such as $k$-Median and $k$-Center. In contrast, our problem models cluster-aware objectives, including Min-Sum of Radii and Min-Load $k$-Clustering. Our main results are as follows. First, we design a constant-factor approximation algorithm for $(\textsf{top}_\ell,\mathcal{L}_1)$-Clustering, where the inner norm ($\textsf{top}_\ell$) sums the $\ell$ largest distances. Second, we design a constant-factor approximation algorithm for $(\mathcal{L}_\infty,\textsf{Ord})$-Clustering, where the outer norm ($\textsf{Ord}$) is an ordered weighted norm, that is, a convex combination of $\textsf{top}_\ell$ norms.
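To make the objective concrete, the following is a minimal sketch of how the cost of one fixed solution could be evaluated. It is not an approximation algorithm and is not taken from the paper. All names (`top_l`, `ordered_norm`, `clustering_cost`, the toy points on a line) are hypothetical. The sketch assumes the usual readings of the special cases: $k$-Median as $(\mathcal{L}_1,\mathcal{L}_1)$, $k$-Center as $(\mathcal{L}_\infty,\mathcal{L}_\infty)$, Min-Sum of Radii as $(\mathcal{L}_\infty,\mathcal{L}_1)$, and Min-Load $k$-Clustering as $(\mathcal{L}_1,\mathcal{L}_\infty)$. It also assumes the ordered weighted norm is given by non-increasing, non-negative weights applied to the entries sorted in decreasing order.

```python
from typing import Callable, List, Sequence

Norm = Callable[[Sequence[float]], float]


def l1(v: Sequence[float]) -> float:
    """L_1 norm: sum of absolute values."""
    return sum(abs(x) for x in v)


def linf(v: Sequence[float]) -> float:
    """L_infinity norm: largest absolute value (0 for an empty vector)."""
    return max((abs(x) for x in v), default=0.0)


def top_l(l: int) -> Norm:
    """top_l norm: sum of the l largest absolute values."""
    def norm(v: Sequence[float]) -> float:
        return sum(sorted((abs(x) for x in v), reverse=True)[:l])
    return norm


def ordered_norm(weights: Sequence[float]) -> Norm:
    """Ordered weighted norm (assumed form): weights are non-increasing and
    non-negative, and are paired with the entries sorted in decreasing order."""
    def norm(v: Sequence[float]) -> float:
        ordered = sorted((abs(x) for x in v), reverse=True)
        return sum(w * x for w, x in zip(weights, ordered))
    return norm


def clustering_cost(
    points: Sequence[float],
    centers: Sequence[float],
    assignment: Sequence[int],
    f: Norm,
    g: Norm,
) -> float:
    """Evaluate the (f, g) objective of a fixed solution on the real line.

    assignment[i] is the index (into centers) of the center serving points[i].
    """
    distances: List[List[float]] = [[] for _ in centers]
    for p, c in zip(points, assignment):
        distances[c].append(abs(p - centers[c]))
    cluster_costs = [f(d) for d in distances]  # inner norm, per cluster
    return g(cluster_costs)                    # outer norm, across clusters


# Toy instance (hypothetical): five points on a line, two centers.
points = [0.0, 1.0, 2.0, 10.0, 11.0]
centers = [1.0, 10.0]
assignment = [0, 0, 0, 1, 1]

k_median = clustering_cost(points, centers, assignment, l1, l1)
k_center = clustering_cost(points, centers, assignment, linf, linf)
sum_of_radii = clustering_cost(points, centers, assignment, linf, l1)
min_load = clustering_cost(points, centers, assignment, l1, linf)
top2_l1 = clustering_cost(points, centers, assignment, top_l(2), l1)
linf_ord = clustering_cost(points, centers, assignment, linf, ordered_norm([1.0, 0.5]))
```

Passing the inner and outer norms as plain functions keeps the evaluator independent of the particular pair $(f,g)$, which mirrors how the problem definition treats the two norms as parameters.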