Proportional fairness criteria inspired by democratic ideals of proportional representation have received growing attention in the clustering literature. Prior work has investigated them in two separate paradigms. Chen et al. [ICML 2019] study centroid clustering, in which each data point's loss is its distance to a representative point (centroid) chosen for its cluster. Caragiannis et al. [NeurIPS 2024] study non-centroid clustering, in which each data point's loss is its maximum distance to any other data point in its cluster. We generalize both paradigms by introducing semi-centroid clustering, in which each data point's loss is a combination of its centroid and non-centroid losses, and we study two proportional fairness criteria: the core and its relaxation, fully justified representation (FJR). Our main result is a novel polynomial-time algorithm that achieves a constant-factor approximation to the core, even when the centroid and non-centroid losses are measured with different distance metrics. We also derive improved results for more restricted loss functions and for the weaker FJR criterion, and establish lower bounds in each case.
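To make the three loss notions concrete, the following is a minimal sketch, not the paper's formal definition. The abstract says only that the semi-centroid loss is "a combination" of the two losses; the convex combination with a `weight` parameter used here is an assumption chosen for illustration, as are all function and parameter names. The two metrics are passed separately to reflect that they may differ.

```python
from math import dist  # Euclidean distance, used only as a default metric


def centroid_loss(point, centroid, metric=dist):
    """Distance from a point to the centroid chosen for its cluster."""
    return metric(point, centroid)


def non_centroid_loss(point, cluster, metric=dist):
    """Maximum distance from a point to any point in its cluster.

    Including the point itself is harmless, since its distance to itself is 0.
    """
    return max((metric(point, other) for other in cluster), default=0.0)


def semi_centroid_loss(point, cluster, centroid, weight=0.5,
                       centroid_metric=dist, non_centroid_metric=dist):
    """Hypothetical combination: a convex mix of the two losses.

    ASSUMPTION: the weighted-sum form and the `weight` parameter are
    illustrative; the source does not specify how the losses are combined.
    """
    c = centroid_loss(point, centroid, centroid_metric)
    n = non_centroid_loss(point, cluster, non_centroid_metric)
    return weight * c + (1.0 - weight) * n


# Illustrative data: three collinear points, with the middle one as centroid.
cluster = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
centroid = (3.0, 4.0)
point = cluster[0]

c_loss = centroid_loss(point, centroid)
n_loss = non_centroid_loss(point, cluster)
s_loss = semi_centroid_loss(point, cluster, centroid, weight=0.5)
```

In this sketch, setting `weight=1.0` recovers the centroid loss and `weight=0.0` recovers the non-centroid loss, which is one simple way the semi-centroid model could subsume both earlier paradigms.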