We revisit the recently developed framework of proportionally fair clustering, where the goal is to provide group fairness guarantees that become stronger for groups of data points (agents) that are large and cohesive. Prior work applies this framework to centroid clustering, where the loss of an agent is its distance to the centroid assigned to its cluster. We expand the framework to non-centroid clustering, where the loss of an agent is a function of the other agents in its cluster, by adapting two proportional fairness criteria to this setting: the core and its relaxation, fully justified representation (FJR). We show that the core can be approximated only under structured loss functions, and even then, the best approximation we are able to establish, using an adaptation of the GreedyCapture algorithm developed for centroid clustering [Chen et al., 2019; Micha and Shah, 2020], is unappealing for a natural loss function. In contrast, we design a new (inefficient) algorithm, GreedyCohesiveClustering, which achieves the relaxation FJR exactly under arbitrary loss functions, and show that the efficient GreedyCapture algorithm achieves a constant approximation of FJR. We also design an efficient auditing algorithm, which estimates the FJR approximation of any given clustering solution up to a constant factor. Our experiments on real data suggest that traditional clustering algorithms are highly unfair, whereas GreedyCapture is considerably fairer and incurs only a modest loss in common clustering objectives.
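To make the ball-growing idea behind a GreedyCapture-style procedure concrete, the following is a minimal sketch and not the paper's exact algorithm, whose details the abstract does not give. It assumes Euclidean points, a capture threshold of ⌈n/k⌉ agents, and that captured agents are removed as soon as a ball around some agent contains enough of them. The function names `greedy_capture` and `average_loss` are hypothetical, and average distance to the other cluster members is used only as one plausible example of a non-centroid loss.

```python
import math


def greedy_capture(points, k):
    """Illustrative greedy ball-growing clustering (an assumption-based sketch).

    Repeatedly finds the smallest ball, centered at an unassigned agent, that
    contains ceil(n / k) unassigned agents, and turns those agents into a cluster.
    Returns a list of clusters, each a sorted list of point indices.
    """
    n = len(points)
    threshold = math.ceil(n / k)  # assumed group size that "deserves" a cluster
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Too few agents left to fill another ball: group the remainder together.
        if len(unassigned) <= threshold:
            clusters.append(sorted(unassigned))
            break
        best = None  # (radius, members) of the smallest capturing ball so far
        for c in sorted(unassigned):
            # Unassigned agents ordered by distance from c (ties broken by index).
            nearest = sorted(
                unassigned, key=lambda j: (math.dist(points[c], points[j]), j)
            )
            members = nearest[:threshold]
            radius = math.dist(points[c], points[members[-1]])
            if best is None or radius < best[0]:
                best = (radius, members)
        clusters.append(sorted(best[1]))
        unassigned -= set(best[1])
    return clusters


def average_loss(points, cluster, i):
    """Hypothetical non-centroid loss: mean distance from agent i to its cluster mates."""
    others = [j for j in cluster if j != i]
    if not others:
        return 0.0
    return sum(math.dist(points[i], points[j]) for j in others) / len(others)


# Two well-separated groups of three agents each, with k = 2.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = greedy_capture(points, k=2)
```

Because each captured cluster has exactly ⌈n/k⌉ agents (except possibly the last), this sketch never produces more than k clusters. It recomputes all distances in every round for readability; a practical implementation would precompute the distance matrix.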