In recent years, there has been a surge of effort to formalize notions of fairness in machine learning. We focus on centroid clustering, one of the fundamental tasks in unsupervised machine learning. We propose a new axiom, ``proportionally representative fairness'' (PRF), designed for clustering problems in which the selection of centroids should reflect the distribution of data points and how tightly they are clustered together. Our fairness concept is not satisfied by existing fair clustering algorithms. We design efficient algorithms that achieve PRF for both unconstrained and discrete clustering problems. Our algorithm for the unconstrained setting is also the first known polynomial-time approximation algorithm for the well-studied Proportional Fairness (PF) axiom. Our algorithm for the discrete setting matches the best known approximation factor for PF.