In recent years, there has been a surge of effort to formalize notions of fairness in machine learning. We focus on clustering, one of the fundamental tasks in unsupervised machine learning. We propose a new axiom, ``proportional representation fairness'' (PRF), designed for clustering problems in which the selection of centroids reflects both the distribution of the data points and how tightly they are clustered together. Existing fair clustering algorithms do not satisfy PRF. We design efficient algorithms that achieve PRF for both unconstrained and discrete clustering problems. Our algorithm for the unconstrained setting is also the first known polynomial-time approximation algorithm for the well-studied Proportional Fairness (PF) axiom (Chen, Fain, Lyu, and Munagala, ICML, 2019). Our algorithm for the discrete setting matches the best known approximation factor for PF.