Fair clustering has received remarkable attention in the last few years, resulting in a significant number of different notions of fairness. Although these notions are well-justified, they are often motivated and studied in a disjoint manner, where one fairness desideratum is considered in isolation from the others. This leaves the understanding of the relations between different fairness notions as an important open problem in fair clustering. In this paper, we take the first step in this direction. Specifically, we consider the two most prominent demographic representation fairness notions in clustering: (1) Group Fairness (GF), where the different demographic groups are supposed to have close to population-level representation in each cluster, and (2) Diversity in Center Selection (DS), where the selected centers are supposed to have close to population-level representation of each group. We show that given a constant approximation algorithm for one constraint (GF or DS only), we can obtain a constant approximation solution that satisfies both constraints simultaneously. Interestingly, we prove that any given solution that satisfies the GF constraint can always be post-processed, at a bounded degradation to the clustering cost, to additionally satisfy the DS constraint, while the reverse is not true. Furthermore, we show that both GF and DS are incompatible (having an empty feasibility set in the worst case) with a collection of other distance-based fairness notions. Finally, we carry out experiments to validate our theoretical findings.
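To make the two constraints concrete, the following is a minimal sketch of how one might check GF and DS for a given clustering. It rests on an assumption not stated in the abstract: "close to population-level representation" is modeled as each group's share lying within an additive tolerance `tol` of its population share. The function names (`group_proportions`, `satisfies_gf`, `satisfies_ds`) and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def group_proportions(colors):
    """Return each demographic group's share of the whole point set."""
    counts = Counter(colors)
    n = len(colors)
    return {g: c / n for g, c in counts.items()}


def satisfies_gf(colors, assignment, tol):
    """GF (assumed form): in every cluster, each group's share is
    within `tol` of that group's population share."""
    pop = group_proportions(colors)
    clusters = {}
    for color, cluster in zip(colors, assignment):
        clusters.setdefault(cluster, []).append(color)
    for members in clusters.values():
        counts = Counter(members)
        for g, p in pop.items():
            if abs(counts.get(g, 0) / len(members) - p) > tol:
                return False
    return True


def satisfies_ds(colors, centers, tol):
    """DS (assumed form): among the selected centers, each group's
    share is within `tol` of that group's population share."""
    pop = group_proportions(colors)
    counts = Counter(colors[i] for i in centers)
    k = len(centers)
    return all(abs(counts.get(g, 0) / k - p) <= tol for g, p in pop.items())


# Toy instance (hypothetical): four points, two groups of equal size.
colors = ["a", "a", "b", "b"]
assignment = [0, 1, 0, 1]  # each cluster holds one "a" and one "b"

gf_ok = satisfies_gf(colors, assignment, tol=0.1)
ds_same_group = satisfies_ds(colors, centers=[0, 1], tol=0.1)  # both centers from "a"
ds_mixed = satisfies_ds(colors, centers=[0, 2], tol=0.1)       # one center per group
```

In this toy instance the assignment is balanced in every cluster, so the GF check passes, while the DS check depends only on which points are chosen as centers. This merely illustrates that the two constraints are distinct conditions on the same solution; it does not reproduce the post-processing procedure or the approximation guarantees described above.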