Fair Labeled Clustering

Numerous algorithms have been proposed for the fundamental problem of clustering under many different notions of fairness. Perhaps the most commonly studied family of notions is group fairness, in which proportional group representation is ensured in every cluster. We extend this direction by considering the downstream application of clustering and how group fairness should be ensured in that setting. Specifically, we consider a common setting in which a decision-maker runs a clustering algorithm, inspects the center of each cluster, and decides an appropriate outcome (label) for the corresponding cluster. In hiring, for example, there could be two outcomes, positive (hire) or negative (reject), and each cluster would be assigned one of these two outcomes. To ensure group fairness in such a setting, we would want proportional group representation in every label, but not necessarily in every cluster as is required in group fair clustering. We provide algorithms for such problems and show that, in contrast to their NP-hard counterparts in group fair clustering, they admit efficient solutions. We also consider a well-motivated alternative setting in which the decision-maker is free to assign labels to the clusters regardless of the centers' positions in the metric space. We show that this setting exhibits interesting transitions from computationally hard to easy depending on additional constraints on the problem. Moreover, when the constraint parameters take on natural values, we give a randomized algorithm for this setting that always achieves an optimal clustering and satisfies the fairness constraints in expectation. Finally, we run experiments on real-world datasets that validate the effectiveness of our algorithms.
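The difference between label-level and cluster-level group fairness can be illustrated with a minimal sketch. This is not the paper's algorithm. It only checks a given labeled clustering, and the function names, the toy data, and the per-group bounds `lower` and `upper` are all hypothetical choices made for illustration.

```python
from collections import Counter


def label_group_proportions(groups, cluster_of, label_of_cluster):
    """Return {label: {group: fraction}} over the points receiving each label."""
    counts = {}
    for group, cluster in zip(groups, cluster_of):
        label = label_of_cluster[cluster]
        counts.setdefault(label, Counter())[group] += 1
    return {
        label: {g: n / sum(ctr.values()) for g, n in ctr.items()}
        for label, ctr in counts.items()
    }


def is_label_fair(proportions, lower, upper):
    """Check lower[g] <= fraction <= upper[g] for every label and every group."""
    return all(
        lower[g] <= props.get(g, 0.0) <= upper[g]
        for props in proportions.values()
        for g in lower
    )


# Toy data (assumed): eight points, two groups, three clusters, two labels.
groups = ["a", "a", "b", "b", "a", "b", "a", "b"]
cluster_of = [0, 0, 1, 1, 2, 2, 2, 2]
label_of_cluster = {0: "hire", 1: "hire", 2: "reject"}

# Hypothetical bounds: each group must make up 40% to 60% of every label.
lower = {"a": 0.4, "b": 0.4}
upper = {"a": 0.6, "b": 0.6}

per_label = label_group_proportions(groups, cluster_of, label_of_cluster)
label_fair = is_label_fair(per_label, lower, upper)

# Treating each cluster as its own label recovers the per-cluster requirement.
per_cluster = label_group_proportions(groups, cluster_of, {0: 0, 1: 1, 2: 2})
cluster_fair = is_label_fair(per_cluster, lower, upper)
```

In this toy instance clusters 0 and 1 each contain a single group, so the per-cluster check fails, while the points labeled "hire" and the points labeled "reject" are each evenly split between the two groups, so the per-label check passes.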
