Ensuring fairness in machine learning algorithms is a challenging and essential task. We consider the problem of clustering a set of points while satisfying fairness constraints. While there have been several attempts to capture group fairness in the $k$-clustering problem, fairness at the individual level has received comparatively less attention. We introduce a new notion of individual fairness in $k$-clustering that is based on features not necessarily used for clustering. We show that this problem is NP-hard and does not admit a constant-factor approximation. We therefore design a randomized algorithm that, under natural restrictions on the distance metric and the fairness constraints, guarantees an approximation both for the clustering distance objective being minimized and for individual fairness. Finally, experiments against six competing baselines show that our algorithm produces clusters that are, on average, 12.5% more individually fair than those of the fairest baseline, while also achieving a clustering cost that is, on average, 34.5% lower than that of the best baseline.