In clustering problems, a central decision-maker is given a complete metric graph over vertices and must provide a clustering of the vertices that minimizes some objective function. In fair clustering problems, each vertex is endowed with a color (e.g., membership in a group), and the validity of a clustering may also depend on the representation of colors in that clustering. Prior work in fair clustering assumes complete knowledge of group membership. In this paper, we generalize prior work by assuming imperfect knowledge of group membership, modeled through probabilistic assignments. We present clustering algorithms for this more general setting with approximation ratio guarantees. We also address the problem of "metric membership", where the groups have a notion of order and distance. We conduct experiments with our proposed algorithms and with baselines to validate our approach and to surface nuanced concerns that arise when group membership is not known deterministically.
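To make the notion of probabilistic group membership concrete, the following is a minimal sketch, not the paper's algorithm. It assumes each vertex carries a probability vector over colors and that the representation of a color in a cluster is measured by its expected fraction there; the abstract does not state the exact fairness constraint, so the function names, the `lower`/`upper` bounds, and the treatment of empty clusters are all illustrative assumptions.

```python
def expected_color_fractions(prob, labels, k):
    """Expected fraction of each color in each cluster.

    prob[v][c] -- assumed probability that vertex v has color c (each row sums to 1)
    labels[v]  -- index in range(k) of the cluster that vertex v is assigned to
    Returns a list of k rows; a row is None if the cluster is empty.
    """
    m = len(prob[0])
    counts = [[0.0] * m for _ in range(k)]  # expected number of vertices per color
    sizes = [0] * k                         # number of vertices per cluster
    for p, cluster in zip(prob, labels):
        sizes[cluster] += 1
        for color, p_color in enumerate(p):
            counts[cluster][color] += p_color
    return [
        [x / sizes[c] for x in counts[c]] if sizes[c] else None
        for c in range(k)
    ]


def within_bounds(fractions, lower, upper):
    """Hypothetical check: every color's expected fraction lies in
    [lower, upper] in every non-empty cluster."""
    return all(
        lower <= f <= upper
        for row in fractions
        if row is not None
        for f in row
    )


# Four vertices, two colors, two clusters.
prob = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.4, 0.6]]
labels = [0, 0, 1, 1]
fractions = expected_color_fractions(prob, labels, k=2)
print(fractions)
print(within_bounds(fractions, lower=0.4, upper=0.6))
```

With deterministic membership every row of `prob` is a one-hot vector, and the expected fractions reduce to the ordinary color proportions used when group membership is fully known.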