Promoting fairness in deep clustering models to reduce demographic bias is challenging in unsupervised settings, largely because large-scale balanced data with well-annotated labels for sensitive or protected attributes is scarce. In this paper, we first evaluate demographic bias in deep clustering models from the perspective of cluster purity, measured by the ratio of positive samples within a cluster to their correlation degree, and adopt this measurement as an indicator of demographic bias. We then introduce a novel loss function that encourages consistent purity across all clusters, so that the learned clustering model remains fair. Moreover, we present a novel attention mechanism, Cross-attention, that measures correlations between multiple clusters, strengthening faraway positive samples and improving cluster purity during learning. Experimental results on a large-scale dataset with numerous attribute settings demonstrate the effectiveness of the proposed approach in both clustering accuracy and fairness across several sensitive attributes.
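To make the idea of purity consistency concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a simplified purity (the share of samples in a cluster that carry the cluster's most common label) and uses the variance of per-cluster purities as a stand-in for a consistency penalty. The function names `cluster_purities` and `purity_consistency_penalty` are hypothetical, and the abstract's "correlation degree" term is not modeled here.

```python
from collections import Counter
from statistics import pvariance


def cluster_purities(cluster_ids, labels):
    """Map each cluster id to its purity.

    Assumption: purity is the fraction of a cluster's samples that carry
    the cluster's most common label (a simplified definition, not the
    paper's exact formulation).
    """
    members = {}
    for cid, lab in zip(cluster_ids, labels):
        members.setdefault(cid, []).append(lab)
    return {
        cid: Counter(labs).most_common(1)[0][1] / len(labs)
        for cid, labs in members.items()
    }


def purity_consistency_penalty(purities):
    """Population variance of per-cluster purities.

    The value is 0 when every cluster is equally pure and grows as
    purities diverge across clusters (an assumed proxy for unfairness).
    """
    return pvariance(list(purities.values()))


# Toy example: two clusters of four samples each.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["a", "a", "a", "b", "c", "c", "d", "d"]
purities = cluster_purities(cluster_ids, labels)
penalty = purity_consistency_penalty(purities)
```

In this toy example the first cluster is purer than the second, so the penalty is nonzero. A trainable loss would need a differentiable surrogate computed from soft assignments, which this evaluation-style sketch does not attempt.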