Algorithmic fairness has become an important machine learning problem, especially for mission-critical Web applications. This work presents a self-supervised model, called DualFair, that can debias sensitive attributes such as gender and race from learned representations. Unlike existing models that target a single type of fairness, our model jointly optimizes for two fairness criteria, group fairness and counterfactual fairness, and hence makes fairer predictions at both the group and individual levels. Our model uses a contrastive loss to generate embeddings that are indistinguishable across protected groups, while forcing the embeddings of counterfactual pairs to be similar. It then uses a self-knowledge distillation method to maintain the quality of the representations for downstream tasks. Extensive analysis over multiple datasets confirms the model's validity and further shows the synergy of jointly addressing the two fairness criteria, suggesting the model's potential value in fair intelligent Web applications.
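The two training signals described above can be illustrated with a minimal sketch. It assumes an InfoNCE-style contrastive term in which each record's positive is its counterfactual (the same record with the sensitive attribute flipped) and the other counterfactuals in the batch are negatives. It also assumes a mean-squared-error distillation term that keeps the debiased embeddings close to those of a frozen teacher encoder. The function names, the weight `lam`, and the `temperature` value are hypothetical, and this is not DualFair's published implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def counterfactual_contrastive_loss(
    anchors: Sequence[Vector],
    counterfactuals: Sequence[Vector],
    temperature: float = 0.5,
) -> float:
    """Hypothetical InfoNCE-style loss: anchors[i] should be most similar to
    counterfactuals[i] (same record, sensitive attribute flipped) and less
    similar to every other counterfactual in the batch."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], counterfactuals[j]) / temperature for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # negative log-probability of the positive
    return total / n


def distillation_loss(student: Sequence[Vector], teacher: Sequence[Vector]) -> float:
    """Assumed self-distillation term: mean squared distance between the
    debiased (student) embeddings and a frozen teacher's embeddings."""
    sq = [sum((s - t) ** 2 for s, t in zip(sv, tv)) for sv, tv in zip(student, teacher)]
    return sum(sq) / len(sq)


def dual_objective(anchors, counterfactuals, teacher, lam=0.1, temperature=0.5):
    """Hypothetical combined objective: contrastive term + lam * distillation term."""
    return (counterfactual_contrastive_loss(anchors, counterfactuals, temperature)
            + lam * distillation_loss(anchors, teacher))


if __name__ == "__main__":
    batch = [[1.0, 0.0], [0.0, 1.0]]
    aligned = counterfactual_contrastive_loss(batch, batch)
    swapped = counterfactual_contrastive_loss(batch, [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < swapped)  # True: matched counterfactual pairs give a lower loss
```

Using cosine similarity with a temperature is a common choice for contrastive objectives. Pulling each record toward its own counterfactual targets individual-level (counterfactual) fairness, while the distillation term discourages the encoder from discarding information that downstream tasks need.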