This paper studies learning fair encoders in a self-supervised learning (SSL) setting, in which all data are unlabeled and only a small portion of them are annotated with a sensitive attribute. Adversarial fair representation learning is well suited for this scenario: it minimizes a contrastive loss over the unlabeled data while maximizing an adversarial loss for predicting the sensitive attribute over the annotated data. Nevertheless, optimizing adversarial fair representation learning presents significant challenges because it requires solving a non-convex non-concave minimax game. The complexity deepens when a global contrastive loss is incorporated, which contrasts each anchor data point against all other examples. A central question is ``{\it can we design a provable yet efficient algorithm for solving adversarial fair self-supervised contrastive learning}?'' Building on advanced optimization techniques, we propose a stochastic algorithm dubbed SoFCLR with a convergence analysis under reasonable conditions, without requiring a large batch size. We conduct extensive experiments demonstrating the effectiveness of the proposed approach for downstream classification under eight fairness notions.
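As a minimal sketch of the kind of objective described above (the notation, including the encoder $E_w$, adversary $h_\theta$, and trade-off weight $\lambda$, is our own illustrative assumption rather than the paper's formulation): the encoder minimizes the contrastive loss while making the sensitive attribute $a$ hard to predict, and the adversary maximizes its predictive fit.

% Hedged sketch of an adversarial fair SSL objective; the symbols
% (encoder E_w, adversary h_theta, weight lambda) are illustrative.
\begin{equation*}
  \min_{w} \max_{\theta} \;
  \underbrace{\mathcal{L}_{\mathrm{con}}(w)}_{\substack{\text{global contrastive loss}\\ \text{on unlabeled data}}}
  \;-\; \lambda \,
  \underbrace{\mathbb{E}_{(x,a)}\!\left[\ell\big(h_{\theta}(E_{w}(x)),\, a\big)\right]}_{\substack{\text{adversary's prediction loss on}\\ \text{attribute-annotated data}}}
\end{equation*}

Here, maximizing over $\theta$ drives the adversary's prediction loss $\ell$ down (a well-fit attribute predictor), while minimizing over $w$ drives it up, yielding representations from which the sensitive attribute is hard to recover.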