Machine learning models automatically learn discriminative features from data and are therefore susceptible to learning strongly correlated biases, for example by relying on protected attributes such as gender and race. Most existing bias mitigation approaches aim to explicitly reduce the model's focus on these protected features. In this work, we instead propose to mitigate bias by explicitly guiding the model's focus towards task-relevant features using domain knowledge, and we hypothesize that this can indirectly reduce the model's dependence on spurious correlations it learns from the data. We explore bias mitigation in facial expression recognition systems using facial Action Units (AUs) as the task-relevant feature. To this end, we introduce the Feature-based Positive Matching Contrastive Loss, which learns the distances between the positives of a sample based on the similarity between their corresponding AU embeddings. We compare our approach with representative baselines and show that incorporating task-relevant features via our method can improve model fairness at minimal cost to classification performance.
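As a rough illustration of the idea behind the loss, the sketch below shows one possible reading of it: a supervised contrastive loss in which each positive pair (two samples sharing an expression label) is weighted by the cosine similarity of their AU embeddings. This is not the authors' formulation, which the abstract does not specify. The function name `au_weighted_contrastive_loss`, the `temperature` parameter, the mapping of cosine similarity to a weight in [0, 1], and the toy data are all assumptions made for illustration.

```python
import numpy as np


def au_weighted_contrastive_loss(features, au_embeddings, labels, temperature=0.1):
    """Hypothetical AU-weighted supervised contrastive loss (illustrative only).

    features:      (n, d) image embeddings from the model, n >= 2
    au_embeddings: (n, k) Action Unit embeddings for the same samples
    labels:        (n,)   expression class labels
    """
    features = np.asarray(features, dtype=float)
    au_embeddings = np.asarray(au_embeddings, dtype=float)
    labels = np.asarray(labels)

    # L2-normalise both embedding sets so dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    a = au_embeddings / np.linalg.norm(au_embeddings, axis=1, keepdims=True)

    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)

    # Log-softmax of each anchor's similarity to every other sample.
    logits = np.where(not_self, z @ z.T / temperature, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    log_prob = np.where(not_self, log_prob, 0.0)  # drop the -inf self terms

    # Positives share the anchor's label; the weight is an assumed mapping of
    # AU cosine similarity from [-1, 1] to [0, 1].
    positives = (labels[:, None] == labels[None, :]) & not_self
    weights = np.where(positives, (a @ a.T + 1.0) / 2.0, 0.0)

    # Weighted average of -log p over each anchor's positives.
    weight_sum = weights.sum(axis=1)
    valid = weight_sum > 0
    if not valid.any():
        return 0.0
    per_anchor = -(weights * log_prob).sum(axis=1)[valid] / weight_sum[valid]
    return float(per_anchor.mean())


# Toy usage with made-up data: two tight clusters in feature space.
toy_features = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
toy_aus = np.ones((4, 3))  # identical AU embeddings, so every weight is 1
print(au_weighted_contrastive_loss(toy_features, toy_aus, [0, 0, 1, 1]))
```

Under this reading, positives whose AU embeddings are similar contribute more to the loss and are therefore pulled together more strongly, while positives that share a label but show different AU patterns are pulled together less. Other weightings, such as using AU similarity to set a target distance, would fit the abstract's description equally well.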