Although effective deepfake detection models have been developed in recent years, studies have revealed that these models can exhibit unfair performance disparities across demographic groups, such as those defined by race and gender. As a result, particular groups may be unfairly targeted or excluded from detection, potentially allowing misclassified deepfakes to manipulate public opinion and undermine trust in the model. The existing approach to this problem is to design a fair loss function. It achieves good fairness in intra-domain evaluation but does not maintain fairness in cross-domain testing. This highlights the importance of fairness generalization in the fight against deepfakes. In this work, we propose the first method to address the fairness generalization problem in deepfake detection by simultaneously considering features, loss, and optimization. Our method employs disentanglement learning to extract demographic and domain-agnostic forgery features, then fuses them to encourage fair learning across a flattened loss landscape. Extensive experiments on prominent deepfake datasets demonstrate the effectiveness of our method, which surpasses state-of-the-art approaches in preserving fairness during cross-domain deepfake detection. The code is available at https://github.com/Purdue-M2/Fairness-Generalization
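To make the notion of a performance disparity among demographic groups concrete, the following minimal sketch computes the false positive rate (real samples flagged as fake) per group and reports the largest gap between groups. This is an illustrative metric chosen for simplicity and is not necessarily the one used in this work; the function names, the label convention (0 = real, 1 = fake), and the toy data are all assumptions.

```python
from collections import defaultdict


def false_positive_rate_by_group(labels, preds, groups):
    """Return {group: FPR}, the fraction of real samples (label 0)
    that the detector flags as fake (prediction 1), per group.

    Hypothetical helper for illustration only.
    """
    false_pos = defaultdict(int)   # real samples predicted fake, per group
    real_total = defaultdict(int)  # all real samples, per group
    for y, p, g in zip(labels, preds, groups):
        if y == 0:
            real_total[g] += 1
            if p == 1:
                false_pos[g] += 1
    # Groups without any real sample are skipped (FPR is undefined there).
    return {g: false_pos[g] / real_total[g] for g in real_total}


def max_fpr_gap(labels, preds, groups):
    """Largest difference in FPR between any two groups (0 means equal FPR)."""
    rates = false_positive_rate_by_group(labels, preds, groups)
    return max(rates.values()) - min(rates.values())


# Toy data (assumed): two groups, five samples each.
labels = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
preds = [0, 0, 0, 1, 1, 1, 1, 0, 1, 1]
groups = ["A"] * 5 + ["B"] * 5

rates = false_positive_rate_by_group(labels, preds, groups)
gap = max_fpr_gap(labels, preds, groups)
print(rates, gap)
```

In this toy example the detector wrongly flags real faces from group B far more often than those from group A. A fairness-generalizing detector would aim to keep such a gap small both on the training domain and on unseen datasets.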