Although effective deepfake detection models have been developed in recent years, several recent studies have demonstrated that biases in the data used to train them can lead to unfair performance across demographic groups of different races and/or genders. Such disparities can result in these groups being unfairly targeted or excluded from detection, allowing misclassified deepfakes to manipulate public opinion and erode trust in the model. While these studies have focused on identifying and evaluating unfairness in deepfake detection, no methods have been developed to address the fairness issue at the algorithm level. In this work, we make the first attempt to improve deepfake detection fairness by proposing novel loss functions for training fair deepfake detection models in ways that are either agnostic to or aware of demographic factors. Extensive experiments on four deepfake datasets and five deepfake detectors demonstrate the effectiveness and flexibility of our approach in improving deepfake detection fairness.