Although effective deepfake detection models have been developed in recent years, several recent studies have shown that biases in the data used to train them can lead to unfair performance across demographic groups of different races and/or genders. Such bias can result in these groups being unfairly targeted or excluded from detection, allowing misclassified deepfakes to manipulate public opinion and erode trust in the model. While these studies have focused on identifying and evaluating unfairness in deepfake detection, no methods have been developed to address the fairness issue at the algorithm level. In this work, we make the first attempt to improve deepfake detection fairness by proposing novel loss functions for training fair deepfake detection models in ways that are either agnostic to or aware of demographic factors. Extensive experiments on four deepfake datasets and five deepfake detectors demonstrate the effectiveness and flexibility of our approach in improving deepfake detection fairness.
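To make the distinction between demographic-aware and demographic-agnostic training objectives concrete, the following is a minimal sketch. The abstract does not specify the form of the proposed loss functions, so the two objectives below (a worst-group mean loss and a tail-averaged per-sample loss) are illustrative assumptions, not the losses proposed in this work. All function names and the toy data are hypothetical.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted fake-probability p and label y (1 = fake)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def group_losses(probs, labels, groups):
    """Mean loss per demographic group. Requires group annotations."""
    sums, counts = {}, {}
    for p, y, g in zip(probs, labels, groups):
        sums[g] = sums.get(g, 0.0) + bce(p, y)
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


def worst_group_loss(probs, labels, groups):
    """Assumed demographic-aware objective: the largest per-group mean loss."""
    return max(group_losses(probs, labels, groups).values())


def tail_loss(probs, labels, alpha=0.5):
    """Assumed demographic-agnostic objective: mean of the worst
    ceil(alpha * n) per-sample losses. Uses no group annotations."""
    losses = sorted((bce(p, y) for p, y in zip(probs, labels)), reverse=True)
    k = max(1, math.ceil(alpha * len(losses)))
    return sum(losses[:k]) / k


# Hypothetical toy batch: all samples are fakes, and the detector is
# less confident on group "B" than on group "A".
probs = [0.9, 0.8, 0.6, 0.4]
labels = [1, 1, 1, 1]
groups = ["A", "A", "B", "B"]

per_group = group_losses(probs, labels, groups)
mean_loss = sum(bce(p, y) for p, y in zip(probs, labels)) / len(probs)
print(per_group)
print(worst_group_loss(probs, labels, groups), tail_loss(probs, labels), mean_loss)
```

In this toy batch the plain average loss hides the gap between groups, while both alternative objectives focus on the poorly served samples. The group-aware version needs demographic labels at training time; the tail-averaged version does not, which is the practical trade-off the agnostic/aware distinction points at.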