DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis

Rapid progress in deep learning has given rise to hyper-realistic facial forgery methods, raising concerns about misinformation and security risks. Existing face forgery datasets are limited both in the quality of their facial images and in addressing the challenges posed by evolving generative techniques. To address this, we present DiffusionFace, the first diffusion-based face forgery dataset, covering various forgery categories, including unconditional and text-guided facial image generation, Img2Img, Inpaint, and diffusion-based face-swapping algorithms. DiffusionFace stands out for its extensive collection of 11 diffusion models and the high quality of its generated images, and it provides essential metadata along with a real-world, internet-sourced set of forged facial images for evaluation. Additionally, we provide an in-depth analysis of the data and introduce practical evaluation protocols to rigorously assess how effectively discriminative models detect forged facial images, aiming to enhance security in facial image authentication processes. The dataset is available for download at \url{https://github.com/Rapisurazurite/DiffFace}.

