DeepfakeArt Challenge: A Benchmark Dataset for Generative AI Art Forgery and Data Poisoning Detection

Recent advances in generative artificial intelligence techniques have led to significant successes and promise across a wide range of applications, from conversational agents and textual content generation to voice and visual synthesis. Amid the rise of generative AI and its increasingly widespread adoption, there has been growing concern over its use for malicious purposes. In visual content synthesis using generative AI, key areas of concern have been image forgery (e.g., the generation of images containing or derived from copyrighted content) and data poisoning (i.e., the generation of adversarially contaminated images). To address these concerns and encourage responsible generative AI, we introduce the DeepfakeArt Challenge, a large-scale challenge benchmark dataset designed specifically to aid in building machine learning algorithms for generative AI art forgery and data poisoning detection. The dataset comprises over 32,000 records spanning a variety of generative forgery and data poisoning techniques, and each entry consists of a pair of images that are either forgeries / adversarially contaminated or not. Every generated image in the DeepfakeArt Challenge benchmark dataset\footnote{The link to the dataset: http://anon\_for\_review.com} has undergone comprehensive quality checking.