Recent advances in generative AI models, which can create realistic, human-like content, are significantly transforming how people communicate, create, and work. While appropriate use of generative AI models can benefit society, their misuse poses significant threats to data reliability and authentication. However, due to a lack of aligned multimodal datasets, effective and robust methods for detecting machine-generated content are still in the early stages of development. In this paper, we introduce RU-AI, a new large-scale multimodal dataset designed for the robust and efficient detection of machine-generated content in text, image, and voice. The dataset is constructed from three large publicly available datasets, Flickr8K, COCO, and Places205, by combining the original data with their corresponding machine-generated pairs. Experimental results show that our proposed unified model, which combines a multimodal embedding module with a multilayer perceptron network, can effectively determine the origin of data from RU-AI (i.e., original samples or machine-generated ones). Nevertheless, future work is still required to address the remaining challenges posed by RU-AI. The source code and dataset are available at https://github.com/ZhihaoZhang97/RU-AI.