DeepFakes, which refer to AI-generated media content, have become a growing concern because of their use as a vehicle for disinformation. DeepFake detection is currently handled by programmed machine learning algorithms. In this work, we investigate the capabilities of multimodal large language models (LLMs) in DeepFake detection. We conduct qualitative and quantitative experiments and show that, with careful experimental design and prompt engineering, multimodal LLMs can expose AI-generated images. This is notable because LLMs are not inherently tailored to media forensic tasks and the process requires no programming. We discuss the limitations of multimodal LLMs for these tasks and suggest possible improvements.