Rapid progress in Large Language Models (LLMs) has blurred the distinction between human-written and AI-generated text. The growing pervasiveness of AI-generated text, together with the difficulty of detecting it, poses new challenges for society. In this paper, we tackle the detection and attribution of AI-generated text by proposing WhosAI, a triplet-network contrastive learning framework designed to predict whether a given input text was written by a human or generated by AI, and to identify its author. Unlike most existing approaches, our framework learns semantic similarity representations from multiple generators at once, so that it handles the detection and attribution tasks on an equal footing. Furthermore, WhosAI is model-agnostic and scales to newly released AI text-generation models by incorporating their generated instances into the embedding space learned by our framework. Experimental results on the TuringBench benchmark of 200K news articles show that our framework outperforms all methods listed on the TuringBench leaderboards in both the Turing Test and Authorship Attribution tasks.
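To make the idea of a triplet-based embedding space concrete, the following is a minimal sketch, not the WhosAI implementation. The abstract does not specify the encoder, distance function, margin, or attribution rule, so the Euclidean distance, the margin value, the nearest-centroid rule, and all function names and toy 2-D embeddings below are assumptions chosen for illustration only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss.

    Pulls the anchor towards a text by the same author (positive) and
    pushes it away from a text by a different author (negative).
    The margin of 1.0 is an illustrative default, not a value from the paper.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def centroid(vectors):
    """Component-wise mean of a list of embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def attribute(embedding, centroids):
    """Hypothetical attribution rule: return the author with the nearest centroid."""
    return min(centroids, key=lambda name: euclidean(embedding, centroids[name]))


# Toy, made-up 2-D embeddings standing in for the output of a trained encoder.
centroids = {
    "human": centroid([[0.0, 0.0], [1.0, 1.0]]),
    "gpt2": centroid([[10.0, 10.0], [12.0, 12.0]]),
}

# A newly released generator is added by embedding some of its texts
# into the same space; the existing centroids are left untouched.
centroids["new_model"] = centroid([[20.0, 20.0], [22.0, 22.0]])

print(attribute([0.2, 0.4], centroids))
print(attribute([20.0, 21.0], centroids))
```

Under these assumptions, detection reduces to checking whether the predicted author is `"human"`, while attribution returns the author label itself, which is one way a single embedding space could serve both tasks.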