Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text

With the recent proliferation of Large Language Models (LLMs), there has been an increasing demand for tools to detect machine-generated text. The effective detection of machine-generated text faces two pertinent problems. First, existing detectors generalize poorly to real-world scenarios, where machine-generated text is produced by a variety of generators, including but not limited to GPT-4 and Dolly, and spans diverse domains, ranging from academic manuscripts to social media posts. Second, existing detection methodologies treat texts produced by LLMs through a restrictive binary classification lens, neglecting the nuanced diversity of artifacts generated by different LLMs. In this work, we undertake a systematic study of the detection of machine-generated text in real-world scenarios. We first study the effectiveness of state-of-the-art approaches and find that they are severely limited against text produced by diverse generators and domains in the real world. Furthermore, t-SNE visualizations of the embeddings from a pretrained LLM's encoder show that these embeddings cannot reliably distinguish between human and machine-generated text. Based on our findings, we introduce a novel system, T5LLMCipher, for detecting machine-generated text using a pretrained T5 encoder combined with LLM embedding sub-clustering to handle text produced by diverse generators and domains in the real world. We evaluate our approach across 9 machine-generated text systems and 9 domains and find that it provides state-of-the-art generalization ability: on unseen generators and domains, the F1 score on machine-generated text increases by an average of 19.6% compared to the top-performing existing approaches, and the system correctly attributes the generator of a text with an accuracy of 93.6%.
