From Text to Source: Results in Detecting Large Language Model-Generated Content

The widespread use of Large Language Models (LLMs), celebrated for their ability to generate human-like text, has raised concerns about misinformation and ethical implications. Addressing these concerns requires robust methods to detect and attribute text generated by LLMs. This paper investigates "Cross-Model Detection" by evaluating whether a classifier trained to distinguish text generated by a source LLM from human-written text can also detect text from a target LLM without further training. The study covers a range of LLM sizes and families, and assesses the impact of conversational fine-tuning techniques, quantization, and watermarking on classifier generalization. The research also explores Model Attribution, which encompasses source model identification, model family classification, and model size classification, in addition to quantization and watermarking detection. Our results reveal several key findings. There is a clear inverse relationship between classifier effectiveness and model size: larger LLMs are more challenging to detect, especially when the classifier is trained on data from smaller models. Training on data from similarly sized LLMs can improve detection performance on larger models but may decrease performance on smaller models. Additionally, model attribution experiments show promising results in identifying source models and model families, highlighting detectable signatures in LLM-generated text. Watermarking detection yields particularly strong results, while no detectable signatures of quantization were observed. Overall, our study contributes insights into the interplay of model size, family, and training data in LLM detection and attribution.
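The cross-model detection setup described in the abstract can be sketched as a source-by-target evaluation grid: train one detector per source LLM, then score it unchanged on text from every target LLM. The sketch below is only an illustration under stated assumptions. The function names (`train_threshold_classifier`, `cross_model_matrix`) are hypothetical, and the average-word-length detector is a toy stand-in, not the classifier used in the paper.

```python
from typing import Callable, Dict, List, Tuple

# (text, label): label 1 = LLM-generated, label 0 = human-written
Sample = Tuple[str, int]
Classifier = Callable[[str], int]


def _avg_word_length(text: str) -> float:
    """Toy feature (an assumption for illustration): mean word length."""
    words = text.split()
    return sum(len(w) for w in words) / max(len(words), 1)


def train_threshold_classifier(train: List[Sample]) -> Classifier:
    """Fit a one-feature threshold detector; a stand-in for a real model."""
    llm = [_avg_word_length(t) for t, y in train if y == 1]
    human = [_avg_word_length(t) for t, y in train if y == 0]
    mean_llm = sum(llm) / len(llm)
    mean_human = sum(human) / len(human)
    midpoint = (mean_llm + mean_human) / 2
    llm_is_higher = mean_llm >= mean_human

    def predict(text: str) -> int:
        # Predict "LLM" when the feature falls on the LLM side of the midpoint.
        return int((_avg_word_length(text) >= midpoint) == llm_is_higher)

    return predict


def accuracy(clf: Classifier, test: List[Sample]) -> float:
    """Fraction of test samples the classifier labels correctly."""
    return sum(clf(t) == y for t, y in test) / len(test)


def cross_model_matrix(
    train_sets: Dict[str, List[Sample]],
    test_sets: Dict[str, List[Sample]],
    train_fn: Callable[[List[Sample]], Classifier],
) -> Dict[str, Dict[str, float]]:
    """Train on each source model, evaluate on every target without retraining."""
    matrix: Dict[str, Dict[str, float]] = {}
    for source, train in train_sets.items():
        clf = train_fn(train)
        matrix[source] = {
            target: accuracy(clf, test) for target, test in test_sets.items()
        }
    return matrix
```

Each entry `matrix[source][target]` is the accuracy of the detector trained on `source` when applied to `target`; the diagonal is the in-distribution case and the off-diagonal entries are the cross-model case. In practice the toy detector would be replaced by a trained text classifier, and the train and test sets would contain human-written text paired with generations from each LLM.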

