Large language models (LLMs) such as GPT-4, PaLM, and Llama have greatly accelerated the generation of AI-written text. With rising concerns about their potential misuse, there is a pressing need for AI-generated-text forensics. Neural authorship attribution is one such forensic effort: it seeks to trace AI-generated text back to its originating LLM. The LLM landscape can be divided into two primary categories: proprietary and open-source. In this work, we examine both categories with a focus on neural authorship attribution. We carry out an empirical analysis of LLM writing signatures, highlighting the contrasts between proprietary and open-source models and examining variations within each group. By integrating stylometric features spanning lexical, syntactic, and structural aspects of language, we explore their potential to yield interpretable results and to augment the pre-trained language model-based classifiers used in neural authorship attribution. Our findings, based on a range of state-of-the-art LLMs, provide empirical insights into neural authorship attribution and pave the way for future work on mitigating the threats posed by AI-generated misinformation.
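To make the notion of stylometric features at the lexical, syntactic, and structural levels more concrete, the following is a minimal sketch of what such a feature extractor might look like. It is not the feature set used in this work: the function name `stylometric_features`, the small `FUNCTION_WORDS` list, and the choice of function-word rate and punctuation density as crude stand-ins for syntactic features are all assumptions made for illustration.

```python
import re
import string

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration; a real study would use a much larger inventory or POS tags).
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "to", "and", "but", "or",
                  "that", "which", "is", "are"}


def stylometric_features(text: str) -> dict:
    """Compute a few simple stylometric features for one document."""
    # Structural units: paragraphs are separated by blank lines,
    # sentences by terminal punctuation followed by whitespace.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text.lower())

    # Guard against division by zero on empty input.
    n_words = max(len(words), 1)
    n_sents = max(len(sentences), 1)
    n_punct = sum(ch in string.punctuation for ch in text)

    return {
        # Lexical: vocabulary richness and word length.
        "type_token_ratio": len(set(words)) / n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words,
        # Syntactic (rough proxies only): function words and punctuation.
        "function_word_rate": sum(w in FUNCTION_WORDS for w in words) / n_words,
        "punct_per_sentence": n_punct / n_sents,
        # Structural: sentence length and paragraph count.
        "avg_sentence_length": len(words) / n_sents,
        "paragraph_count": len(paragraphs),
    }
```

A feature vector of this kind could, in principle, be inspected directly for interpretability or concatenated with the representation of a pre-trained language model-based classifier; how the features are actually combined in this work is not specified in the abstract.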