The widespread use of Large Language Models (LLMs), celebrated for their ability to generate human-like text, has raised concerns about misinformation and ethical implications. Addressing these concerns requires robust methods to detect and attribute LLM-generated text. This paper investigates "Cross-Model Detection": whether a classifier trained to distinguish text generated by a source LLM from human-written text can also detect text from a target LLM without further training. The study covers a range of LLM sizes and families and assesses the impact of conversational fine-tuning on classifier generalization. It also examines Model Attribution, which comprises source model identification, model family classification, and model size classification. Our results reveal several key findings. First, there is a clear inverse relationship between classifier effectiveness and model size: larger LLMs are harder to detect, especially when the classifier is trained on data from smaller models. Second, training on data from similarly sized LLMs can improve detection of text from larger models but may reduce performance on smaller models. Third, the model attribution experiments show promising results in identifying source models and model families, indicating that LLM-generated text carries detectable signatures. Overall, our study offers insight into the interplay of model size, family, and training data in LLM detection and attribution.
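The cross-model evaluation protocol described in the abstract can be sketched in a few lines: train a detector on one source model's data, then score it unchanged on every target model. This is a minimal illustration only, not the classifier used in the paper. The names `cross_model_matrix`, `train_centroid`, and `type_token_ratio` are hypothetical, and the single lexical-diversity feature is a placeholder assumption standing in for a real detector.

```python
from typing import Callable, Dict, List, Tuple

# (text, label) where 1 = LLM-generated and 0 = human-written.
Sample = Tuple[str, int]
Classifier = Callable[[str], int]
Trainer = Callable[[List[Sample]], Classifier]


def type_token_ratio(text: str) -> float:
    """Placeholder feature (an assumption): unique tokens / total tokens."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def train_centroid(samples: List[Sample]) -> Classifier:
    """Toy trainer: nearest class centroid on one feature.

    Assumes the training set contains at least one sample per label.
    """
    values: Dict[int, List[float]] = {0: [], 1: []}
    for text, label in samples:
        values[label].append(type_token_ratio(text))
    centroids = {label: sum(v) / len(v) for label, v in values.items()}

    def classify(text: str) -> int:
        x = type_token_ratio(text)
        return min(centroids, key=lambda label: abs(x - centroids[label]))

    return classify


def accuracy(clf: Classifier, samples: List[Sample]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(clf(text) == label for text, label in samples) / len(samples)


def cross_model_matrix(
    train: Trainer,
    data_by_model: Dict[str, Tuple[List[Sample], List[Sample]]],
) -> Dict[str, Dict[str, float]]:
    """Train on each source model's train split, then evaluate on every
    target model's test split with no further training.

    data_by_model maps a model name to (train_split, test_split); each split
    mixes that model's generated text with human-written text.
    Returns matrix[source][target] = accuracy.
    """
    matrix: Dict[str, Dict[str, float]] = {}
    for source, (train_split, _) in data_by_model.items():
        clf = train(train_split)
        matrix[source] = {
            target: accuracy(clf, test_split)
            for target, (_, test_split) in data_by_model.items()
        }
    return matrix
```

The diagonal of the returned matrix corresponds to in-distribution detection (source equals target), and the off-diagonal entries to cross-model detection. Any trainer with the same signature can be passed in place of `train_centroid`.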