The widespread use of Large Language Models (LLMs), celebrated for their ability to generate human-like text, has raised concerns about misinformation and ethical implications. Addressing these concerns requires robust methods to detect and attribute LLM-generated text. This paper investigates "Cross-Model Detection": whether a classifier trained to distinguish text generated by a source LLM from human-written text can also detect text from a target LLM without further training. The study covers a range of LLM sizes and families and assesses how conversational fine-tuning, quantization, and watermarking affect classifier generalization. The research also examines Model Attribution, which encompasses source model identification, model family classification, and model size classification, as well as quantization and watermarking detection. Our results reveal several key findings. First, classifier effectiveness is inversely related to model size: larger LLMs are harder to detect, especially when the classifier is trained on data from smaller models. Second, training on data from similarly sized LLMs can improve detection of text from larger models but may reduce performance on smaller ones. Third, model attribution experiments show promising results in identifying source models and model families, indicating that LLM-generated text carries detectable signatures; results are particularly strong for watermarking detection, whereas no detectable signature of quantization was observed. Overall, our study offers insights into how model size, model family, and training data interact in LLM detection and attribution.
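The cross-model detection protocol can be pictured as a source × target evaluation matrix. One detector is trained per source LLM against human-written text, and each detector is then scored, without further training, on text from every target LLM. The sketch below is illustrative only and rests on several assumptions. It substitutes a bag-of-words naive Bayes classifier and F1 on the machine-generated class for whatever detector and metric the study actually uses. The names `NaiveBayesDetector`, `machine_f1`, and `cross_model_matrix`, the model labels `"small"` and `"large"`, and all toy texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical whitespace tokenizer; a stand-in for a real tokenizer.
    return text.lower().split()


class NaiveBayesDetector:
    """Hypothetical toy detector: label 0 = human-written, 1 = machine-generated."""

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)  # assumes both classes are present
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab_size = len(set(self.counts[0]) | set(self.counts[1]))
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = 0, -math.inf
        for y in (0, 1):
            # Log prior plus Laplace-smoothed log likelihood of each token.
            score = math.log(self.doc_counts[y] / n_docs)
            for tok in tokenize(text):
                score += math.log(
                    (self.counts[y][tok] + 1) / (self.totals[y] + self.vocab_size)
                )
            if score > best_score:
                best_label, best_score = y, score
        return best_label


def machine_f1(labels, preds):
    # F1 score with machine-generated text (label 1) as the positive class.
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_model_matrix(human_train, human_test, generated_train, generated_test):
    """Train one detector per source LLM; evaluate it unchanged on every target LLM."""
    results = {}
    for source, source_texts in generated_train.items():
        detector = NaiveBayesDetector().fit(
            human_train + source_texts,
            [0] * len(human_train) + [1] * len(source_texts),
        )
        for target, target_texts in generated_test.items():
            texts = human_test + target_texts
            labels = [0] * len(human_test) + [1] * len(target_texts)
            preds = [detector.predict(t) for t in texts]
            results[(source, target)] = machine_f1(labels, preds)
    return results


# Hypothetical toy corpora; "small" and "large" are placeholder model names.
HUMAN_TRAIN = ["i walked the dog today", "my cat sleeps all day"]
HUMAN_TEST = ["the dog walked home", "my cat today"]
GENERATED_TRAIN = {
    "small": ["certainly here is a summary", "certainly here is an overview"],
    "large": ["in conclusion the analysis shows", "in conclusion the results show"],
}
GENERATED_TEST = {
    "small": ["certainly here is a list"],
    "large": ["in conclusion the data shows"],
}

matrix = cross_model_matrix(HUMAN_TRAIN, HUMAN_TEST, GENERATED_TRAIN, GENERATED_TEST)
for (source, target), score in sorted(matrix.items()):
    print(f"{source:>5} -> {target:<5} F1 = {score:.2f}")
```

Diagonal cells (source = target) correspond to in-distribution detection, and off-diagonal cells are the cross-model results. On this toy data the detector trained on `"small"` reaches F1 = 1.00 on its own text and 0.00 on `"large"` text. The toy corpora are far too small to reproduce the size effects reported above; they only show the shape of the evaluation.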