This paper introduces a novel approach for identifying which large language models (LLMs) may have been involved in generating a text. Instead of adding a classification layer on top of a base language model (LM), we reframe the classification task as a next-token prediction task and fine-tune the base LM directly to perform it. We use the Text-to-Text Transfer Transformer (T5) model as the backbone for our experiments and compare our approach with the more direct approach of using hidden states for classification. Evaluation shows that our method achieves exceptional performance on the text classification task while remaining simple and efficient. Furthermore, interpretability studies on the features extracted by our model show that it can differentiate the distinctive writing styles of various LLMs even in the absence of an explicit classifier. We also collected a dataset named OpenLLMText, containing approximately 340k text samples from humans and LLMs, including GPT3.5, PaLM, LLaMA, and GPT2.
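To make the reframing concrete, the following is a minimal sketch of how a labeled sample could be turned into a text-to-text pair, and how a source could be read off the logits of the first decoded token without any classification head. It is an illustration under stated assumptions, not the implementation used in this work: the label words in `LABEL_WORDS`, the `"classify: "` prefix, the helper names, and the assumption that each label corresponds to a single vocabulary token are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical label words, one per source. The actual target tokens
# used for fine-tuning are not specified here.
LABEL_WORDS: List[str] = ["human", "gpt3.5", "palm", "llama", "gpt2"]


def to_text_pair(text: str, label: str, prefix: str = "classify: ") -> Tuple[str, str]:
    """Turn a (text, label) sample into a text-to-text (input, target) pair.

    The target is the label word itself, so the model learns the label
    as the next token to generate. The prefix is an assumed task prompt.
    """
    if label not in LABEL_WORDS:
        raise ValueError(f"unknown label: {label}")
    return prefix + text, label


def predict_label(first_step_logits: Sequence[float],
                  label_token_ids: Dict[str, int]) -> str:
    """Pick the label whose token has the highest first-step logit.

    Only the logits at the label token ids are compared, so tokens that
    are not labels cannot be selected. Assumes one token per label.
    """
    return max(label_token_ids,
               key=lambda name: first_step_logits[label_token_ids[name]])


if __name__ == "__main__":
    source, target = to_text_pair("The quick brown fox.", "palm")
    print(source, "->", target)

    # Toy vocabulary of six tokens: ids 0..4 are the label words,
    # id 5 stands in for any non-label token.
    toy_ids = {name: i for i, name in enumerate(LABEL_WORDS)}
    toy_logits = [0.1, 0.3, 2.5, -1.0, 0.0, 9.0]
    print(predict_label(toy_logits, toy_ids))
```

In a real setup the logits would come from the decoder of the fine-tuned model at the first generation step, and the token ids from its tokenizer. Restricting the comparison to label tokens is one possible design choice; unconstrained greedy decoding of the first token is another.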