ChatGPT and other general-purpose large language models (LLMs) have achieved remarkable success, but they have also raised concerns about the misuse of AI-generated text. Existing AI-generated text detection models, such as those based on BERT and RoBERTa, are prone to in-domain overfitting, leading to poor out-of-domain (OOD) detection performance. In this paper, we first collected Chinese text responses to questions from multiple domains, written by human experts and generated by 9 types of LLMs, and further created a dataset that mixes human-written sentences with sentences polished by LLMs. We then proposed LLM-Detector, a novel method for both document-level and sentence-level text detection through instruction tuning of LLMs. Our method leverages the wealth of knowledge LLMs acquire during pre-training, enabling them to detect the text they generate. Instruction tuning aligns the model's responses with the user's expected text detection tasks. Experimental results show that previous methods struggle with sentence-level AI-generated text detection and OOD detection. In contrast, our proposed method not only significantly outperforms baseline methods in both sentence-level and document-level text detection but also demonstrates strong generalization capabilities. Furthermore, since LLM-Detector is trained on open-source LLMs, it is easy to customize for deployment.
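To make the instruction-tuning formulation more concrete, the following is a minimal sketch of how document-level and sentence-level detection could be cast as instruction/input/output training records. The prompt wording, the `Human`/`AI` label strings, the record field names, and the helper functions are all illustrative assumptions; the abstract does not specify the actual templates used for LLM-Detector.

```python
import json

# Hypothetical label set and prompt wording (assumptions for illustration only).
LABELS = {"Human", "AI"}
DOC_INSTRUCTION = (
    "Decide whether the following text was written by a human or "
    "generated by an AI. Answer with 'Human' or 'AI'."
)
SENT_INSTRUCTION = (
    "For each numbered sentence below, decide whether it was written by a "
    "human or polished by an AI. Answer with one label per line."
)


def build_document_sample(text, label):
    """Build one document-level instruction-tuning record."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return {"instruction": DOC_INSTRUCTION, "input": text, "output": label}


def build_sentence_sample(sentences, labels):
    """Build one sentence-level record with a label for every sentence."""
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    for label in labels:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
    # Number the sentences so each output line maps back to its input line.
    numbered_input = "\n".join(
        f"{i}. {s}" for i, s in enumerate(sentences, start=1)
    )
    numbered_output = "\n".join(
        f"{i}. {lab}" for i, lab in enumerate(labels, start=1)
    )
    return {
        "instruction": SENT_INSTRUCTION,
        "input": numbered_input,
        "output": numbered_output,
    }


if __name__ == "__main__":
    doc = build_document_sample("The weather today is sunny.", "Human")
    mixed = build_sentence_sample(
        ["I wrote this myself.", "This sentence was refined for clarity."],
        ["Human", "AI"],
    )
    # One JSON object per line, a common layout for fine-tuning data.
    for record in (doc, mixed):
        print(json.dumps(record, ensure_ascii=False))
```

Records in this shape could then be passed to any standard supervised fine-tuning pipeline for an open-source LLM; the sentence-level variant mirrors the mixed human-written and LLM-polished dataset described above.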