Natural Language Generation has developed rapidly with the advent of large language models (LLMs). While LLM usage has attracted significant attention from the general public, it is important for readers to know when a piece of text is LLM-generated. This has created a need for models that automatically detect LLM-generated text, with the aim of mitigating the potential negative outcomes of such content. Existing LLM-generated text detectors perform competitively at telling apart LLM-generated and human-written text, but this performance is likely to deteriorate when paraphrased texts are considered. In this study, we devise a new data collection strategy to build the Human & LLM Paraphrase Collection (HLPC), a first-of-its-kind dataset that incorporates human-written texts and paraphrases, as well as LLM-generated texts and paraphrases. To understand the effects of human-written paraphrases on the performance of state-of-the-art LLM-generated text detectors, namely OpenAI RoBERTa and watermark detectors, we perform classification experiments that incorporate human-written paraphrases, watermarked and non-watermarked LLM-generated documents from GPT and OPT, and LLM-generated paraphrases from DIPPER and BART. The results show that the inclusion of human-written paraphrases has a significant impact on LLM-generated text detector performance, improving TPR@1%FPR with a possible trade-off in AUROC and accuracy.
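For readers unfamiliar with the headline metric, TPR@1%FPR is the fraction of LLM-generated texts a detector flags when its decision threshold is set so that at most 1% of human-written texts are wrongly flagged. The following is a minimal illustrative sketch of that computation, not the evaluation code used in this study. The function name `tpr_at_fpr`, the convention that higher scores mean "more likely LLM-generated", and the toy scores are all assumptions made for the example.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """True-positive rate at a fixed false-positive budget.

    Assumed conventions (hypothetical, for illustration only):
    label 1 = LLM-generated, label 0 = human-written, and a text is
    flagged when its score is strictly greater than the threshold.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    # Human-written scores, highest first.
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    # Number of human-written texts we are allowed to flag by mistake.
    allowed_fp = int(max_fpr * len(neg))
    if allowed_fp >= len(neg):
        threshold = float("-inf")  # budget covers every human-written text
    else:
        # At most `allowed_fp` human scores lie strictly above this value.
        threshold = neg[allowed_fp]

    return sum(s > threshold for s in pos) / len(pos)


# Toy example: four human-written and four LLM-generated scores.
human_scores = [0.1, 0.2, 0.3, 0.9]
llm_scores = [0.95, 0.5, 0.25, 0.05]
scores = human_scores + llm_scores
labels = [0] * 4 + [1] * 4

strict = tpr_at_fpr(scores, labels)                # 1% budget: no false positives allowed here
relaxed = tpr_at_fpr(scores, labels, max_fpr=0.25) # one false positive allowed
```

Because the threshold is fixed by the highest-scoring human-written texts, the metric is sensitive to a few confidently misclassified human documents. A detector can therefore improve on TPR@1%FPR while its AUROC or accuracy moves in the other direction.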