Due to the rapid development of text generation models, people increasingly encounter texts that may start out as human-written but then continue as output generated by large language models. Detecting the boundary between the human-written and machine-generated parts of such texts is a very challenging problem that has received little attention in the literature. In this work, we consider this artificial text boundary detection problem and compare several predictors built on features of different nature. We show that supervised fine-tuning of the RoBERTa model works well for this task in general but fails to generalize in important cross-domain and cross-generator settings, demonstrating a tendency to overfit to spurious properties of the data. We then propose novel approaches based on features extracted from a frozen language model's embeddings; these approaches outperform both the human accuracy level and previously considered baselines on the Real or Fake Text benchmark. Moreover, we adapt perplexity-based approaches to the boundary detection task and analyze their behaviour. Finally, we analyze the robustness of all proposed classifiers in cross-domain and cross-model settings and identify important properties of the data that can negatively affect the performance of artificial text boundary detection algorithms.
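To make the task concrete, the following minimal sketch shows one way a boundary could be read off a sequence of per-sentence scores. It is an illustration under stated assumptions, not the method evaluated in this work: it assumes that a perplexity-like score has already been computed for every sentence by some language model, and the function name `detect_boundary` and the least-squares split criterion are hypothetical choices made for this example.

```python
from typing import Sequence


def detect_boundary(scores: Sequence[float]) -> int:
    """Return the index k of the first sentence assigned to the second segment.

    `scores` holds one perplexity-like value per sentence (assumed to be
    precomputed elsewhere). The text is split into scores[:k] and scores[k:],
    and k is chosen to minimize the summed squared deviation of each segment
    from its own mean, i.e. a single change point in the mean score.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two sentence scores")

    def sse(segment: Sequence[float]) -> float:
        # Sum of squared deviations from the segment mean.
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_k, best_cost = 1, float("inf")
    for k in range(1, n):  # both segments must be non-empty
        cost = sse(scores[:k]) + sse(scores[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


if __name__ == "__main__":
    # Illustrative, made-up scores: three high values followed by four low ones.
    example = [52.0, 48.5, 50.1, 12.3, 11.8, 13.0, 12.1]
    print(detect_boundary(example))
```

The sketch only looks for a shift in the mean score, so it says nothing about which side is human-written, and it inherits any domain or generator dependence of the underlying scores.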